Crashing rasa-x-event-service

Hello all,

I’m trying to deploy Rasa-X to our OpenShift cluster (v4.6).

I’m deploying using the Helm chart made available by Rasa. In the past we deployed version 1.6.3 of this chart with no problems.

Now I want to deploy the latest version (chart 1.10.0 / appVersion 0.39.1).

The first struggle was with the NGINX pod. I implemented a workaround for that: Nginx CrashLoopBackOff with 1.10.0 · Issue #191 · RasaHQ/rasa-x-helm · GitHub

Now I’m stuck with a crashing event-service pod. I’m deploying to a new, empty namespace.

Output of the pod logs:

Unable to get database revision heads. DB revision(s) do not match migration scripts revision(s): DB revision: None Migration scripts revision: ['6f9d9810a4e1']
Database revision does not match migrations' latest, trying again in 4 seconds.
Unable to get database revision heads. DB revision(s) do not match migration scripts revision(s): DB revision: None Migration scripts revision: ['6f9d9810a4e1']
Database revision does not match migrations' latest, trying again in 4 seconds.
Unable to get database revision heads. DB revision(s) do not match migration scripts revision(s): DB revision: None Migration scripts revision: ['6f9d9810a4e1']

I’ve attached the provided Helm override file with this post: overrides.txt (3.6 KB).

Because this is a ‘locked down’ OpenShift cluster, we import the images into an internal registry, so we point the chart at the correct images and repositories. We also set some resource quotas and the passwords.
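For context, this is roughly the shape of the overrides involved (registry host, tags and passwords below are placeholders, and the key names should be double-checked against the values.yaml of your chart version):

```yaml
# Sketch only -- verify key names against the rasa-x-helm values.yaml
# for chart 1.10.0. Registry host, salt and passwords are placeholders.
rasax:
  name: "registry.internal.example/rasa/rasa-x"
  tag: "0.39.1"
  passwordSalt: "<salt>"
  initialUser:
    password: "<admin-password>"
rasa:
  name: "registry.internal.example/rasa/rasa"
  tag: "<rasa-tag>"
global:
  postgresql:
    postgresqlPassword: "<pg-password>"
  redis:
    password: "<redis-password>"
```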

Hmm, I’m noticing that the postgresql, rabbitmq & redis subcharts are not resulting in extra pods…

When I look at our old (1.6) deployment I’ve got: app, duckling, event-service, nginx, postgresql, rabbit, rasa-production, rasa-worker, rasa-x and redis.

In this new deployment (1.10) I’ve only got: app, db-migration, duckling, event-service, nginx, rasa-production, rasa-worker & rasa-x…

OK, due to securityContext issues (OpenShift), redis, rabbit and postgresql weren’t deployed. After disabling the securityContext I was able to deploy successfully.
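For anyone hitting the same thing, the override that got the subcharts scheduling was along these lines (the exact securityContext keys differ between the bundled Bitnami subchart versions, so treat this as a sketch and check each subchart’s values.yaml):

```yaml
# Sketch -- securityContext key names vary per subchart version;
# verify against the bundled postgresql/rabbitmq/redis values.yaml.
postgresql:
  securityContext:
    enabled: false
rabbitmq:
  podSecurityContext:
    enabled: false
redis:
  securityContext:
    enabled: false
```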

Now some new issues:

Rabbit pod:

Readiness probe failed: Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.
Arguments given: node_health_check
Usage: rabbitmqctl [--node <node>] [--longnames] [--quiet] node_health_check [--timeout <timeout>]
Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.
Arguments given: status
Usage: rabbitmqctl [--node <node>] [--longnames] [--quiet] status [--unit <unit>] [--timeout <timeout>]

Event service pod:

Liveness probe failed: Get "http://10.131.7.243:5673/health": dial tcp 10.131.7.243:5673: connect: connection refused

Hi,

I am also working with OpenShift 4.6.

I have spent quite a lot of time working around OpenShift-related issues in the chart, but I think I have a mostly functioning system now.

I opened an issue related to my experience. I outlined all of my settings, etc. I had trouble disabling securityContext and ended up installing the upstream charts for PG, Redis and RabbitMQ.

As I just came across the same issue, I wanted to mention that the final error you noted with regard to the event service may be simply because the initialProbeDelay for the readiness probes is too short.
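If that is the cause, an override like the following should confirm it (the key name follows the chart’s values.yaml of this era, and 60 seconds is just an illustrative guess — verify both against your chart version):

```yaml
# Sketch -- give the event service more time to start before it is probed.
eventService:
  initialProbeDelay: 60
```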

Let me know if you’ve got your system working and if you have any additional insights regarding running on OpenShift.