Problems installing RasaX chart on OpenShift 4.6

michaelsteigman · November 15, 2021, 5:56pm

Hello,

I have appreciated the OpenShift references in the docs and was excited to get started with RasaX. I have been using OpenShift as a developer for around 5 years and have more recently jumped into Kustomize quite heavily, but this is my first experience with Helm. I have been looking forward to trying it, though.

I am using these instructions as my guide.

I filled in the basic values as instructed by the docs, tried the install, and received the errors the docs mention regarding runAsUser. I added the securityContext value and then made it further. However, PostgreSQL and Nginx are both unable to start up.

The PG logs:

postgresql 17:14:39.98 INFO  ==> Initializing PostgreSQL database...
chmod: changing permissions of '/bitnami/postgresql/data': Operation not permitted
postgresql 17:14:40.01 WARN  ==> Lack of permissions on data directory!
chmod: changing permissions of '/bitnami/postgresql/data': Operation not permitted
postgresql 17:14:40.01 WARN  ==> Lack of permissions on data directory!

The Nginx logs:

/docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
/docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
/docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
10-listen-on-ipv6-by-default.sh: info: can not modify /etc/nginx/conf.d/default.conf (read-only file system?)
/docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
20-envsubst-on-templates.sh: ERROR: /etc/nginx/templates exists, but /etc/nginx/conf.d is not writable
/docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
/docker-entrypoint.sh: Configuration complete; ready for start up
2021/11/15 17:14:56 [emerg] 1#1: mkdir() "/etc/nginx/client_body" failed (13: Permission denied)
nginx: [emerg] mkdir() "/etc/nginx/client_body" failed (13: Permission denied)

On the PG side, after doing a lot of reading, I’m using the following in values.yaml:

postgresql:
  volumePermissions:
    securityContext:
      runAsUser: "auto"
  securityContext:
    enabled: false
  containerSecurityContext:
    enabled: false
  shmVolume:
    chmod:
      enabled: false

These settings follow the PG Helm chart notes for OpenShift. By the way, I can install the standalone PG Helm chart with no problems using just:

securityContext:
  enabled: false
containerSecurityContext:
  enabled: false

Can anyone point me in the right direction?

Thanks!

bcartign · November 17, 2021, 9:33am

Hello. For NGINX, did you try using this Docker image: nginxinc/nginx-unprivileged?

https://github.com/nginxinc/docker-nginx-unprivileged

michaelsteigman · November 18, 2021, 5:49pm

Thanks, @bcartign. I hadn’t seen any reference to that in the docs. I added name and tag entries referencing this image to my values file and it seems to boot up OK.
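For anyone reading later, a minimal sketch of what such an override might look like in values.yaml. It assumes the RasaX chart reads the NGINX image from `nginx.image.name` and `nginx.image.tag`, and the `stable` tag is only a placeholder, so check both against the chart's own values.yaml and the tags published for the image.

```yaml
# Sketch only: key nesting is an assumption about the RasaX chart.
nginx:
  image:
    # Unprivileged NGINX build suggested above; it is built to run as a non-root user
    name: "nginxinc/nginx-unprivileged"
    # Placeholder tag (assumption); pin whichever tag you have verified
    tag: "stable"
```

The unprivileged image matters on OpenShift because pods run under an arbitrary non-root UID by default, which is why the stock image failed on mkdir() under /etc/nginx.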

Next, I installed the v10 Bitnami PG chart (it installs without issue, unlike the v8 Bitnami PG subchart in the RasaX chart) and added the existingHost and related settings to my values file. It appears that the RasaX components can access and write to the DB. After looking at the logs, I realized I needed to create a DB named rasa, so I did so. After installing the RasaX chart, I can see the DB is populated with 123 relations.
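A rough sketch of values that point the RasaX chart at an externally installed PostgreSQL. The key names `postgresql.install`, `postgresql.existingHost`, and `global.postgresql.postgresqlDatabase` are assumptions about the chart's layout, and the host name is hypothetical, so verify everything against the chart's values.yaml before using it.

```yaml
# Sketch only: key names are assumptions, not confirmed in this thread.
postgresql:
  # Skip the bundled (v8) PostgreSQL subchart
  install: false
  # Hypothetical service name of the separately installed PG release
  existingHost: "my-postgresql.my-proj.svc"

global:
  postgresql:
    # The database has to exist already; it was created by hand as "rasa"
    postgresqlDatabase: "rasa"
```

Credentials are left out on purpose; they belong in the "related settings" mentioned above and depend on how the external PG release was installed.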

The app, db-migration, duckling and nginx pods all appear to be running OK. So that much seems good.

However, the RasaX worker, production and event-service init-db containers output the following message:

ERROR: Unable to lock database: Permission denied
ERROR: Failed to open apk database: Permission denied
The database migration status: completed...100%

This is what I see in the rasa-x pod’s logs:

[2021-11-18 16:57:35 +0000] [27] [ERROR] Exception occurred while handling uri: 'http://10.206.5.185:5002/api/health'
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sanic/app.py", line 973, in handle_request
    response = await response
  File "/usr/local/lib/python3.8/dist-packages/rasax/community/api/blueprints/project.py", line 90, in health
    ] = await db_migration_service.migration_status()
  File "/usr/local/lib/python3.8/dist-packages/rasax/community/services/db_migration_service.py", line 77, in migration_status
    progress = sql_migrations.get_migration_progress(session)
  File "/usr/local/lib/python3.8/dist-packages/rasax/community/sql_migrations.py", line 159, in get_migration_progress
    current_position = revisions.index(db_heads[0]) + 1
ValueError: 'd0e8d1bde9fb' is not in list
[2021-11-18 16:57:35 +0000] - (sanic.access)[INFO][10.206.4.1:38098]: GET http://10.206.5.185:5002/api/health  500 214
INFO:sanic.access:
DEBUG:rasax.community.database.utils:Current database revision ['d0e8d1bde9fb'] does not match last migration revision ['752ee6ed5d9f'], indicating that database migration is not yet complete.
DEBUG:rasax.community.database.utils:Trying again in 4 seconds.

as well as

redis.exceptions.TimeoutError: Timeout connecting to server

(FWIW, I do not see anything relating to RabbitMQ or Redis in my cluster: no services and no deployments. I did add the passwords for both Redis and RabbitMQ per the instructions.)

The events for the rasa-x pod look like:

Events:
  Type     Reason            Age                    From                                                        Message
  ----     ------            ----                   ----                                                        -------
  Warning  FailedScheduling  <unknown>                                                                          0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling  <unknown>                                                                          0/9 nodes are available: 9 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled         <unknown>                                                                          Successfully assigned my-proj/rasax-rasa-x-b749c5cd6-sl6p7 to <node>.
  Normal   AddedInterface    19m                    multus                                                      Add eth0 [10.206.5.185/23]
  Normal   Created           18m                    kubelet, <node>                                             Created container rasa-x
  Normal   Started           18m                    kubelet, <node>                                             Started container rasa-x
  Normal   Pulled            16m (x2 over 18m)      kubelet, <node>                                             Container image "rasa/rasa-x:latest" already present on machine
  Normal   Killing           16m                    kubelet, <node>                                             Container rasa-x failed liveness probe, will be restarted
  Warning  Unhealthy         16m                    kubelet, <node>                                             Readiness probe failed: Get "http://10.206.5.185:5002/": dial tcp 10.206.5.185:5002: connect: connection refused
  Warning  Unhealthy         13m (x21 over 17m)     kubelet, <node>                                             Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy         8m49s (x51 over 17m)   kubelet, <node>                                             Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff           3m52s (x9 over 5m28s)  kubelet, <node>                                             Back-off restarting failed container

The rasa-production, event-service, and rasa-worker event logs look similar: all have failing liveness probes and all eventually fall into a CrashLoopBackOff state.

Any suggestions on where to start with all of this?

Thanks.

michaelsteigman · December 9, 2021, 7:46pm

Just wanted to circle back and say that I believe I have worked through the issues above, although it was difficult.

In case it’s helpful for anyone who comes across this thread, I have opened an issue on GitHub outlining my steps and settings.
