Hi, I’m running Rasa X on a GCP server. I installed Rasa X following the instructions here
for docker-compose. The VM has the recommended hardware from the docs linked above. I’m using Rasa X version 0.41.2, Rasa Open Source 2.8.1, and Rasa SDK 2.8.1.
I’ve been deploying our trained model through Rasa X with the ConveRTTokenizer and ConveRTFeaturizer in our pipeline, and that has been working just fine. However, we recently switched to BERT, so we retrained our model with the LanguageModelTokenizer and LanguageModelFeaturizer in the pipeline.
Now when I go to deploy the containers with our newly trained model with
docker-compose up
the production container as well as the worker container both crash within 40 seconds with exit code 137, which means they hit their memory limit and were killed (137 = 128 + SIGKILL). The rest of the containers are fine. I’ve been monitoring the RAM usage using
docker stats
and the memory limit for all the containers is 7.793GiB (the VM is configured with 8 GB of RAM). As the production and worker containers spin up, I can see their memory usage climb until they are killed after about 40 seconds. Watching the Docker logs for the production container, I can see it sending HTTP GET requests for 5 files from https://cdn.huggingface.co/, and after downloading the 5th file the container is killed. I’ve tried the deployment multiple times to be sure, and the production and worker containers are always killed at the same point in the logs every time.
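To double-check my reading of exit code 137 (128 + signal 9, i.e. SIGKILL, which is what the kernel OOM killer sends), here’s a minimal reproduction outside of Docker:

```shell
# A process killed by SIGKILL (signal 9) exits with status 128 + 9 = 137 --
# the same code the OOM-killed containers report.
sh -c 'kill -9 $$'
echo "exit code: $?"   # prints: exit code: 137
```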
I’ve made the assumption that the change in language model from ConveRT to BERT is the cause of the container failure, as that is the only thing that has changed that I’m aware of. I should note that I’ve also run the Rasa server outside of a container using
rasa shell --debug
on the same VM and monitored the memory usage. The process does indeed spike to about 70% of the VM’s memory while it downloads the files mentioned above, then settles back down and levels off at about 35%. But the VM doesn’t crash, and I’m able to talk to the bot interactively this way. So my rationale is that the VM does have the capacity to handle this process at 8 GB of RAM, and I wonder if there’s something I need to adjust in the Rasa X docker-compose.yml file.
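For example, is something like the following the right knob? This is only a sketch of what I was considering, not something I’ve verified — the service names `rasa-production` and `rasa-worker` are from the stock docker-compose.yml, and I’m assuming the v3 `deploy.resources` limits are honored when running `docker-compose --compatibility up`:

```yaml
# docker-compose.override.yml (unverified sketch)
# Cap each Rasa container so one download spike can't consume the whole VM.
version: "3.4"
services:
  rasa-production:
    deploy:
      resources:
        limits:
          memory: 3g
  rasa-worker:
    deploy:
      resources:
        limits:
          memory: 3g
```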
So my question is: does the VM itself need more RAM, or is there something else that needs to be fixed?
I’m looking for suggestions of other things to try before increasing the VM’s memory (as I’m not convinced that is the correct or best solution). Has anyone had issues deploying a model that uses BERT before, or had out-of-memory issues when deploying Rasa X containers?
I’m completely new to the Rasa stack and to docker-compose in general, so any insights will be greatly appreciated.