Rasa-production_1 container runs out of memory after switching from ConveRT to BERT model

Hi, I’m running Rasa X on a GCP server. I installed Rasa X following the docker-compose instructions here. The VM has the recommended hardware from the docs linked above. I’m using Rasa X 0.41.2, Rasa Open Source 2.8.1 and Rasa SDK 2.8.1.

I’ve been deploying our trained model using Rasa-X and the ConveRTTokenizer and ConveRTFeaturizer in our pipeline. That has been working just fine. However, we recently switched to using the BERT model and thus we retrained our model using the LanguageModelTokenizer and LanguageModelFeaturizer in our pipeline.
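For context, the relevant part of our config.yml now looks roughly like this (the model_weights value below is just a placeholder to show the kind of change, not necessarily the exact weights we use):

pipeline:
  # before: ConveRTTokenizer + ConveRTFeaturizer
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
    model_name: "bert"
    # placeholder weights; whatever weights are configured here get downloaded from Hugging Face when the server starts
    model_weights: "bert-base-uncased"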

Now when I go to deploy the containers with our newly trained model with

docker-compose up

the production container as well as the worker container both crash within about 40 seconds with exit code 137. That means they hit their memory limit and were killed. The rest of the containers are fine. I’ve been monitoring RAM usage with docker stats, and the memory limit for all the containers is 7.793GiB (the VM is configured with 8GB of RAM). As the production and worker containers spin up, I can see their memory usage keep climbing until they are killed after about 40 seconds.

I’ve also watched the docker logs for the production container: it sends HTTP GET requests for 5 files from https://cdn.huggingface.co/, and after downloading the 5th file the container is killed. I’ve tried the deployment multiple times to be sure, and the production and worker containers are always killed at the same point in the logs every time.

I’ve assumed that the change in language model from ConveRT to BERT is the cause of the container failures, since that’s the only thing that has changed as far as I’m aware. I should note that I’ve also run the Rasa server outside of a container using

rasa shell --debug

on the same VM and monitored the memory usage. That process does indeed spike to about 70% of the VM’s memory while it is downloading the files mentioned above, then settles back down and levels off at about 35%. But the VM doesn’t crash and I’m able to talk to the bot interactively this way. So my rationale is that the VM does have the capacity to handle this process with 8GB of RAM, and I wonder if there’s something I need to adjust in the rasa-x docker-compose.yml file.
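For example, I’ve been wondering whether adding explicit per-service memory limits or reservations for the production and worker services would help, something along these lines (the numbers here are just a sketch of the kind of change I mean, not values taken from the actual Rasa X compose file):

services:
  rasa-production:
    deploy:
      resources:
        limits:
          memory: 4G        # hard cap for this container
        reservations:
          memory: 2G        # soft reservation

From what I’ve read, plain docker-compose only honors deploy.resources when run with the --compatibility flag (or under swarm), so I’m not sure this is even the right mechanism; it’s just the kind of knob I was hoping someone could point me at.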

So my question is: does the VM itself need more RAM, or is there something else wrong that needs to be fixed?

I’m looking for suggestions of other things to try before increasing the VM’s memory (as I’m not convinced that’s the correct or best solution). Has anyone had issues deploying a model that uses BERT before? Or run into out-of-memory issues when deploying Rasa X containers?

I’m completely new to the Rasa stack and to docker-compose in general, so any insights will be greatly appreciated.

Hey @aromatic-toast, sorry for the delay. Indeed, BERT is a much bigger model than ConveRT, so it’s perfectly possible that you’re experiencing memory issues after switching to BERT. That said, it’s interesting that everything works on the VM itself but not in a Docker container running on the same VM. While I’m new to Docker as well (I’ll try to get advice from someone more knowledgeable), I wonder if it’s due to multiple Docker containers having to share those 8GB of RAM that the VM provides. When you run just rasa shell --debug, that’s only Rasa Open Source running a lightweight server with the trained model. However, when you spin up all your containers, that brings Rasa X into play, which also requires some memory and isn’t as lightweight as the server behind rasa shell.

Thank you so much for your response @SamS. This is the conclusion we came to as a team as well. We bumped the RAM of our VM to 16GB and that seemed to do the trick for a couple of weeks. But just now I’m getting the same error again, this time on the production container only.

Do you have a sense of how much RAM would be adequate? As mentioned above, the container seems to fail while it is in the middle of downloading files from Hugging Face. In your experience, do the BERT model and the files downloaded from Hugging Face change substantially enough that they will require constant memory upgrades?

@aromatic-toast interesting :thinking: The HF models shouldn’t really change over time, certainly not in terms of these kinds of size differences. By the way, we’re still talking about loading a trained model – training itself works fine, correct? Originally, you said that things were crashing after downloading the last of the 5 files. To me, this sounds like it might actually be crashing during whatever stage follows the downloading. If you compare detailed logs from when everything works vs. when things crash, what’s the next stage that’s not reached because of the crash?