When we deploy a Rasa model (NLU + Core), it takes around 700 MB of memory per model. I am running each model with `rasa run --enable-api`.
I have over 60 models to deploy, which puts a lot of load on memory.
Please help me reduce the memory consumption. How can this be optimized?
Let's start with your config: which components are you using, and why are they needed? It is important to understand the dimensions involved. Are you using any pre-trained models?
How big is your training data? How many intents/stories do you have?
You can use the pythonic way of running the service, but it doesn't come with any support. It's literally `import rasa`, and you go on from there; you can go through the code and implement it yourself. Otherwise, follow the documentation for starting a Rasa server.
I don't think there is any documentation on implementing the pythonic way with the latest Rasa. You have to work it out yourself, but it is OSS, so you can simply check the code and walk through it on GitHub. Please keep in mind that I don't think this is officially supported, so fair warning.
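For illustration, a minimal sketch of what that pythonic route could look like, assuming Rasa 2.x (where `Agent.load` and `handle_text` were the relevant entry points); treat it as a starting point to explore, not a supported API:

```python
import asyncio

from rasa.core.agent import Agent


async def main() -> None:
    # Load a trained model archive directly, bypassing `rasa run`.
    # The path is a placeholder; point it at one of your .tar.gz models.
    agent = Agent.load("models/my-model.tar.gz")

    # Handle a single message the same way the HTTP API would.
    responses = await agent.handle_text("hello", sender_id="user-1")
    print(responses)


asyncio.run(main())
```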
Regarding your config:
The biggest memory footprint is likely TensorFlow on your CPU. It doesn't seem that your config is using pre-trained models or anything like that, but I am surprised that every model takes about 700 MB when running the API.
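You can see TensorFlow's baseline cost for yourself: merely importing it tends to add hundreds of MB of resident memory before any model is loaded. A quick check, assuming `psutil` is installed:

```python
import os

import psutil

proc = psutil.Process(os.getpid())
before = proc.memory_info().rss / 1024 ** 2  # resident set size in MiB

import tensorflow as tf  # noqa: E402  # imported late on purpose

after = proc.memory_info().rss / 1024 ** 2
print(f"RSS before TF import: {before:.0f} MiB, after: {after:.0f} MiB")
```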
I have tried deploying it in an Alpine-based Docker container, where each model is around 700 MB. When I deploy it through our automated supervisord deployment it takes around 900 MB, even though supervisord also just runs the `rasa run` command with the `--enable-api` argument.
Can you tell me what the ideal memory requirement per model is?
Well, I did some tests on my own, and yeah, my model shows about 500 MB of memory usage, which also includes DIET.
TensorFlow is a hard dependency of Rasa, so I think it is safe to say part of that memory footprint is TensorFlow, even when I use non-TensorFlow-specific components such as spaCy.
I don't see any specific hardware requirements for Rasa OSS, but there is a hardware requirement for Rasa X, 60-70% of which I believe is needed to run the Rasa components that do the training and inference.
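For what it's worth, here is roughly the kind of measurement I mean; a sketch assuming `psutil` is installed and a Rasa 2.x `Agent` (the model path is a placeholder for one of your trained archives):

```python
import os

import psutil
from rasa.core.agent import Agent

proc = psutil.Process(os.getpid())
baseline = proc.memory_info().rss / 1024 ** 2

# Placeholder path: substitute one of your trained model archives.
agent = Agent.load("models/my-model.tar.gz")

loaded = proc.memory_info().rss / 1024 ** 2
print(f"Baseline: {baseline:.0f} MiB, with model loaded: {loaded:.0f} MiB")
```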
Also, can you tell me the ideal time taken to load a model? The alternative I am considering is loading models on demand; if the model loading time is low enough, I can go with that approach. What I have seen is around 30 seconds. Please let me know your thoughts on it.
Yeah, that sounds about right. You can technically use an LRU cache to keep loaded models in memory in a least-recently-used rotation, which would reduce response times for subsequent calls; see the sketch below.
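A minimal sketch of that idea using Python's built-in `functools.lru_cache`, assuming Rasa 2.x (`maxsize` is a placeholder to tune against your RAM budget):

```python
from functools import lru_cache

from rasa.core.agent import Agent


@lru_cache(maxsize=4)  # at most 4 models resident; evicts least recently used
def load_agent(model_path: str) -> Agent:
    # First call per path pays the ~30 s load cost; repeat calls are instant.
    return Agent.load(model_path)
```

When the cache is full, the oldest entry's reference is dropped and Python can garbage-collect it, so you trade load latency on cold models for a bounded memory ceiling.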
I am still facing this issue, and your understanding is correct that I am using Flask to interact with Rasa. I am caching the model generated by the `Interpreter.load(model_path)` method by storing it in memory in a queue. I have added the code snippet that generates the model in the issue itself. Even if I cache the model, I expected memory consumption to increase by approximately 100-150 MB, as the model persisted on disk is around 50 MB. But in my case, it is increasing by 1.5 GB on average with every training.
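One thing worth ruling out: if the queue never drops its reference to the previous interpreter after a retrain, every training keeps the old model alive alongside the new one. Below is a sketch of a bounded cache that evicts explicitly, assuming the Rasa 2.x `Interpreter` (the class and sizes are hypothetical); note that TensorFlow can retain allocator pools even after Python releases the objects, so the drop may not be total:

```python
import gc
from collections import OrderedDict

from rasa.nlu.model import Interpreter


class InterpreterCache:
    """Bounded LRU cache that drops the oldest interpreter and forces GC."""

    def __init__(self, max_size: int = 3):
        self.max_size = max_size
        self._cache: "OrderedDict[str, Interpreter]" = OrderedDict()

    def get(self, model_path: str) -> Interpreter:
        if model_path in self._cache:
            self._cache.move_to_end(model_path)  # mark as most recently used
            return self._cache[model_path]
        if len(self._cache) >= self.max_size:
            _, evicted = self._cache.popitem(last=False)  # drop LRU entry
            del evicted
            gc.collect()  # encourage CPython to release the old model's memory
        # Expects an unpacked NLU model directory, as in your snippet.
        interpreter = Interpreter.load(model_path)
        self._cache[model_path] = interpreter
        return interpreter
```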