Reduce RASA model memory consumption or load time

Hi Team,

When we deploy a Rasa model (NLU + Core) it takes around 700MB of memory per model. Please help me reduce the model's memory consumption. I am running each model with rasa run --enable-api.

I have over 60 models to deploy, which puts a lot of load on memory. Please help me reduce this and let me know how it can be optimized.

Also, let me know if there is any way to run NLU parse calls without running an NLU model service.

Let's start with your config: which components are you using and why are they needed? It is important to understand the dimensions. Are you using any pre-trained models?

How big is your training data? How many intents/stories do you have?

You can use the pythonic way of running the service, but it doesn't come with any support. You can go through the code and implement it yourself; it's literally import rasa and go on from there. Otherwise, follow the documentation for starting a Rasa server.

On average, each model has 6 intents with 3 or 4 examples each, and 2 or 3 stories.

Below is the config for all my models.

language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
    case_sensitive: true
    use_word_boundaries: false
  - name: CountVectorsFeaturizer
    stop_words:
      - a
      - and
      - any
      - are
      - aren't
      - because
      - being
      - by
      - can't
      - cannot
      - could
      - couldn't
      - does
      - doesn't
      - don't
      - during
      - from
      - further
      - if
      - in
      - into
      - itself
      - let's
      - more
      - of
      - or
      - other
      - ought
      - over
      - shan't
      - some
      - such
      - than
      - that
      - that's
      - them
      - themselves
      - this
      - those
      - through
      - under
      - until
      - up
      - very
      - where
      - where's
      - which
      - while
    analyzer: word
    min_ngram: 1
    max_ngram: 1
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.5
    ambiguity_threshold: 0.2
  - name: EntitySynonymMapper

policies:
  - name: TEDPolicy
    max_history: 3
    epochs: 150
    batch_size: 32
    max_training_samples: 300
  - name: MemoizationPolicy
  - name: RulePolicy
    enable_fallback_prediction: 'false'
    restrict_rules: 'false'
    check_for_contradictions: 'false'

Can you give me any documentation, or point me anywhere I can get a head start on the pythonic way of NLU parsing?

I don't think there is any documentation on implementing the pythonic way with the latest Rasa. You have to do it yourself, but it is OSS, so you can simply check the code and walk through it on GitHub. Please keep in mind that I don't think this is officially supported, so fair warning.
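To give a rough head start, here is a minimal sketch of parsing a message directly in Python, assuming a Rasa 2.x install where rasa.model.get_model and rasa.nlu.model.Interpreter (the class mentioned later in this thread) are available; the model path is a placeholder:

    import os

    import rasa.model
    from rasa.nlu.model import Interpreter

    # Unpack the trained model archive (returns the path to the unpacked directory).
    unpacked = rasa.model.get_model("models/my-model.tar.gz")

    # Load only the NLU part of the model into memory.
    interpreter = Interpreter.load(os.path.join(unpacked, "nlu"))

    # Parse a message directly, without going through the HTTP server.
    result = interpreter.parse("what are my benefits?")
    print(result["intent"], result["entities"])

Since this bypasses rasa run entirely, you only keep the NLU part in memory, but you also lose everything the server gives you (Core, endpoints, tracker stores), so treat it as an unofficial workaround.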

Regarding your config: the biggest memory footprint is likely TensorFlow on your CPU. It doesn't seem that your config is using pre-trained models or anything, but I am surprised every model takes about 700MB when running the API :frowning:

I have tried deploying it in an Alpine-based Docker container, and each model is around 700MB. When I deploy it through an automated supervisord deployment it takes around 900MB, but even supervisord only runs it through the rasa run command with the --enable-api argument.

Can you tell me what the ideal memory requirement per model is?

Well, I did some tests of my own, and my model also shows about 500MB of memory usage, which includes DIET.

TensorFlow is a hard dependency of Rasa, so I think it is safe to say part of that memory footprint is TensorFlow, even when using non-TensorFlow-specific components such as spaCy.

I don't see any specific hardware requirements for Rasa OSS, but there is a hardware requirement for Rasa X, 60-70% of which I believe is needed to run the Rasa components that do the training and inference.

OK, thank you so much for the info, it was helpful. I will try the pythonic way of NLU parsing. Wish me luck :stuck_out_tongue:


Also, can you tell me the typical time taken to load a model? The alternative I am considering is loading models on demand; if the load time is low enough I can go with that approach. What I have seen is around 30 seconds. Please let me know your thoughts on it.

Yeah, that sounds about right. You can technically use an LRU cache to keep your loaded models in the app in a least-recently-used rotation, which would reduce response times for subsequent calls.
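A minimal sketch of that LRU idea, assuming the Interpreter.load() approach discussed above; the cache size of 8 and the helper names are just illustrative:

    import os
    from functools import lru_cache

    import rasa.model
    from rasa.nlu.model import Interpreter

    @lru_cache(maxsize=8)  # keep at most 8 models in memory; least recently used is evicted
    def get_interpreter(model_archive: str) -> Interpreter:
        unpacked = rasa.model.get_model(model_archive)
        return Interpreter.load(os.path.join(unpacked, "nlu"))

    def parse(model_archive: str, text: str) -> dict:
        # The first call for a given model pays the ~30s load cost;
        # later calls for the same model hit the cache.
        return get_interpreter(model_archive).parse(text)

With 60 models, the maxsize value is the knob that trades memory for latency: evicted models pay the full load time again on their next request.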

I am still facing this issue, and your understanding is correct that I am using Flask to interact with Rasa. I am caching the model returned by the Interpreter.load(model_path) method by storing it in memory using a queue. I have added the code snippet that generates the model in the issue itself. Even with the model cached, I expected memory consumption to increase by approximately 100-150MB, as the model persisted on disk is around 50MB. But in my case, it is increasing by 1.5GB on average with every training.
