How to reduce the memory footprint in Rasa Open Source?

Problem statement: I want to split pipeline processes across different servers to reduce the load.

Proposed idea: even a freshly trained Rasa model eats up ~1.5 GB of RAM per instance; apparently this happens because of how the model is implemented with TensorFlow.

  1. Does that mean there is no way to run multiple instances of Rasa on the server without each instance hogging ~1.5 GB of RAM?
  2. If I train Rasa NLU only, can I then run Rasa Core only (I couldn’t find a core-only command yet) to reduce that RAM usage?
  3. Is there some other way I can reduce the memory to, say, around 500 MB per instance? Thanks

If you want to run multiple Rasa models at once, you can try handling them via Python. For example, create a Python server in a framework of your choice and load all the models there. I think the only time memory usage would increase is while a query is being processed.

An example of how you can train and deploy Rasa models is shown here: Rasa NLU models in Python

Note that the blog is about handling NLU models, but it should give you an idea of how to load your complete model as well.
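To make the idea concrete, here is a minimal sketch (assuming Rasa 3.x and its Python `Agent` API; the client names and model paths are hypothetical) of one process that loads several models at startup and routes messages by client:

```python
# Minimal sketch: one Python process serving several Rasa models.
# Assumes Rasa 3.x; client names and model paths are hypothetical.
import asyncio
from rasa.core.agent import Agent

MODEL_PATHS = {
    "client_a": "models/client_a.tar.gz",
    "client_b": "models/client_b.tar.gz",
}

# Load every model once at startup; each Agent keeps its model in memory.
agents = {name: Agent.load(path) for name, path in MODEL_PATHS.items()}

async def handle(client: str, text: str, sender_id: str = "default"):
    # Route the incoming message to the right client's agent.
    return await agents[client].handle_text(text, sender_id=sender_id)

if __name__ == "__main__":
    # Example: send one message to client_a's bot and print the responses.
    print(asyncio.run(handle("client_a", "hello")))
```

In practice you would wrap `handle` in a small HTTP server (Sanic, FastAPI, etc.); the point is that all models share one process instead of each needing its own Rasa server.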

Thank you for the response. I have looked at the given code and will try it out. Meanwhile, I would like to know the standard hardware recommendation from Rasa for serving multiple clients from a single server (if that is recommended at all).

Use case: serving multiple clients across different domains on a single server.

How do players in the industry currently handle multiple clients?

Currently Rasa does not provide a feature to handle multiple models at once. The best way I found around this problem is to create a Python application, since the Rasa library is available only in Python and this approach is very easy to implement.

Hi Anoop, is there a difference in how Rasa Open Source vs. Pro/Enterprise consume memory? Our development team has been training a conversational bot, and even after upgrading to 1 TB of memory on our AWS infrastructure it still fails. That seems very excessive, and I am not sure if it has to do with the open source vs. “other” versions. The bot has fairly complex story blocks with checkpoints and intents, but I’m unsure. Thanks

Hi @Stepsvd ,

For this, you’ll have to connect with the Rasa Enterprise team, as they will be able to answer this much better.

Also, is it just one model that’s taking 1 TB of memory?

Thanks Anoop, yes, just one model. Is there someone on the Enterprise team you could perhaps refer me to?

@Stepsvd Just curious about the data points below.

  1. What is the volume of training data: number of intents, entities, and training examples?
  2. What is the compressed size of the model?
  3. Is it taking up 1 TB of RAM as soon as the service is started?

We have been serving multiple clients via different models, each as a separate service on the same EC2 instance. But our models are way smaller than yours.

Hi @siriusraja, are you serving multiple clients through a single EC2 server?

Exactly @siriusraja, I am curious about these points as well, as none of the models I have created ever got this big or started consuming this much memory.

I have one more question: how are you serving multiple models?

  • Different Rasa servers? But that would mean creating many ports, endpoints, and Rasa services.

  • Or handling multiple models via Python? This is what I have done in my project for handling multiple Rasa models.

Hi @anoopshrma

We start Rasa services on different ports, one for each client, each using the respective model. Of course, this approach ends up opening multiple ports, but that is manageable.

If you want, you can put an nginx-like proxy in front, which listens on a single port and internally routes to the different client services.
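As a sketch of that routing (the ports and path prefixes here are assumptions, not from the thread), the nginx config could look like:

```nginx
# Hypothetical sketch: one public port, one path prefix per client,
# each proxied to that client's Rasa service on its own internal port.
events {}

http {
    server {
        listen 80;

        # Client A's Rasa server
        location /client-a/ {
            proxy_pass http://127.0.0.1:5005/;
        }

        # Client B's Rasa server
        location /client-b/ {
            proxy_pass http://127.0.0.1:5006/;
        }
    }
}
```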

@faraz

Yes, we have multiple clients, but the concurrent traffic is not high; it’s spread across different hours of the day. So we were able to run multiple services serving different clients.

Each client model takes up 1-1.5 GB of RAM.

Hi @siriusraja

But will it be easy if you have to handle, let’s say, 100 clients at once? That would mean managing 100 Rasa services.

I’m just curious: wouldn’t it be better if you could handle all the models in one place?

I just want your thoughts on this.

@anoopshrma

It’s a trade-off between the administrative overhead of managing multiple services and flexibility.

When I run them as different services, I can start/stop the service for each client when business hours start in their time zone. This reduces the memory overhead on the server.
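For illustration, here is a minimal sketch of that start/stop pattern (the model directories, ports, and scheduling hook are assumptions):

```python
# Minimal sketch: start/stop one Rasa service per client so each model
# occupies RAM only during that client's business hours.
# Directories and ports below are hypothetical.
import subprocess

PROCS: dict[str, subprocess.Popen] = {}

def start_client(name: str, model_dir: str, port: int) -> None:
    # Launch "rasa run" for this client; the model stays in RAM only
    # while the process is alive.
    PROCS[name] = subprocess.Popen(
        ["rasa", "run", "--enable-api", "-p", str(port)],
        cwd=model_dir,
    )

def stop_client(name: str) -> None:
    # Terminate the service, releasing its ~1-1.5 GB of RAM.
    proc = PROCS.pop(name, None)
    if proc is not None:
        proc.terminate()
        proc.wait()

# In practice, a long-running scheduler (e.g. APScheduler) would call
# these at each client's business-hour boundaries:
# start_client("client_a", "/srv/bots/client_a", 5005)
# ...
# stop_client("client_a")
```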

Do you see any benefits in terms of memory/CPU savings in the single-service approach?

Hello sir @siriusraja, we also deploy our service the way you describe. However, since customers keep spam-creating new bots with their accounts, we have been struggling to keep resource consumption under control.

Is there a way to load the pre-trained language model as an internal server that provides features to the other components, instead of loading it for each and every bot instance?