Even with the initially trained Rasa model, it eats up ~1.5 GB of RAM; apparently this happens because of how the model is implemented via TensorFlow.
1. Does that mean there is no way to run multiple instances of Rasa on the server without each instance hogging ~1.5 GB of RAM?
2. If I train Rasa NLU only, can I then run Rasa Core only (I couldn't find a command yet to use Core on its own) to reduce that RAM usage?
3. Is there some other way I can reduce the memory to, say, around 500 MB per instance? Thanks
If you want to run multiple Rasa models at once, you can try handling the Rasa models via Python.
For example, create a Python server in a framework of your choice and load all the models. I think memory usage would only increase while a query is being processed.
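Here is a minimal sketch of that idea. It assumes Rasa 1.x, where rasa.core.agent.Agent.load() and Agent.handle_text() are available, and it uses Sanic as an (arbitrarily chosen) web framework; the client names and model paths are placeholders for your own.

```python
# Minimal sketch: one Python process serving several Rasa models.
# Assumes Rasa 1.x (Agent.load / handle_text); adapt for newer versions.
from sanic import Sanic, response
from rasa.core.agent import Agent

# Hypothetical client-to-model mapping; point these at your packed models.
MODEL_PATHS = {
    "client_a": "models/client_a.tar.gz",
    "client_b": "models/client_b.tar.gz",
}

app = Sanic("multi_model_server")
agents = {}

@app.listener("before_server_start")
async def load_agents(app, loop):
    # Load every model once at startup; each one stays resident in memory,
    # so total RAM is roughly the sum of the individual model footprints.
    for client, path in MODEL_PATHS.items():
        agents[client] = Agent.load(path)

@app.route("/webhook/<client>", methods=["POST"])
async def handle(request, client):
    agent = agents.get(client)
    if agent is None:
        return response.json({"error": "unknown client"}, status=404)
    text = (request.json or {}).get("message", "")
    # Runs the full NLU + dialogue pipeline of that client's model.
    bot_responses = await agent.handle_text(text)
    return response.json(bot_responses)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

Note that this does not reduce the per-model footprint; it only avoids running a separate Rasa server process per model.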
Thank you for the response. I have looked at the given code and will try it out. Meanwhile, I would like to know what the standard hardware recommendation from Rasa is for serving multiple clients from a single server (if that is recommended at all).
Use case: serving multiple clients across different domains on a single server.
How do players in the industry currently handle multiple clients?
Currently Rasa does not provide a feature to handle multiple models at once, and the best way I found around this problem is to create a Python application, since the Rasa library is available only in Python, and this is very easy to implement.
Hi Anoop, is there a difference in how Rasa Open Source vs. Pro/Enterprise consume memory? Our development team has been training a conversational bot, and we have upgraded to 1 TB of memory on AWS infrastructure, which still fails. That seems very excessive, and I am not sure whether it has to do with the open source vs. "other" versions. The bot has fairly complex story blocks with checkpoints and intents, but I am unsure.
Thanks
Exactly @siriusraja, I am curious about these points as well, as none of the models I have created got this big or started consuming this much memory.
I have one more question:
How are you serving multiple models?
Different Rasa servers? That approach would create so many ports, endpoints, and Rasa services.
We have started Rasa services on different ports for each client, each using the respective model.
Of course, this approach ends up opening multiple ports, but that is manageable.
If you want, you can put an nginx-like reverse proxy in front, which listens on a single port and internally routes to the different client services.
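As an illustration, a minimal nginx sketch of that routing; the URL prefixes and upstream ports (5005, 5006) are hypothetical and should match however the per-client services are started:

```nginx
# One public port; one Rasa service per client behind it.
server {
    listen 80;

    # Client A's Rasa service (assumed on port 5005)
    location /client_a/ {
        proxy_pass http://127.0.0.1:5005/;
    }

    # Client B's Rasa service (assumed on port 5006)
    location /client_b/ {
        proxy_pass http://127.0.0.1:5006/;
    }
}
```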
Yes, we have multiple clients, but the concurrent traffic is not high; it is spread across different hours of the day. So we were able to run multiple services serving different clients.
It is a trade-off between the administrative overhead of managing multiple services and flexibility.
When I run them as different services, I can start/stop the service for each client when business hours start in their time zone. This reduces the memory overhead on the server.
Do you see any benefits in terms of memory/CPU savings in the single-service approach?
Hello sir @siriusraja, we are also deploying our service as in your case. However, since every customer has been spam-creating new bots with their account, we have been struggling to keep resource consumption under control.
Is there a way to load the pre-trained language model as an internal server that provides features to the other components, instead of loading it for each and every bot instance?