Deploying multiple Rasa NLU models

How can we deploy multiple Rasa NLU models (NLU only) on a server? Please suggest the right way of doing it.

Can we create multiple Docker containers on a server and serve multiple Rasa NLU models on different ports?

@abhishek-eltropy Can you elaborate with a use-case example?

Let’s say we want to deploy 10 different types of chatbots. We train the NLU models for them separately and just want to serve those NLU models on different ports/containers. How should we go about this?

@abhishek-eltropy And will all 10 chatbots be deployed on the same server? If you want to use the same NLU model for 10 different chatbots, what would be the significance? Or, if you have 10 different NLU models, one for each, you can create 10 different images of rasa and rasa-sdk, each with its own ports for the Rasa server, the action server, and even the API calls. I am not sure that would be the ideal setup for your use case; try experimenting with 2 or 3 chatbots and their NLU models first. Do let me know.

We need only the Rasa NLU API. Let’s say I have 10 different trained NLU models on the server. Now, I run them on different ports using:

rasa run --enable-api --model models/model-1.tar.gz --port 8080
rasa run --enable-api --model models/model-2.tar.gz --port 8081
...

Will this be the right approach, given that we don’t need Rasa Core, the action server, or Rasa X? Will there be any performance issues or other conflicts between them?
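For reference, each of those servers exposes Rasa’s HTTP API, so a client picks a bot simply by port. A minimal sketch of calling the /model/parse endpoint with the Python requests library (the bot names, ports, and text below are placeholders matching the commands above):

import requests

# Each "rasa run --enable-api" server exposes the same HTTP API,
# so the client selects a bot by port (matching the commands above).
BOT_PORTS = {"bot-1": 8080, "bot-2": 8081}

def parse(bot, text):
    url = f"http://localhost:{BOT_PORTS[bot]}/model/parse"
    response = requests.post(url, json={"text": text})
    response.raise_for_status()
    return response.json()  # the parse result: intent, entities, etc.

print(parse("bot-1", "hello there"))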

Hello @abhishek-eltropy, what you are suggesting is basically running multiple application servers on one instance (virtual or physical). While this works just fine, I would suggest something like a distributed architecture, e.g. k8s with an API gateway in front, to make it work operationally; otherwise you will end up choking your instance.

Another option is the pythonic way: I use Rasa’s Python API to load models onto a single application server, with an LRU cache mechanism since the models I load are generally quite small (<30 MB). I then scale the API server horizontally based on incoming load, with a load balancer in front to handle the routes.
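To make that concrete, here is a minimal sketch of the caching idea, assuming Rasa 1.x (where rasa.model.get_model unpacks a .tar.gz model archive and rasa.nlu.model.Interpreter loads its nlu subdirectory); the model paths and cache size are placeholders:

import os
from functools import lru_cache

from rasa.model import get_model
from rasa.nlu.model import Interpreter

@lru_cache(maxsize=10)  # keep only the most recently used interpreters in memory
def load_interpreter(model_path):
    # get_model unpacks the .tar.gz archive into a temporary directory;
    # the NLU part of the model lives in its "nlu" subdirectory.
    unpacked = get_model(model_path)
    return Interpreter.load(os.path.join(unpacked, "nlu"))

def parse(model_path, text):
    return load_interpreter(model_path).parse(text)

print(parse("models/model-1.tar.gz", "hello there"))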

I’m not really clear on Rasa’s pythonic API. What exactly do you mean by that? @souvikg10

When you install Rasa with pip install rasa in your Python project, you can access Rasa’s code by importing it into your project.

This is of course different, because now you are responsible for managing the server with your own methods, but you can import Rasa’s code and use its methods for training and prediction.
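For illustration, a minimal sketch of the training side, assuming the Rasa 1.x NLU classes (rasa.nlu.config, rasa.nlu.model.Trainer, and rasa.nlu.training_data.load_data); the file names are placeholders:

from rasa.nlu import config
from rasa.nlu.model import Trainer
from rasa.nlu.training_data import load_data

# Train one NLU model per bot from its own data and pipeline config.
training_data = load_data("data/bot-1/nlu.md")
trainer = Trainer(config.load("config.yml"))
trainer.train(training_data)

# Persist the trained model so it can later be loaded with Interpreter.load().
model_directory = trainer.persist("models", fixed_model_name="bot-1")
print(model_directory)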

@souvikg10 Rasa’s pythonic API seems interesting. Could you share an example or a GitHub link for the same? I want to use multiple small Rasa models in a single-server deployment. TIA!
