Using two models in parallel on a Rasa server


I am using Rasa NLU and I came across an interesting problem. I have two different websites that are sending sentences for analysis to the same Rasa server. These two websites are using two different Rasa models.

As I understand it, I first have to load a model onto the Rasa server and then use the POST /model/parse endpoint; there is no way to specify a model in the request. I ask because my script sends a large number of sentences, one after another, to the Rasa server for analysis, and this takes some time. If, during this analysis, someone from the second website runs the same script, a different model will be loaded and the first script will no longer be using the correct model for its remaining sentences.

Is there another way to solve this without running two different servers?
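For concreteness, here is a minimal sketch (the localhost URL and default port 5005 are assumptions) of how such a script might build a request for POST /model/parse. Note that the JSON body carries only the sentence; there is no field for selecting which loaded model should answer, which is why two concurrent scripts can interfere:

```python
import json

# Assumed: a single shared Rasa server running locally on its default port.
RASA_URL = "http://localhost:5005/model/parse"

def build_parse_request(text: str):
    """Build the URL and JSON body for one POST /model/parse call.

    The body carries only the text to analyse -- there is no field
    for choosing a model, so whichever model is currently loaded
    on the server produces the prediction.
    """
    return RASA_URL, json.dumps({"text": text})
```

A client would then POST the returned body to the returned URL (for example with the requests library), once per sentence.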


Just so I understand: why do you have two different Rasa models? Is this because you have two languages, or is there another use case?

It is another use case. The two models are trained on different kinds of data.

So you have two different assistants?

I am only using the NLU part of Rasa, not the full chatbot. I train two or more models, each with different types of intents, so each model serves a different use case.

Since these models are on the same server, as I explained above, there is a problem if users from the two websites want predictions for a large number of sentences at the same time, because only one model can be loaded on the server at a time. If the first user loads his model and sends sentences for prediction (which takes some time), and a few seconds later the other user loads his model, the second model will also be used for predicting the first user's remaining sentences.

It can also happen that one website has multiple users and multiple models, and a similar situation can arise there as well. Using multiple servers doesn't sound like an ideal solution, since there is no fixed number of models per website.

Maybe this sounds a bit confusing because it's a different use case than the typical Rasa chatbot. :slight_smile:

I see. I think we support this and the docs for this are found here.

Technically you could have a proxy on the server that routes traffic internally to the two models you're hosting. You can tell rasa run to serve a specific model:

rasa run --enable-api -m models/<model-a>.tar.gz --port 12345
rasa run --enable-api -m models/<model-b>.tar.gz --port 12346
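With one instance per model, a client can select a model simply by choosing the port. A minimal sketch of that routing, assuming the two commands above and using hypothetical site names:

```python
# Hypothetical mapping from website to the port of the Rasa instance
# that serves that website's model (ports taken from the rasa run
# commands above).
MODEL_PORTS = {"website-a": 12345, "website-b": 12346}

def model_url(site: str) -> str:
    """Return the parse endpoint of the instance serving this site's model."""
    return f"http://localhost:{MODEL_PORTS[site]}/model/parse"
```

Each website then POSTs its sentences to model_url(site) and is guaranteed to hit its own model, regardless of what the other site is doing.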

Great, thanks a lot!


Hi @koaning I have a question please.

I built my Rasa actions with two languages, French and English. I handle the conversation in the actions file, and I have just one model for both languages.

Could you please tell me an efficient way to achieve that? For example, the structure I have to follow. Thank you in advance.

Hi @pitoulambert,

could you expand on what you're trying to achieve? It's unclear to me what you mean by "handled the conversation using the action file". Do you have a single NLU model that triggers actions which in turn send text back in the correct language?

In general, multi-language assistants are very tricky. It’s not just the language that’s often different, it’s culture too. That means that you might need a different assistant because the intents differ across languages too. A relatively common pattern that I hear from the community is to first ask the user what language they want to use. Users are then sent to an assistant trained on that language. Would that work for you?

So sorry for the delay, this is what I have achieved so far:

  • I have one NLU file that contains both English and French intents.
  • The same goes for the rules and domain: all my data are in the same files, but in different languages.
  • In my actions, I also handle the English part and the French part separately.
  • I let the bot start the conversation so that the user can choose either French or English, and the rules/stories are called depending on the language.

I have already achieved the above.

Now I would like to know if there is another method to deal with this, maybe in my configuration file. I have to take this to production, but before that I want to make it robust.


You might enjoy this talk from N26. They talk about how they designed their multi-language system.

Thank you, I will go through it.

@koaning can I run multiple assistants (to handle different cases) on the same server on GCP?

Nothing is stopping you from running two Rasa services on two ports on a single machine.

# start process one
rasa run --enable-api --port 5005
# start process two
rasa run --enable-api --port 5006

You may run into resource constraints, though, so it may be better to consider Kubernetes at some point.