What is the best way to handle two models (different languages) on one API?

Hi there, I’m looking for best practices on working with multiple bots in production (one per language). Having two instances to handle this doesn’t seem scalable, so I’m open to suggestions.


Why is having two instances not scalable?

The key is a good infrastructure scheduling framework such as OpenShift or k8s. You can schedule an “instance” (pod) for each language of the bot and use nginx as the routing engine. Note that I’m talking only about Rasa OSS for inference here; these instances aren’t something I would share with the training jobs, which remain a separate problem. I use Celery to distribute my training for multiple languages in various configs, whichever suits a given language best, and deploy pods per assistant per language (see the sketch below).
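Roughly, the training fan-out can look like this. Just a minimal sketch, not my exact setup: the broker URL, config/data paths, and language codes are placeholders.

```python
# Hypothetical sketch: one Celery task trains one Rasa model per language.
# Broker URL, file layout, and language codes are assumptions.
import subprocess
from celery import Celery

app = Celery("rasa_training", broker="redis://localhost:6379/0")

@app.task
def train_language(lang: str) -> str:
    """Run `rasa train` with a per-language config and data directory."""
    output = f"models/{lang}"
    subprocess.run(
        [
            "rasa", "train",
            "--config", f"configs/config_{lang}.yml",
            "--data", f"data/{lang}",
            "--out", output,
        ],
        check=True,
    )
    return output

if __name__ == "__main__":
    # Fan the training jobs out to Celery workers, one per language.
    for lang in ["en", "fr", "de"]:
        train_language.delay(lang)
```

Each worker picks up whichever language job it gets, so training for all languages runs in parallel and stays completely separate from the inference pods.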

If you are starting out and the project is small, you can use a single EC2 machine, for example: deploy k3s (a lightweight k8s distribution, also used by Rasa X, by the way), run your Rasa pods on it, and expose them via an nginx ingress controller. This is really easy to scale as the project grows; simply schedule the pods on a larger k8s cluster such as EKS or AKS.
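To make the routing idea concrete, here is a minimal Python sketch of what the ingress effectively does: forward each message to the REST channel of the per-language Rasa pod. The service hostnames and language codes are made up; in the real deployment this mapping lives in the nginx ingress rules, not in application code.

```python
# Minimal sketch of per-language routing, assuming one Rasa service per language.
import requests

# Maps a language code to the in-cluster service of the matching Rasa pod
# (hypothetical hostnames; Rasa's default port is 5005).
RASA_SERVICES = {
    "en": "http://rasa-en:5005",
    "fr": "http://rasa-fr:5005",
}

def send_message(lang: str, sender_id: str, text: str) -> list:
    """Forward a user message to the Rasa REST channel of the right language pod."""
    base_url = RASA_SERVICES[lang]
    response = requests.post(
        f"{base_url}/webhooks/rest/webhook",
        json={"sender": sender_id, "message": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # list of bot responses

# Example: route a French message to the French assistant.
# print(send_message("fr", "user-123", "Bonjour"))
```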

This is where the trendy MLOps comes in: how well you can manage your application resources. For infrastructure, you can use Terraform to spin up nodes that can be attached to the cluster.


We’re building this for another language; one instance per language won’t be feasible tomorrow when we have 10 languages, that’s why I asked :’)


It depends on what you mean by an instance. You can run several pods on a large enough k8s node, and that does not seem to be a problem; my previous project had 3 languages running on a decent k8s cluster.

It also depends a lot on your NLU config. Are you using pretrained embeddings, which are quite large, or simply the embeddings DIET trains from scratch? In the latter case the models aren’t really big, and you can technically use a Pythonic API to load the models, cache them, and make inferences. I did that as well, but only for Rasa NLU; I don’t know how that would work for Rasa Core. I baked the pretrained embeddings into the Docker image itself.
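The load-cache-infer approach looks roughly like this. A rough sketch only, assuming the Rasa 2.x NLU `Interpreter` API; the model paths and language codes are placeholders.

```python
# Sketch of caching one Rasa NLU model per language in a single Python process.
# Assumes Rasa 2.x's Interpreter API; model directories are hypothetical.
from functools import lru_cache

from rasa.nlu.model import Interpreter

@lru_cache(maxsize=None)
def get_interpreter(lang: str) -> Interpreter:
    """Load the NLU model for a language once and keep it in memory."""
    return Interpreter.load(f"models/nlu_{lang}")

def parse(lang: str, text: str) -> dict:
    """Run NLU inference (intent + entities) with the cached per-language model."""
    return get_interpreter(lang).parse(text)

# Example: parse("de", "Hallo, ich brauche Hilfe")
```

With the smaller DIET-only models, several languages fit comfortably in one process this way; with large pretrained embeddings the memory cost per language adds up quickly.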

I personally found no real issue with scheduling Rasa pods per language on k8s for one project. You can schedule a lot of pods on one node and add autoscaling to horizontally scale the pods based on memory/CPU, etc. It works quite well.
