What is the best way to handle two models (different languages) on one API?

Hi there, I’m looking for best practices on working with multiple bots in production (one per language). Having two instances to handle this doesn’t seem scalable, so I’m open to suggestions.


Why is having two instances not scalable?

The key is a good infrastructure scheduling framework such as OpenShift or k8s. You can schedule an “instance” (pod) for each language of the bot and use nginx as the routing engine. Note that I’m talking only about Rasa OSS for inference here; these instances aren’t something I would share with the training jobs, which remain a separate problem. I use Celery to distribute my training for multiple languages in various configs, whichever suits a given language best, and deploy pods per assistant per language (see the sketch below).
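Roughly, the training fan-out can look like this. Just a minimal sketch, not my exact setup: the broker URL, config/data paths, and language codes are placeholders.

```python
# Hypothetical sketch: one Celery task trains one Rasa model per language.
# Broker URL, file layout, and language codes are assumptions.
import subprocess
from celery import Celery

app = Celery("rasa_training", broker="redis://localhost:6379/0")

@app.task
def train_language(lang: str) -> str:
    """Run `rasa train` with a per-language config and data directory."""
    output = f"models/{lang}"
    subprocess.run(
        [
            "rasa", "train",
            "--config", f"configs/config_{lang}.yml",
            "--data", f"data/{lang}",
            "--out", output,
        ],
        check=True,
    )
    return output

if __name__ == "__main__":
    # Fan the training jobs out to Celery workers, one per language.
    for lang in ["en", "fr", "de"]:
        train_language.delay(lang)
```

Each worker picks up whichever language job it gets, so training for all languages runs in parallel and stays completely separate from the inference pods.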

If you are starting out and the project is small, you can use a single EC2 machine, for example: deploy k3s (a lightweight k8s distribution, also used by Rasa X, by the way), run your Rasa pods on it, and expose them via an nginx ingress controller. This is really easy to scale as the project grows; simply schedule the pods on a larger k8s cluster such as EKS or AKS.
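To make the routing idea concrete, here is a minimal Python sketch of what the ingress effectively does: forward each message to the REST channel of the per-language Rasa pod. The service hostnames and language codes are made up; in the real deployment this mapping lives in the nginx ingress rules, not in application code.

```python
# Minimal sketch of per-language routing, assuming one Rasa service per language.
import requests

# Maps a language code to the in-cluster service of the matching Rasa pod
# (hypothetical hostnames; Rasa's default port is 5005).
RASA_SERVICES = {
    "en": "http://rasa-en:5005",
    "fr": "http://rasa-fr:5005",
}

def send_message(lang: str, sender_id: str, text: str) -> list:
    """Forward a user message to the Rasa REST channel of the right language pod."""
    base_url = RASA_SERVICES[lang]
    response = requests.post(
        f"{base_url}/webhooks/rest/webhook",
        json={"sender": sender_id, "message": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # list of bot responses

# Example: route a French message to the French assistant.
# print(send_message("fr", "user-123", "Bonjour"))
```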

This is where the trendy MLOps comes in: how well you can manage your application resources. For infrastructure, you can use Terraform to spin up nodes that can be attached to the cluster.


We’re building this for another language; one instance per language won’t be feasible tomorrow when we have 10 languages, that’s why I asked :’)


It depends on what you mean by an instance. You can run several pods on a large enough k8s node, and that does not seem to be a problem; my previous project had 3 languages running on a decent k8s cluster.

It also depends a lot on your NLU config. Are you using pretrained embeddings, which are quite large, or simply the embeddings DIET trains from scratch? In the latter case the models aren’t really big, and you can technically use a Pythonic API to load the models, cache them, and make inferences. I did that as well, but only for Rasa NLU; I don’t know how that would work for Rasa Core. I baked the pretrained embeddings into the Docker image itself.
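The load-cache-infer approach looks roughly like this. A rough sketch only, assuming the Rasa 2.x NLU `Interpreter` API; the model paths and language codes are placeholders.

```python
# Sketch of caching one Rasa NLU model per language in a single Python process.
# Assumes Rasa 2.x's Interpreter API; model directories are hypothetical.
from functools import lru_cache

from rasa.nlu.model import Interpreter

@lru_cache(maxsize=None)
def get_interpreter(lang: str) -> Interpreter:
    """Load the NLU model for a language once and keep it in memory."""
    return Interpreter.load(f"models/nlu_{lang}")

def parse(lang: str, text: str) -> dict:
    """Run NLU inference (intent + entities) with the cached per-language model."""
    return get_interpreter(lang).parse(text)

# Example: parse("de", "Hallo, ich brauche Hilfe")
```

With the smaller DIET-only models, several languages fit comfortably in one process this way; with large pretrained embeddings the memory cost per language adds up quickly.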

I personally found no real issue with scheduling Rasa pods per language on k8s for one project. You can schedule a lot of pods on one node and add autoscaling to horizontally scale the pods based on memory/CPU, etc. It works quite well.
