How to scale Rasa NLU for production?


We’ve been using Rasa in development for a while now for a chatbot service, in Docker Swarm with 1 replica.

However, after scaling Rasa up to 2 replicas, only one of the 2 replicas gets trained when we submit training data. (I watched the Rasa container logs and could see that the training logs appear in one container or the other, but never both.)
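To illustrate what I think is happening, here is a toy simulation. It is not Rasa or Docker Swarm code. It only assumes that Swarm spreads incoming requests across replicas in turn and that each replica keeps its own copy of the trained model. The `Replica` and `RoundRobinBalancer` classes, the replica names, and the `nlu.md` file name are all made up for this sketch.

```python
from itertools import cycle


class Replica:
    """Stand-in for one Rasa container (hypothetical); holds its own model."""

    def __init__(self, name):
        self.name = name
        self.model = None  # nothing trained yet

    def train(self, data):
        # Training only updates the state of this one replica.
        self.model = f"model trained on {data}"


class RoundRobinBalancer:
    """Toy load balancer: each request goes to the next replica in turn."""

    def __init__(self, replicas):
        self._next = cycle(replicas)

    def route(self):
        return next(self._next)


replicas = [Replica("rasa.1"), Replica("rasa.2")]
balancer = RoundRobinBalancer(replicas)

# A single training request is routed to exactly one replica.
balancer.route().train("nlu.md")

trained = [r.name for r in replicas if r.model is not None]
untrained = [r.name for r in replicas if r.model is None]
print("trained:", trained)
print("untrained:", untrained)
```

Under these assumptions, one training request reaches one replica and the other never sees it, which matches what I see in the logs.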

In production, we would ideally run multiple Rasa replicas for disaster recovery and high availability. Has anyone experienced this issue, and could you suggest a solution? Is Rasa even scalable?

Many thanks in advance!


Hi @uyenle57, welcome to the forum! Which Rasa service exactly did you scale? Are you running this in the full Rasa X environment?

Hi, we are using Rasa 1.1.3 with Python 3.6.8 on Linux.

We are not using Rasa X. Please refer to the link below for more details.

Thanks! How do you initiate the training request?

So, have you solved your problem yet? Is Rasa scalable? I want to run multiple replicas of Rasa to handle hundreds of conversations at the same time.