I am new to Rasa and have just done a POC. We are very excited to take it forward and build a full application around it. Before we do that, we have some fundamental questions around scalability and recovery. Answers to these will help us architect a better solution:
- How does the Rasa bot service scale across hundreds or thousands of parallel conversations?
* Can we run many Rasa instances behind a load balancer to scale?
* Should we build a gateway service that creates a sticky session between a user and one of many bot instances?
- In the event that the bot service holding a conversation with a user goes down, is there a way to recover from this scenario, i.e. to continue the conversation on another bot service?
- Is there a way to replicate/send the conversation state and slots for a conversation ID to a different bot instance so it can continue the conversation with the user?
Hi @sundeep_misra, welcome to the forum!
How does rasa bot service scale across 100s or 1000s of parallel conversations?
That will depend on how active your concurrent users are. We’ve measured that a single, non-replicated Rasa instance can handle around 20 messages per second.
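For rough capacity planning, that throughput figure can be turned into a replica count. The sketch below is illustrative only; the user count and message rate are assumptions, not measurements:

```python
import math

# Assumptions (illustrative, not measured) -- adjust for your own traffic.
msgs_per_sec_per_instance = 20   # single-instance throughput quoted above
concurrent_users = 1000          # hypothetical active user count
seconds_between_messages = 30    # hypothetical: each user sends ~1 msg / 30 s

peak_load = concurrent_users / seconds_between_messages      # ~33.3 msgs/s
replicas = math.ceil(peak_load / msgs_per_sec_per_instance)  # -> 2

print(replicas)
```

In practice you would add headroom on top of this and load-test, since NLU pipeline choice and action-server latency both affect real throughput.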
Can we run many instances Rasa behind a load balancer to scale?
Yes, Rasa is built to run as a scalable service, so you can replicate the
rasa-production containers behind your load balancer.
Should we build a gateway service that will play the role of creating a sticky session between user and one of many bot instances?
This isn’t necessary: we’ve recently introduced a ticket lock mechanism which ensures conversations are locked at the time of processing and incoming messages are handled in the right order, regardless of which of your replicas receives them. It’s called the RedisLockStore and you can check out the docs here.
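For reference, enabling the Redis lock store is just a matter of configuration in `endpoints.yml`; the host, password, and database number below are placeholders for your own Redis deployment:

```yaml
# endpoints.yml
lock_store:
  type: redis
  url: localhost   # placeholder: your Redis host
  port: 6379
  password: ""     # placeholder: your Redis password, if any
  db: 1
```

All replicas must point at the same Redis instance for the ticket lock to serialize messages per conversation.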
In the event the bot service holding conversation with the user goes down…
If an instance handling a user conversation goes down, your container orchestrator simply stops routing messages to that instance. Another instance will then receive the next message and pick up the conversation where it left off. Any message that was already being processed (as opposed to queued and waiting to be processed) when your bot service fails will be lost, though.
Is there a way to replicate/send conversation state and slot for a conversation id to continue on a different bot instance to continue conversation with user?
As mentioned in the previous answer, that won’t be necessary. The state of the conversation is persisted to the database, so you won’t have to share conversation state between instances.
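The mechanism behind this can be sketched in plain Python: every replica is stateless, and each message load/saves the tracker by conversation ID from one shared store, so any replica can serve the next message. The in-memory dict below stands in for a real tracker store (Redis/SQL in an actual deployment), and all names here are illustrative:

```python
import json

# Stand-in for a shared tracker store (Redis or SQL in a real deployment).
shared_store: dict = {}

def handle_message(instance_name: str, conversation_id: str, text: str) -> list:
    # 1. Load conversation state by ID -- any replica sees the same state.
    raw = shared_store.get(conversation_id, '{"events": []}')
    tracker = json.loads(raw)
    # 2. Process the message (a real bot runs NLU + dialogue policies here).
    tracker["events"].append({"handled_by": instance_name, "text": text})
    # 3. Persist the updated state back before replying.
    shared_store[conversation_id] = json.dumps(tracker)
    return [e["text"] for e in tracker["events"]]

handle_message("b1", "conv-42", "hello")          # instance b1 handles message 1
history = handle_message("b2", "conv-42", "hi!")  # b1 dies; b2 resumes seamlessly
print(history)  # ['hello', 'hi!']
```

Because the state lives in the store rather than in the process, killing one instance loses nothing that was already persisted.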
I hope that helps!
Thanks, this is very helpful. I have one more question:
I will be in a situation with multiple domain bots, and I don’t want to deploy a separate instance of the bot on a separate server for each. Is there a way to host multiple domain bots on one server and invoke a bot by domain from a single server URL?
Hey @ricwo, thank you for this thorough answer. I have a follow up question:
if we use Redis as the TrackerStore or the LockStore, what will happen if Redis loses data? Will the store stop functioning?
By experimenting, I found that with only a Redis tracker store, failover is handled: I started two bot instances, b1 and b2, listening to ports p1 and p2. In the middle of the conversation with b1, I killed it and tried to continue the conversation with b2, and it works! Is this expected? If yes, what is the use of the LockStore? I am quite confused by only looking at the doc: https://rasa.com/docs/rasa/api/lock-stores
@sundeep_misra No such routing is possible out of the box within one server at the moment.
Hey @ricwo, do you know about any updates on this? I’d appreciate any pointers.
@ricwo any updates on scalability of Rasa (in particular v.1 vs v2) would be very much appreciated