Recovery and Scalability for Rasa Bot Service(s)

sundeep_misra · October 18, 2019, 1:28pm

I am new to Rasa and have just finished a POC. We are very excited to take it forward and build a full application around it. Before we do that, we have some fundamental questions about scalability and recovery. The answers will help us architect a better solution:

Scalability

How does the Rasa bot service scale across 100s or 1000s of parallel conversations?
* Can we run many instances of Rasa behind a load balancer to scale?
* Should we build a gateway service that creates a sticky session between a user and one of many bot instances?

Recovery

If the bot service holding a conversation with the user goes down, is there a way to recover, i.e. is there a way to continue the conversation on another bot service?
Is there a way to replicate or send the conversation state and slots for a conversation ID to a different bot instance, so that it can continue the conversation with the user?

ricwo · October 21, 2019, 8:03pm

Hi @sundeep_misra, welcome to the forum!

How does rasa bot service scale across 100s or 1000s of parallel conversations?

That will depend on how active your concurrent users are. We’ve measured that a single, non-replicated Rasa instance can handle around 20 messages per second.
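As a rough back-of-the-envelope sketch of what that figure means for sizing: the function below turns a number of concurrent users and an assumed per-user message rate into a replica count. The 20 messages per second comes from the measurement above; the function name, the per-user rate and the 50% headroom are illustrative assumptions, not Rasa recommendations.

```python
import math


def replicas_needed(
    concurrent_users: int,
    messages_per_user_per_minute: float,
    per_instance_msgs_per_second: float = 20.0,  # figure quoted above
    headroom: float = 0.5,  # assumption: keep half of each instance's capacity free
) -> int:
    """Estimate how many replicas are needed for a given message load."""
    # Total load across all users, converted to messages per second.
    load = concurrent_users * messages_per_user_per_minute / 60.0
    # Capacity we are willing to use on each instance.
    usable = per_instance_msgs_per_second * (1.0 - headroom)
    return max(1, math.ceil(load / usable))


# 1000 users sending 2 messages per minute is about 33 messages per second.
print(replicas_needed(1000, 2))  # -> 4
```

Actual throughput depends heavily on your model, custom actions and hardware, so a load test on your own bot should replace these assumed numbers.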

Can we run many instances of Rasa behind a load balancer to scale?

Yes, Rasa is built to run as a scalable service, so you can replicate the rasa-production containers behind your load balancer.

Should we build a gateway service that will play the role of creating a sticky session between user and one of many bot instances?

This isn’t necessary: we’ve recently introduced a ticket lock mechanism, which ensures that conversations are locked while a message is being processed and that incoming messages are dealt with in the right order, regardless of which of your replicas receives them. It’s called the RedisLockStore and you can check out the docs here.
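To illustrate the idea of a ticket lock, here is a toy, in-memory sketch. It is not Rasa's implementation (the real lock store keeps its state in Redis so that separate processes can share it), and the class and function names are made up for this example. Each incoming message takes a ticket in arrival order and is only processed once its ticket number is being served.

```python
import asyncio


class TicketLock:
    """Toy in-memory ticket lock for a single conversation (hypothetical)."""

    def __init__(self) -> None:
        self.next_ticket = 0  # next ticket number to hand out
        self.now_serving = 0  # ticket currently allowed to be processed
        self._cond = asyncio.Condition()

    def issue_ticket(self) -> int:
        ticket = self.next_ticket
        self.next_ticket += 1
        return ticket

    async def wait_turn(self, ticket: int) -> None:
        # Block until every earlier ticket has been released.
        async with self._cond:
            await self._cond.wait_for(lambda: self.now_serving == ticket)

    async def release(self) -> None:
        # Let the holder of the next ticket proceed.
        async with self._cond:
            self.now_serving += 1
            self._cond.notify_all()


async def handle(lock, ticket, replica, text, delay, log):
    await asyncio.sleep(delay)  # simulate replicas picking messages up at different times
    await lock.wait_turn(ticket)
    try:
        log.append((ticket, replica, text))  # "process" the message
    finally:
        await lock.release()


async def simulate():
    lock = TicketLock()
    log = []
    # (replica, text, pickup delay): later messages are picked up sooner on purpose.
    messages = [
        ("replica-A", "hi", 0.03),
        ("replica-B", "book a table", 0.01),
        ("replica-A", "for two", 0.0),
    ]
    tasks = []
    for replica, text, delay in messages:
        ticket = lock.issue_ticket()  # tickets are issued in arrival order
        tasks.append(handle(lock, ticket, replica, text, delay, log))
    await asyncio.gather(*tasks)
    return log


processed = asyncio.run(simulate())
for entry in processed:
    print(entry)
```

Even though the last message is picked up first, the log comes out in ticket order, which is the property that makes sticky sessions unnecessary.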

In the event the bot service holding conversation with the user goes down…

If an instance handling a user conversation goes down, your container orchestrator just won’t send any more messages to that instance. Another instance will then receive the next message and pick up the conversation where it left off. However, any message that was already being processed (as opposed to queued and waiting to be processed) when your bot service fails will be lost.

Is there a way to replicate/send conversation state and slot for a conversation id to continue on a different bot instance to continue conversation with user?

As said in the previous answer, that won’t be necessary. The state of the conversation is persisted to a database, so you won’t have to share the conversation state between instances.
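The reason this works can be sketched with a toy model: as long as every instance loads the conversation from a shared store before handling a message and saves it back afterwards, any instance can take over. The classes below are hypothetical stand-ins written for this illustration (a dict plays the role of the external database); they are not Rasa's tracker store API.

```python
import copy


class SharedTrackerStore:
    """Stand-in for an external database shared by all bot instances (hypothetical)."""

    def __init__(self) -> None:
        self._trackers = {}

    def load(self, conversation_id: str) -> dict:
        # Return a copy, as a real database would after deserialisation.
        empty = {"events": [], "slots": {}}
        return copy.deepcopy(self._trackers.get(conversation_id, empty))

    def save(self, conversation_id: str, tracker: dict) -> None:
        self._trackers[conversation_id] = copy.deepcopy(tracker)


class BotInstance:
    """Stateless worker: everything it knows about a conversation lives in the store."""

    def __init__(self, name: str, store: SharedTrackerStore) -> None:
        self.name = name
        self.store = store

    def handle(self, conversation_id: str, text: str, slots=None) -> dict:
        tracker = self.store.load(conversation_id)
        tracker["events"].append({"handled_by": self.name, "text": text})
        tracker["slots"].update(slots or {})
        self.store.save(conversation_id, tracker)
        return tracker


store = SharedTrackerStore()
b1 = BotInstance("b1", store)
b1.handle("user-42", "I want a pizza", {"dish": "pizza"})
del b1  # the first instance "goes down"

b2 = BotInstance("b2", store)
tracker = b2.handle("user-42", "make it large", {"size": "large"})
print(tracker["slots"])
print([event["handled_by"] for event in tracker["events"]])
```

The second instance sees the slot set by the first one because nothing about the conversation was kept in the first instance's memory.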

I hope that helps!

sundeep_misra · October 21, 2019, 9:30pm

@ricwo,

Thanks, this is very helpful. I have one more question:

I will be in a situation with multiple domain bots, and I don’t want to deploy a separate instance of each bot on a separate server. Is there a way to host multiple domain bots on one server and invoke a bot by domain from a single server URL?

Thanks Sundeep

luofanghao · October 29, 2019, 9:18pm

Hey @ricwo, thank you for this thorough answer. I have a follow up question:

If we use Redis as the TrackerStore or the LockStore, what happens if Redis loses data? Will the store stop functioning?
By experimenting, I found that failover works with only a Redis tracker store: I started two bot instances, b1 and b2, listening on ports p1 and p2. In the middle of a conversation with b1, I killed it and tried to continue the conversation with b2, and it worked! Is this expected? If so, what is the LockStore for? I am quite confused after reading only the doc: https://rasa.com/docs/rasa/api/lock-stores

ricwo · October 31, 2019, 1:29pm

Can you specify what you mean? If Redis doesn’t work, neither the RedisTrackerStore nor the RedisLockStore will work.
Yes, that is expected: it doesn’t matter where you continue your conversation. You can send message 1 to instance A, message 2 to instance B and so on; the lock store ensures they’re processed in order. As I mentioned, the ticket lock is a mechanism which ensures conversations are locked at the time of processing and incoming messages are dealt with in the right order, regardless of which of your replicas receives them. The LockStore just holds and manages these ticket locks across multiple instances using Redis as a persistence layer.
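A minimal sketch of such an experiment, assuming two replicas reachable at the placeholder URLs below, with the REST channel enabled and both replicas pointed at the same tracker store and lock store. The helper names and the ports are assumptions for illustration; the snippet only builds and prints the requests, and the actual send is left commented out.

```python
import itertools
import json
import urllib.request

# Assumed replica addresses; replace with wherever your instances listen.
REPLICAS = ["http://localhost:5005", "http://localhost:5006"]


def build_request(base_url: str, sender_id: str, text: str) -> urllib.request.Request:
    """Build a POST to the REST channel webhook of one replica."""
    payload = json.dumps({"sender": sender_id, "message": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/webhooks/rest/webhook",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def plan_round_robin(sender_id, texts, replicas=REPLICAS):
    """Alternate replicas message by message, keeping the same sender ID."""
    targets = itertools.cycle(replicas)
    return [build_request(next(targets), sender_id, text) for text in texts]


def send(request: urllib.request.Request):
    """Send one request and decode the bot's JSON reply."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


requests_to_send = plan_round_robin("user-42", ["hello", "what can you do?", "bye"])
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
    # Uncomment once both replicas are running:
    # print(send(req))
```

Because the sender ID stays the same, all three messages belong to one conversation even though they reach different instances.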

ricwo · October 31, 2019, 1:33pm

@sundeep_misra no such routing is possible out of the box within one server at the moment

luofanghao · November 1, 2019, 2:40pm

Yes. Redis can lose data when you try to persist it. I just wanted to check whether Rasa handles this. It seems it does not.
I see. I now understand what the LockStore is. Thanks!

SwenSchaeferjohann · February 3, 2021, 10:42am

Hey @ricwo, do you know of any updates on this? I’d appreciate any pointers.

marianagrigg · March 19, 2021, 6:30pm

@ricwo, any updates on the scalability of Rasa (in particular v1 vs. v2) would be very much appreciated.
