How to have a load balancer on top of multiple Rasa Open Source assistants

I want to have a highly available Rasa assistant. Say I have two assistants on two separate machines: how can I put a load balancer in front of them to route incoming requests based on some criterion? I want to be able to maintain the context developed between the user and the assistant. It doesn't need to survive switching between assistants, but it should at least persist within the same assistant.

Could someone please look into this?

Hey @singh-l, this is a common and recommended approach. There are a few components to this, and I'd recommend reviewing our architecture diagram for Rasa Open Source as well.

Typically, I’ve seen this done by sticking a user to a particular machine/region once they start a conversation there. That gives you full replication across both stacks in case one fails (as opposed to sharing a single tracker store between the two). A downside of this approach is that your conversations end up stored in two separate places… but it’s not a huge deal, because any given conversation will be contained entirely in a single DB.
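As a rough sketch of the per-region setup: each stack gets its own tracker store in its endpoints.yml, pointing at a database in the same region. The hostnames, credentials, and DB names below are hypothetical placeholders, not anything from your setup:

```yaml
# endpoints.yml for region A; region B would be identical but point at its own local DB.
# Host, credentials, and db name are placeholders.
tracker_store:
  type: SQL
  dialect: "postgresql"
  url: "postgres.region-a.internal"  # DB living in the same region as this stack
  port: 5432
  db: "rasa"
  username: "rasa_user"
  password: "changeme"
```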

So you’ll have a tracker store per region that maintains the context between the assistant and the user. Your load balancer is then responsible for the stickiness… i.e. you route on something like the user's IP and make sure they are always sent to the same region.
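For the stickiness itself, here's a minimal sketch using NGINX's `ip_hash` directive, which hashes the client IP so the same user always lands on the same upstream (the backend addresses are made up):

```nginx
# nginx.conf (sketch); upstream addresses are hypothetical
upstream rasa_regions {
    ip_hash;                 # hash on client IP: same user, same region
    server 10.0.1.10:5005;   # Rasa stack in region A (5005 is Rasa's default port)
    server 10.0.2.10:5005;   # Rasa stack in region B
}

server {
    listen 80;
    location / {
        proxy_pass http://rasa_regions;
    }
}
```

Note that if one upstream is marked down, `ip_hash` sends its users to the surviving one, which is exactly the "conversation starts over" case described below.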

If one region goes down, an in-flight conversation would need to start over in the other region. You could get around this by having a single tracker store shared by both regions, but that makes the tracker DB a single point of failure. So it's a tradeoff, and it's up to you.
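If you did want the shared option, the sketch above changes only in that both regions' endpoints.yml point at the same DB endpoint (again, the hostname is a placeholder):

```yaml
# endpoints.yml on BOTH regions; one shared DB means one single point of failure
tracker_store:
  type: SQL
  dialect: "postgresql"
  url: "postgres.shared.internal"  # hypothetical endpoint shared by both regions
  port: 5432
  db: "rasa"
  username: "rasa_user"
  password: "changeme"
```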

Thanks @desmarchris, this is very helpful.