Can somebody explain to me what it means to run multiple Rasa servers in parallel as replicated services?
This line is from the InMemoryLockStore configuration, and I have the query below:
1. In which scenario or use case do we need to use the RedisLockStore?
A single Rasa process can only use up to 1 CPU. If you’re running a single Rasa process, then using the built-in in-memory tracker store and in-memory lock store is fine.
But when you have more traffic than a single instance can handle, you might want to run multiple instances, possibly across multiple machines, and load balance between them.
In this case, the separate processes cannot see what is in each other’s memory, so you need a central place to store the tracker information, and a central place to coordinate locks. This is where the other tracker stores, and other lock stores (like the Redis lock store), come in.
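To make this concrete, here is a minimal `endpoints.yml` sketch that points all Rasa instances at a shared PostgreSQL tracker store and a shared Redis lock store. The hostnames, database name, and credentials are placeholders you would replace with your own:

```yaml
# endpoints.yml (sketch; hostnames and credentials are placeholders)
tracker_store:
  type: SQL
  dialect: "postgresql"
  url: "postgres-host"        # shared PostgreSQL server
  db: "rasa"
  username: "rasa_user"
  password: "change-me"

lock_store:
  type: "redis"
  url: "redis-host"           # shared Redis server
  port: 6379
  db: 0
```

Every Rasa instance should use the same `endpoints.yml`, so they all read and write conversation state in the same place.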
Thanks for the explanation! Do you have any documentation, or can you refer me to a link, that describes deploying multiple Rasa instances on different machines?
There’s documentation for how to deploy: Deploying Your Rasa Assistant and Deploying a Rasa Open Source Assistant in Docker Compose, which contain Helm charts and other resources for deployment.
Deploying across multiple machines will be the same as deploying any other web service across multiple machines. The exact setup should be up to the team that is handling and maintaining the servers.
Thanks for the explanation!
I do not want to install Rasa on a public or private cloud, so I would not be using Docker or Kubernetes. I want to install it on a physical server in my company network. In that case, how do I install multiple Rasa instances and calculate the hardware requirements?
@lelemh Running production software is a large topic that I think is out of the scope of discussions around Rasa itself. Managing, monitoring, and maintaining servers is a job that takes a team with specific skills.
Running on a physical server doesn’t mean that you can’t use Docker or Kubernetes. But I do understand wanting to run things without container orchestration. In that case, this is what I would do:
- Have a single VM that acts as a public-facing load balancer
- Have N VMs to run your bot instances. Each VM should run as many bot instances as there are cores in that VM
- Have M VMs running action server instances. Most of the time what the action server is doing is not computationally complex, so a single VM with a single instance is fine, but that depends specifically on your action server implementation, so requirements may vary. As long as you’re not doing any blocking operations, having as many action server instances as you have cores on the VM would be advisable. If you need multiple action server instances, then you will probably also want an internal load balancer VM, to load balance across those instances.
- Have a VM running PostgreSQL, with possibly an additional VM running PostgreSQL in replica mode, for HA. Use this for the tracker store.
- Have a VM running Redis. Use this for the lock store. Maybe a second Redis VM for HA failover; there’s no need to synchronise data between the two instances though, since it’s just used for locking.
- Have a VM running monitoring software, like Prometheus + Grafana, set up to monitor the software and VMs, and to alert on any issues.
- Only the load balancer VM should have access to the public internet, all other VMs should only be connected to each other on a separate, private network.
- Firewall setup and rules to only allow required traffic
- Access control, both to the servers themselves, and to the various services running on them.
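The public-facing load balancer in the setup above could be something like nginx. Here is a minimal sketch, assuming two bot VMs at the placeholder addresses `10.0.0.11` and `10.0.0.12`, each running a Rasa instance on the default port 5005:

```nginx
# /etc/nginx/conf.d/rasa.conf (sketch; IPs and ports are placeholders)
upstream rasa_bots {
    least_conn;                 # send each request to the least-busy instance
    server 10.0.0.11:5005;
    server 10.0.0.12:5005;
}

server {
    listen 80;
    location / {
        proxy_pass http://rasa_bots;
        proxy_set_header Host $host;
    }
}
```

Because the lock store serialises messages per conversation, any instance can safely handle any request, so simple load balancing like this works without sticky sessions.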
There are probably considerations I’m forgetting, since I don’t work directly in this space, but I think that covers the main ones.