Rasa Core scaling

Hello Rasa Community,

We have a pretty large chatbot built on Rasa Core (Community Edition) v1.5.2, with 600 intents and over 1000 actions, served through a rasa-webchat web interface. Both the LockStore and the TrackerStore live in Redis. We also have a test script that walks through one of the shorter branches of the dialogue tree (about 10 questions).
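
For reference, the Redis setup in our endpoints.yml looks roughly like the sketch below; the hostname, ports, database numbers and password are placeholders, not our real values:

```yaml
# Sketch of the relevant endpoints.yml entries for Rasa 1.5.x.
# Hostname, ports, db numbers and password are placeholders.
tracker_store:
  type: redis
  url: redis            # placeholder Redis hostname (e.g. the k8s service name)
  port: 6379
  db: 0
  password: changeme    # placeholder

lock_store:
  type: redis
  url: redis            # placeholder Redis hostname
  port: 6379
  db: 1
  password: changeme    # placeholder
```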

Load testing with this script shows that the chatbot can't serve more than 10 simultaneous connections. When we push the number of connections past 10, some of them never receive the next question, and others see very long response delays (over 5 seconds).

Neither setting SANIC_WORKERS nor balancing HTTP sessions externally has yielded any improvement. The Sanic developers recommend using an external load balancer, but that didn't help us either. We tested the routing on Kubernetes and Heroku and can confirm that it works; on both platforms, however, we can't get past 10 concurrent connections.
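
For completeness, SANIC_WORKERS was set as an environment variable on the Rasa container, roughly as in this trimmed Deployment fragment; the Deployment name, image tag and worker count are illustrative, not our exact manifest:

```yaml
# Trimmed, illustrative Deployment fragment for the Rasa service.
# Name, image tag and worker count are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rasa-production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rasa-production
  template:
    metadata:
      labels:
        app: rasa-production
    spec:
      containers:
        - name: rasa
          image: rasa/rasa:1.5.2-full   # illustrative image tag
          env:
            - name: SANIC_WORKERS       # number of Sanic workers per container
              value: "4"                # values above 1 need a distributed (Redis) lock store
```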

If you have ideas on what we can do to test/debug this issue further, it would be much appreciated.

Cheers, Dima

Hi @twistcraft, welcome to the forum! It would be great if you could answer a few questions so we can better understand your setup.

  • How did you deploy your assistant - kubernetes, openshift, docker compose?
  • What load balancer did you go for in the end? Though if you only have a single rasa-production service (replication of 1) running, this should not matter.
  • Does the assistant call any external APIs when serving users?
  • Does the assistant call an action server?
  • How many physical nodes do you have running all containers?
  • What load testing framework are you using?
  • Is the load test run from the same machine that’s running the containers?

Thanks

Hi @ricwo and thanks for your questions.

We deploy on Kubernetes, and use Kubernetes itself for load balancing.
We aren’t calling any external APIs.
The assistant does call an action server a fair bit.
There are 2 nodes for the bot, one for actions, and one for web chat.
JMeter is used for load testing, and it runs on a separate machine from the ones hosting the containers.

Thanks

Thanks! How much RAM and CPU did you assign to the node running the bot? We’re reaching message rates of roughly 25/s/pod, but this is with significantly smaller models.

Rasa is designed to run as a replicated service, so you can replicate the rasa-production pod a number of times, which will increase the load it can handle (here it’s important that you’re using RedisLockStore). You might want to replicate your action server as well, although this probably needs fewer replicas than rasa-production.
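
As a rough sketch, assuming one Kubernetes Deployment per service (the Deployment names below are placeholders, not something Rasa prescribes), scaling then comes down to raising the replica counts:

```yaml
# Trimmed, illustrative Deployment fragments; the names and replica
# counts are placeholders, not part of any official chart.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rasa-production          # the Rasa Open Source server
spec:
  replicas: 3                    # scale this first; the Redis lock store keeps conversations consistent
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rasa-action-server       # custom action server
spec:
  replicas: 2                    # usually needs fewer replicas than rasa-production
```

After changing the counts, kubectl apply (or kubectl scale deployment) picks them up; the important part is that conversation state lives in Redis rather than in the individual pods.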

Please let us know how that goes!

@ricwo We are running N1 and N2 pods. Increasing the amount of CPU and RAM does not improve the number of sessions that can be handled.

As I’ve explained in my question, we are running replicas, and we have verified the routing. Despite that, the number of concurrent sessions handled doesn’t scale as it should. Do you have any suggestions on how this can be debugged (given that scaling and routing are working as they should)?

Are there examples of scalable deploys that you can reference?

Thanks for the clarification. How many replicas are you running? Do you use the same replication for the action server?

It would be great if you could check your rasa-production service logs to see if you get any errors. Do any of the requests sent by JMeter come back as unsuccessful?

Are there examples of scalable deploys that you can reference?

Are you only using Rasa Open Source in your setup, or Rasa X as well? Rasa X ships with a set of Helm charts for kubernetes and openshift deployment. Those charts would be a good starting point for a scalable setup. Please check out the docs here: Deploy in a Cluster Environment. The Requirements section contains a link to the Helm charts. Please make sure you use export RASA_X_VERSION=0.24.1 for the latest version.

@ricwo Sorry for the delayed response.

We run several pods for Rasa Core and several for the action server. I've reviewed the logs, and the only relevant thing I could find is the following:

combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] Exception occurred while handling uri: 'http://[IP-ADDRESS]:8443/setup/eureka_info'
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] Traceback (most recent call last):
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] File "/build/lib/python3.6/site-packages/sanic/app.py", line 920, in handle_request
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] handler, args, kwargs, uri = self.router.get(request)
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] File "/build/lib/python3.6/site-packages/sanic/router.py", line 407, in get
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] return self._get(request.path, request.method, "")
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] File "/build/lib/python3.6/site-packages/sanic/router.py", line 470, in _get
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] raise NotFound("Requested URL {} not found".format(url))
combined-bot 35.224.52.62 User Error combined-new[combined-bot-56549977bf-cscmx] sanic.exceptions.NotFound: Requested URL /setup/eureka_info not found

We reviewed the Helm templates you provided, and it would be very helpful if you could explain a bit more about what's in them. Our architecture is fairly standard and consists of four services: Rasa Core, actions, Redis, and the front end. The referenced Helm chart is a lot more sophisticated, so it would be awesome if you could map it to our setup (i.e. which containers correspond to the pieces we already have, and whether we need anything extra to get them working together) so that we can actually use the template. Ideally, we'd like to keep using our own containers without re-architecting the whole application.

Thanks in advance