Rasa Core scaling

Hello Rasa Community,

We have a fairly large Rasa Core (Community) v1.5.2 chatbot (600 intents and over 1,000 actions) with a rasa-webchat web interface. Both the LockStore and the TrackerStore are backed by Redis. We also have a load test that walks through one of the shorter branches of the dialogue tree (about 10 questions).
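For context, the Redis-backed setup described here is configured in endpoints.yml; a minimal sketch (hostnames, ports, and db indices below are placeholders, not taken from this thread) looks roughly like:

```yaml
# Sketch of an endpoints.yml matching the setup described above (Rasa 1.x).
# Hostname, ports and db indices are placeholders -- adjust to your deployment.
tracker_store:
  type: redis
  url: redis-host        # placeholder
  port: 6379
  db: 0

lock_store:
  type: redis
  url: redis-host        # placeholder
  port: 6379
  db: 1
```

A shared Redis LockStore is what lets multiple Rasa replicas process messages for the same conversation without racing each other.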

Load testing has shown that the chatbot can’t serve more than 10 simultaneous connections. When we push the number of connections past 10, some connections never receive the next question, and others see very long response delays (over 5 seconds).
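For reference, the kind of concurrency measurement described above can be reproduced outside JMeter with a small script. This is a hypothetical sketch: it assumes the standard Rasa REST webhook endpoint, and the URL and payload are placeholders.

```python
# Hypothetical load probe: fire N concurrent requests at the Rasa REST
# webhook and collect per-request latencies. URL and payload are
# assumptions -- adjust to your own deployment.
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def probe(url: str, payload: dict, n: int = 20, timeout: float = 30.0):
    """Send n concurrent POSTs to `url`; return a list of latencies in seconds."""
    body = json.dumps(payload).encode("utf-8")

    def one_request(_: int) -> float:
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        start = time.monotonic()
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
        return time.monotonic() - start

    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(one_request, range(n)))


if __name__ == "__main__":
    latencies = probe(
        "http://localhost:5005/webhooks/rest/webhook",  # assumed endpoint
        {"sender": "load-test", "message": "hello"},
        n=20,
    )
    print(f"max latency: {max(latencies):.2f}s")
```

Ramping `n` up past 10 and watching where the maximum latency jumps should reproduce the cliff described here without JMeter in the loop.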

Neither setting SANIC_WORKERS nor balancing HTTP sessions externally has yielded any improvement. The Sanic developers recommend using an external load balancer, but that didn’t help us either. We tested the routing on Kubernetes and Heroku and can confirm that it works; however, on both platforms we can’t get past 10 concurrent connections.

If you have ideas on what we can do to test/debug this issue further, it would be much appreciated.

Cheers, Dima

Hi @twistcraft, welcome to the forum! It would be great if you could answer a few questions so we can better understand your setup.

  • How did you deploy your assistant: Kubernetes, OpenShift, Docker Compose?
  • What load balancer did you go for in the end? Though if you only have a single rasa-production service (a replication factor of 1) running, this should not matter.
  • Does the assistant call any external APIs when serving users?
  • Does the assistant call an action server?
  • How many physical nodes do you have running all containers?
  • What load testing framework are you using?
  • Is the load test run from the same machine that’s running the containers?


Hi @ricwo and thanks for your questions.

We are deploying on Kubernetes and using its built-in load balancing.
We aren’t calling any external APIs.
The assistant does call an action server a fair bit.
There are 2 nodes for the bot, one for actions, and one for web chat.
JMeter is used for load testing, and it does not run on the machines hosting the containers.


Thanks! How much RAM and CPU did you assign to the node running the bot? We’re reaching message rates of roughly 25 messages/s per pod, but this is with significantly smaller models.

Rasa is designed to run as a replicated service, so you can replicate the rasa-production pod a number of times, which will increase the load it can handle (here it’s important that you’re using RedisLockStore). You might want to replicate your action server as well, although this probably needs fewer replicas than rasa-production.
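Replicating the rasa-production service would look something like the following Kubernetes Deployment fragment. This is a hypothetical sketch: the names, image tag, and replica count are placeholders, not values from this thread.

```yaml
# Hypothetical Deployment fragment for a replicated rasa-production service.
# Names, image tag, and replica count are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rasa-production
spec:
  replicas: 4                      # scale horizontally; requires a shared RedisLockStore
  selector:
    matchLabels:
      app: rasa-production
  template:
    metadata:
      labels:
        app: rasa-production
    spec:
      containers:
        - name: rasa
          image: rasa/rasa:1.5.2   # match your Rasa version
          env:
            - name: SANIC_WORKERS
              value: "1"           # one worker per pod; scale via replicas instead
```

With a Service in front of the Deployment, Kubernetes spreads incoming webhook requests across the replicas, while the shared Redis LockStore keeps each conversation consistent.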

Please let us know how that goes!

@ricwo We are running N1 and N2 pods. Increasing the amount of CPU and RAM does not improve the number of sessions that can be handled.

As I explained in my original question, we are already running replicas and have verified the routing. Despite that, the number of concurrent sessions handled doesn’t scale as it should. Do you have any suggestions for how to debug this (given that scaling and routing work as they should)?

Are there examples of scalable deploys that you can reference?

Thanks for the clarification. How many replicas are you running? Do you use the same replication for the action server?

It would be great if you could check your rasa-production service logs to see if you get any errors. Do any of the requests sent by JMeter come back as unsuccessful?

Are there examples of scalable deploys that you can reference?

Are you only using Rasa Open Source in your setup, or Rasa X as well? Rasa X ships with a set of Helm charts for kubernetes and openshift deployment. Those charts would be a good starting point for a scalable setup. Please check out the docs here: Deploy in a Cluster Environment. The Requirements section contains a link to the Helm charts. Please make sure you use export RASA_X_VERSION=0.24.1 for the latest version.

@ricwo Sorry for the delayed response.

Several pods for Rasa core and for actions. I’ve reviewed the logs and the only relevant thing I could find is the following:

combined-bot [combined-bot-56549977bf-cscmx] Exception occurred while handling uri: 'http://[IP-ADDRESS]:8443/setup/eureka_info'
Traceback (most recent call last):
  File "/build/lib/python3.6/site-packages/sanic/app.py", line 920, in handle_request
    handler, args, kwargs, uri = self.router.get(request)
  File "/build/lib/python3.6/site-packages/sanic/router.py", line 407, in get
    return self._get(request.path, request.method, "")
  File "/build/lib/python3.6/site-packages/sanic/router.py", line 470, in _get
    raise NotFound("Requested URL {} not found".format(url))
sanic.exceptions.NotFound: Requested URL /setup/eureka_info not found

We reviewed the Helm templates provided, and it would be very helpful if you could explain a bit more about what’s in them. Our application architecture is fairly standard and consists of the following services: Rasa core, actions, Redis, and a front end. The referenced Helm template is a lot more sophisticated, so it would be great if you could map it onto our setup (i.e. which containers correspond to what we already have, and whether we need anything extra to make them work together) so that we can use the template. Ideally, we want to reuse our existing containers without re-architecting the whole application.

Thanks in advance

The Helm chart you found (now open-sourced at GitHub: RasaHQ/rasa-x-helm, a Rasa X Helm chart for deploying on Kubernetes and OpenShift) sets up Rasa X - see Improve your contextual assistant with Rasa X. Rasa and a number of other services run in the same cluster. Here’s roughly what they’re for:

Rasa uses a message broker (e.g. RabbitMQ) to forward all conversation events (user messages, bot messages, action executed events, …) to Rasa X. The event-service takes these events and stores them in the Rasa X database.
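Wiring Rasa to such a broker is done in endpoints.yml. The fragment below is a sketch with placeholder hostname and credentials, not values from this deployment:

```yaml
# Sketch: endpoints.yml fragment connecting Rasa (1.x) to RabbitMQ as an
# event broker. Hostname, credentials, and queue name are placeholders.
event_broker:
  type: pika
  url: rabbitmq-host     # placeholder
  username: user         # placeholder
  password: pass         # placeholder
  queue: rasa_production_events
```

Every conversation event is then published to this queue, where the event-service picks it up for Rasa X.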

Feel free to share your k8s config and I’ll be happy to have a look.

It looks like you’re trying to connect to a Google device and that connection fails - does this happen in your action server? Could you try to eliminate that request to see if that’s slowing the system down?