Low performance for concurrent users


In the post "Is rasa server running in multi-threaded way? - #4 by akelad" and several other posts, it was mentioned that Rasa should be able to handle up to 20 requests per second (~250 concurrent users). We deployed a Rasa server and, after testing with Locust, observed the performance below:

| Peak concurrent users | Spawn rate | Avg. response time |
|---|---|---|
| 5 | 5 | 0.5 s |
| 10 | 10 | 1 s |
| 20 | 20 | 2 s |

This is lower than we expected. We are using Rasa Open Source with the DistilBERT language model, deployed with Docker and Kubernetes. In another post, I learned that setting the environment variable SANIC_WORKERS can help with handling requests in parallel, so I tested it on my local machine with the cores I had available. Response times were no better with the variable set than without it. The documentation mentions that SANIC_WORKERS only works in combination with a RedisLockStore, but since we are only using the NLU part of Rasa, we omitted that.

Note: To set the environment variable, I ran the Docker container with the following command (for testing, I deployed it on my local machine, which has 4 threads and 12 GB RAM; the main server we deploy on is considerably more powerful):

docker run -d -p 5005:5005 -e SANIC_WORKERS=2 rasa
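To compare the runs with and without SANIC_WORKERS outside of Locust, something like the following stdlib-only sketch could be used. It assumes the container was started with the HTTP API enabled, so that `POST /model/parse` is served on `localhost:5005`; the URL, the sample text, and the helper names `parse_once` and `measure` are illustrative assumptions, not part of our actual test setup.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Assumption: the Rasa HTTP API is reachable at this address.
PARSE_URL = "http://localhost:5005/model/parse"


def parse_once(text="hello"):
    """Send one NLU parse request and return its latency in seconds."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        PARSE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - start


def measure(users, requests_per_user, send=parse_once):
    """Run `users` concurrent request loops; return (avg latency s, req/s)."""

    def user_loop(_):
        # Each simulated user sends its requests back to back.
        return [send() for _ in range(requests_per_user)]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(user_loop, range(users)))
    elapsed = time.perf_counter() - start

    latencies = [lat for user in per_user for lat in user]
    return mean(latencies), len(latencies) / elapsed
```

Calling `measure(5, 10)`, `measure(10, 10)` and `measure(20, 10)` against each container would give an average latency and an overall requests-per-second figure per run, which makes the two configurations easier to compare side by side.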

Performance on my local machine:

| Peak concurrent users | Spawn rate | Avg. response time |
|---|---|---|
| 5 | 5 | 0.7 s |
| 10 | 10 | 1.2 s |
| 20 | 20 | 2.3 s |
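As a rough cross-check, the figures above can be turned into an approximate throughput. This is only a sketch: it assumes each Locust user sends its next request as soon as the previous one returns (no wait time between requests), which may not match the actual Locust configuration. Under that assumption, throughput is roughly the number of users divided by the average response time.

```python
# (peak concurrent users, average response time in seconds), copied from the tables
server = [(5, 0.5), (10, 1.0), (20, 2.0)]
local = [(5, 0.7), (10, 1.2), (20, 2.3)]


def throughput(users, avg_response_s):
    """Approximate requests per second, assuming users send back to back."""
    return users / avg_response_s


for name, rows in (("server", server), ("local", local)):
    for users, avg in rows:
        print(f"{name}: {users} users -> {throughput(users, avg):.1f} req/s")
```

If the assumption holds, the server works out to about 10 requests per second at every load level, and the local machine to roughly 7 to 9. Response time growing in proportion to the number of users would suggest that throughput is already saturated at 5 users, and that requests are effectively being queued rather than processed in parallel.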

Any advice on this would be greatly appreciated. Thank you for your time. Regards,

Mohammed Ali