Is the Rasa server running in a multi-threaded way?

Hi there, I have a problem when I try to test the concurrency of my local Rasa server with an A/B tool. I set the number of requests to 400, and some of the requests return a connection error, as in the picture below, but my CPU usage is below 50%. So I wonder whether Rasa is running into some thread-occupation problem. Please help me figure out what’s happening here, thanks!


Hi @YoungXu06, are you testing 400 requests per second? One Rasa server can typically handle up to 20 requests per second; if you need more than that, you need to run more Rasa servers. I can’t imagine you need to handle 400 requests per second, though.


Thank you akelad. Yes, I send 400 requests simultaneously, because we want to deploy a Rasa chatbot in our company to serve thousands of people. In some circumstances, the request volume could be huge. Right now we are using Docker microservices to put it online, and we are testing where the bottleneck of our bot is.

@YoungXu06, I can pretty much guarantee you’re never going to reach 400 requests per second with that type of user base. If you think about it - 20 requests per second equates to roughly 250 concurrent users (making some assumptions around how frequently they’re sending messages). We have customers deployed to millions of people, and they’re nowhere near needing to handle 400 requests per second.

I’d suggest deploying the bot to a part of your user base first, to judge how much volume you’ll be getting, and then decide how many Rasa pods you need from there. Autoscaling in Kubernetes might also be something to consider.
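
As a rough illustration of the arithmetic above, here is a minimal sketch. It assumes the figures quoted in this thread: about 20 requests per second per server, and about 250 concurrent users producing that load, which works out to one message per user every 12.5 seconds (250 / 20). The function names and default values are only illustrative and are not part of Rasa; measure your own traffic before relying on them.

```python
import math


def peak_requests_per_second(concurrent_users: int,
                             seconds_between_messages: float = 12.5) -> float:
    """Requests per second produced by users who each send one message
    every `seconds_between_messages` seconds (assumed: 250 users / 20 rps)."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    if seconds_between_messages <= 0:
        raise ValueError("seconds_between_messages must be positive")
    return concurrent_users / seconds_between_messages


def required_servers(concurrent_users: int,
                     seconds_between_messages: float = 12.5,
                     requests_per_server: float = 20.0) -> int:
    """Estimate the number of servers needed, assuming each one handles
    `requests_per_server` requests per second (the ~20 rps quoted above)."""
    if requests_per_server <= 0:
        raise ValueError("requests_per_server must be positive")
    load = peak_requests_per_second(concurrent_users, seconds_between_messages)
    # Always keep at least one server running, even with no traffic.
    return max(1, math.ceil(load / requests_per_server))
```

Under these assumptions, 400 requests per second would correspond to about 5,000 users chatting at the same moment, so `required_servers(5000)` gives 20 servers, while `required_servers(250)` gives 1.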


Awesome! That’s really a relief for us, because we were quite worried about concurrency issues. We will try your suggestions. Many thanks for this great project and for your kind help :100:

No problem!

Hey Akelad, we use Docker + k8s to deploy multiple Rasa servers. When there are only a few Rasa nodes, QPS grows roughly linearly, but once the number of nodes exceeds a certain point, QPS growth becomes very small, and adding more nodes basically does not change QPS. We checked that the CPU core and memory usage of the Docker containers did not reach the upper limit. Apart from possible I/O issues (we store the sessions in ES), what are the possible reasons for this situation?

Hi Akelad, I have also deployed a Rasa server to be used within my organization. How can I check and validate that, architecturally, one Rasa server is able to handle up to 20 requests per second? Is there any architecture document for the Rasa server that I can refer to for this information? I appreciate any help. Thanks, Rahul
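
One way to check a figure like this empirically is to measure it against your own deployment. The following is only a minimal sketch, not an official benchmark: it assumes a local Rasa server with the REST channel enabled on the default port, and the helper names `send_to_rasa` and `measure_throughput` are made up for illustration. Results will depend heavily on your model, tracker store, and hardware.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: local Rasa server, REST channel enabled, default port 5005.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def send_to_rasa(i: int) -> bool:
    """Send one message as a distinct sender; return True on HTTP 200."""
    payload = json.dumps({"sender": f"load-test-{i}", "message": "hello"})
    request = urllib.request.Request(
        RASA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections all land here.
        return False


def measure_throughput(send, total_requests: int, concurrency: int) -> dict:
    """Call `send(i)` for each request from a thread pool and report how many
    succeeded, how many failed, and the successful requests per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(send, range(total_requests)))
    elapsed = time.perf_counter() - start
    succeeded = sum(1 for ok in results if ok)
    return {
        "succeeded": succeeded,
        "failed": total_requests - succeeded,
        "requests_per_second": succeeded / elapsed if elapsed > 0 else 0.0,
    }
```

For example, `measure_throughput(send_to_rasa, total_requests=200, concurrency=20)` sends 200 messages with at most 20 in flight at once. Raising `concurrency` step by step and watching where failures start to appear gives a rough picture of what a single server can sustain.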


Hi, can you share your endpoints.yml? Thanks a lot!