Is the Rasa server running in a multi-threaded way?

YoungXu06 · November 13, 2020, 12:39pm

Hi there, I have a problem when I test the concurrency of my local Rasa server with the A/B tool. I set the number of requests to 400, and some of them return a connection error (as in the picture below), yet my CPU usage stays below 50%. So I wonder whether Rasa is running into some kind of thread-limit problem. Please help me figure out what's happening here. Thanks!

akelad · November 17, 2020, 3:56pm

Hi @YoungXu06, are you testing 400 requests per second? One Rasa server can typically handle up to 20 requests per second. If you need more than that, you need to increase the number of Rasa servers running. I can't imagine you need to be able to handle 400 requests per second, though.
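The sizing rule in this reply boils down to a ceiling division. This is a minimal sketch that assumes throughput scales linearly with the number of servers and uses the ~20 requests/second per server quoted above as a default; both are rules of thumb, not guarantees, so measure your own bot before relying on them. `servers_needed` is a hypothetical helper, not part of Rasa.

```python
import math


def servers_needed(target_rps: float, per_server_rps: float = 20.0) -> int:
    """Rough number of Rasa servers needed for a target request rate.

    Assumes each server sustains `per_server_rps` requests/second (the ~20
    rule of thumb from the thread) and that capacity adds up linearly.
    """
    if target_rps < 0 or per_server_rps <= 0:
        raise ValueError("target_rps must be >= 0 and per_server_rps must be > 0")
    # Round up: a fractional server still means one more server.
    return math.ceil(target_rps / per_server_rps)


print(servers_needed(400))  # 20
print(servers_needed(50))   # 3
```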

YoungXu06 · November 19, 2020, 1:52am

Thank you, akelad. Yes, I sent 400 requests simultaneously, because we want to deploy a Rasa chatbot in our company to serve thousands of people, and in some circumstances the request volume could be huge. We are now using Docker microservices to put it online and are testing to find where our bot's bottleneck is.

akelad · November 20, 2020, 11:14am

@YoungXu06, I can pretty much guarantee you're never going to reach 400 requests per second with that kind of user base. If you think about it, 20 requests per second equates to roughly 250 concurrent users (making some assumptions about how frequently they send messages). We have customers deployed to millions of people, and they're nowhere near needing to handle 400 requests per second.
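The users-versus-requests estimate works backwards from a message rate: if each active user sends one message every T seconds on average, a server handling R requests per second keeps about R × T users busy. The reply does not state T; 12.5 seconds is simply the value that turns 20 requests/second into ~250 users (250 / 20 = 12.5), so treat it as an assumption. Both helpers below are hypothetical illustrations.

```python
def concurrent_users(rps: float, seconds_between_messages: float) -> float:
    """Users a given request rate can serve, if each active user sends one
    message every `seconds_between_messages` seconds on average."""
    return rps * seconds_between_messages


def required_rps(users: float, seconds_between_messages: float) -> float:
    """Inverse estimate: request rate generated by `users` active users."""
    return users / seconds_between_messages


# Assumed interval of 12.5 s between messages per user.
print(concurrent_users(20, 12.5))  # 250.0
print(required_rps(1000, 12.5))    # 80.0
```

Note that "concurrent" here means users actively chatting at the same moment, which is usually a small fraction of the total user base.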

I'd suggest deploying the bot to a subset of your user base first to judge how much volume you'll be getting, and then deciding how many Rasa pods you need from there. Autoscaling in Kubernetes might also be worth considering.
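For the autoscaling suggestion, Kubernetes' Horizontal Pod Autoscaler computes its target as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue) and leaves the count unchanged when that ratio is within a tolerance (0.1 by default). The sketch below reproduces only that core formula, clamped to min/max replicas, and ignores the HPA's stabilization windows and pod-readiness handling. Using requests per second per pod as the metric is an assumption (it would require a custom metrics adapter; CPU utilization is the built-in option), and `desired_replicas` is a hypothetical helper, not a Kubernetes API.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA scaling formula, clamped to [min_replicas, max_replicas].

    current_metric / target_metric: per-pod average metric vs. its target
    (here assumed to be requests/second per Rasa pod).
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the HPA does not rescale.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 30 req/s each, target 15 req/s per pod -> scale to 8 pods.
print(desired_replicas(4, 30, 15))  # 8
```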

YoungXu06 · November 21, 2020, 1:30pm

Awesome! That's a real relief for us, because we were very worried about concurrency issues. We will try your suggestions. Many thanks for this great project and your kind help!

akelad · November 23, 2020, 12:55pm

no problem!

YoungXu06 · November 25, 2020, 3:26am

Hey Akelad, we use Docker + k8s to deploy multiple Rasa servers. With only a few Rasa nodes, QPS grows roughly linearly, but once the number of nodes exceeds a certain point, QPS growth becomes very small, and adding more nodes barely changes QPS at all. We checked that the CPU and memory usage of the Docker containers did not reach their limits. Apart from possible I/O issues (we store the sessions in ES), what could cause this?
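The pattern described (linear growth, then a plateau while per-container CPU and memory stay below their limits) is what a simple shared-bottleneck model predicts: each node adds capacity until some component that all nodes share saturates. The numbers below are made up for illustration, and which component is the shared one (the session store in ES, the action server, the ingress/load balancer, or even the load-testing client) is an assumption that has to be checked by measuring each of them. `cluster_throughput` is a hypothetical helper.

```python
def cluster_throughput(nodes: int, per_node_qps: float, shared_capacity_qps: float) -> float:
    """Throughput of `nodes` identical servers that all depend on one shared backend.

    Each node contributes `per_node_qps` until the shared backend saturates at
    `shared_capacity_qps`; beyond that, adding nodes adds nothing.
    """
    return min(nodes * per_node_qps, shared_capacity_qps)


# Hypothetical numbers: 20 QPS per node, shared backend caps out at 100 QPS.
for n in (1, 2, 4, 8, 16):
    print(n, cluster_throughput(n, per_node_qps=20, shared_capacity_qps=100))
# 1 20
# 2 40
# 4 80
# 8 100
# 16 100
```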

rssrivast · March 17, 2021, 5:16am

Hi Akelad, I have also deployed a Rasa server for use within my organization. How can I check and validate that, architecturally, one Rasa server can handle up to 20 requests per second? Is there an architecture document for the Rasa server that I can refer to for this information? I'd appreciate any help. Thanks, Rahul
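One way to check the ~20 requests/second figure for a specific deployment is to measure it directly. The sketch below fires a fixed number of messages through Rasa's REST channel (`POST /webhooks/rest/webhook` with a `sender` and `message`) using a thread pool, then reports successes, errors, and successful requests per second. It assumes the `rest` channel is enabled and the server listens on the default port 5005; the helper names, message text, and sender IDs are hypothetical. The result will likely depend on your NLU pipeline, custom actions, and tracker store, so run it against a setup that resembles production.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

REST_URL = "http://localhost:5005/webhooks/rest/webhook"  # assumed host/port


def rasa_rest_send(i: int, url: str = REST_URL) -> None:
    """Send one message via the REST channel; a unique sender per request
    means every request starts its own conversation."""
    body = json.dumps({"sender": f"loadtest-{i}", "message": "hello"}).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()  # drain the bot's reply


def run_load_test(send: Callable[[int], None], total: int, concurrency: int) -> dict:
    """Issue `total` calls to `send` from `concurrency` worker threads."""
    def one(i: int) -> bool:
        try:
            send(i)
            return True
        except Exception:  # connection errors, timeouts, HTTP errors
            return False

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one, range(total)))
    elapsed = time.perf_counter() - start
    ok = sum(results)
    return {"ok": ok, "errors": total - ok, "seconds": elapsed,
            "rps": ok / elapsed if elapsed > 0 else float("inf")}


# Usage against a running server:
# print(run_load_test(rasa_rest_send, total=400, concurrency=20))
```

Increasing `concurrency` step by step and watching where `rps` stops rising (or `errors` starts climbing) gives a rough per-server ceiling for your own bot.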

Lizhengo · March 23, 2021, 9:42am

Hi, could you share your endpoints.yml? Thanks a lot!