Hi, I have a few questions about Rasa: 1. How many concurrent users can Rasa handle at a time? 2. Is there any limit on that concurrency? If yes, please mention the number. 3. Are there any other limitations in Rasa?
This is a hard question to answer because “it depends”.
Here are some things off the top of my head that matter:
- Your compute backend. If you’re running on Kubernetes then you can use containers to scale and this will have a large impact on where your bottlenecks are.
- Your NLU pipeline. If you’re running very heavy BERT-style models then the latency of each response can increase, and this work is CPU bound.
- Your retrieval layer. If your assistant needs to send complex queries to your databases then this can also slow down the assistant. Rasa uses an asynchronous web server internally (Sanic) that can help mitigate this, but again “it depends”.
That said, Rasa is used by many large companies that are running at scale. You can see some examples on our showcase page.
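
To give a feel for why an asynchronous server helps with slow database calls, here is a rough sketch in plain Python `asyncio`. It is not Rasa or Sanic code: `handle_message`, `serve`, and `run_demo` are hypothetical names, and the "database query" is just a sleep. It only illustrates the general idea that many users waiting on I/O can overlap instead of queueing behind each other.

```python
import asyncio
import time
from typing import List, Tuple


async def handle_message(user_id: int, db_delay: float) -> str:
    # Stand-in for a slow database query (assumption: the wait is pure I/O).
    # While this coroutine awaits, the event loop can serve other users.
    await asyncio.sleep(db_delay)
    return f"reply to user {user_id}"


async def serve(n_users: int, db_delay: float) -> Tuple[List[str], float]:
    # Handle all simulated users concurrently on a single event loop.
    start = time.perf_counter()
    replies = await asyncio.gather(
        *(handle_message(user_id, db_delay) for user_id in range(n_users))
    )
    elapsed = time.perf_counter() - start
    return list(replies), elapsed


def run_demo(n_users: int = 50, db_delay: float = 0.1) -> Tuple[List[str], float]:
    # Returns the replies and the total wall-clock time in seconds.
    return asyncio.run(serve(n_users, db_delay))


if __name__ == "__main__":
    replies, elapsed = run_demo()
    print(f"{len(replies)} replies in {elapsed:.2f}s")
```

In this toy setup the total time should stay close to one query's delay rather than growing with the number of users. That only holds for I/O waits, though. A heavy NLU model is CPU bound, so it would not overlap like this, which is why scaling out with more containers matters there.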