Improve response time of Rasa

I’m using the Docker Compose installation of Rasa X 0.38.1 and Rasa 2.4.3. I tested my bot’s response time with JMeter by sending multiple “help” requests in parallel with a 1-second ramp-up period. I ran this test against the bot on three different machines. You can see the results in the following three tables:
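For anyone who wants to reproduce a burst like this without JMeter, here is a rough sketch in Python. It posts to `/webhooks/rest/webhook`, Rasa’s standard REST channel endpoint; the host, port, user count, and the `run_load_test` helper name are my own assumptions for illustration:

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed host/port of the Rasa server; adjust to your deployment.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"

def send_help(sender_id: str) -> float:
    """Send one 'help' message and return the round-trip time in seconds."""
    payload = json.dumps({"sender": sender_id, "message": "help"}).encode()
    req = urllib.request.Request(
        RASA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, enough for a rough latency report."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

def run_load_test(users: int) -> None:
    """Fire `users` parallel 'help' messages and print latency percentiles."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        times = list(pool.map(send_help, (f"user-{i}" for i in range(users))))
    print(f"median={percentile(times, 50):.3f}s  p95={percentile(times, 95):.3f}s")
```

Against a live bot you would call, e.g., `run_load_test(50)` and compare the median and p95 latencies as the user count grows.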

It seems Rasa does not handle concurrent users very well: as the number of simultaneous users grows, performance degrades. I suspected the NLU component might be the bottleneck, so I ran the tests again, this time sending the “help” message directly to the rasa-worker container. You can see the results in the following three tables:
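As a side note, one way to isolate NLU from the rest of the stack is Rasa’s `/model/parse` HTTP endpoint, which runs only the NLU pipeline (no dialogue policies, no action server). A minimal sketch, assuming the worker is reachable on localhost:5005:

```python
import json
import urllib.request

# Assumed host/port of the rasa-worker container's HTTP API.
PARSE_URL = "http://localhost:5005/model/parse"

def build_parse_request(text: str) -> urllib.request.Request:
    """Build a POST to /model/parse; the body is just {"text": ...}."""
    body = json.dumps({"text": text}).encode()
    return urllib.request.Request(
        PARSE_URL, data=body, headers={"Content-Type": "application/json"}
    )

def parse_message(text: str) -> dict:
    """Send the message and return the raw NLU result (intent, entities)."""
    with urllib.request.urlopen(build_parse_request(text)) as resp:
        return json.load(resp)

# Against a live worker, parse_message("help")["intent"] would show the
# predicted intent and its confidence.
```

Timing `parse_message` alone versus the full webhook round trip would show how much of the latency is NLU and how much is everything else (channel, tracker store, policies, custom actions).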

Given these results, it seems the NLU component is not entirely to blame. What do you think the reason is? Is there a way to improve the response time?

It also seems that there is no caching of NLU results. Am I right? In all these experiments the active model was fixed and the same “help” message was sent every time. If there were a cache, NLU would only have to run on the first “help” message, and the predicted intent could be reused for the rest. Maybe it already works that way and I’m wrong.
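To make the idea concrete: the cache I have in mind would behave roughly like the sketch below, where `fake_nlu` is a stand-in for the real pipeline (this is not a Rasa API, just an illustration of the memoization I mean):

```python
from typing import Callable

def cached_nlu(run_nlu: Callable[[str], dict]) -> Callable[[str], dict]:
    """Wrap an NLU function so identical messages are parsed only once."""
    cache: dict[str, dict] = {}

    def parse(text: str) -> dict:
        key = text.strip().lower()  # normalize so "Help" and "help" share one entry
        if key not in cache:
            cache[key] = run_nlu(text)
        return cache[key]

    parse.cache = cache  # exposed so we can inspect the cache contents
    return parse

calls = 0

def fake_nlu(text: str) -> dict:
    """Pretend pipeline: counts invocations instead of running a model."""
    global calls
    calls += 1
    return {"intent": {"name": "help", "confidence": 1.0}}

parse = cached_nlu(fake_nlu)
for _ in range(100):
    parse("help")  # only the first call reaches fake_nlu
print(calls)  # → 1
```

In plain Python the same effect comes from `functools.lru_cache`; whether such a cache would be safe in Rasa depends on the pipeline, since components that use conversation context could return different results for the same text.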