Rasa HTTP Server Performance Problem

Hi,

We have Rasa v1.1.5 + duckling running on AWS with the HTTP API enabled. Under no or light load, it takes around 200 milliseconds to return a response, but under load (say 50 parallel requests) the response time increases dramatically, to 6 seconds on average.
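For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a load test in Python. It is not the exact script behind the numbers above. It assumes the server listens on the default port 5005 and exposes the Rasa 1.x `POST /model/parse` endpoint; the URL, the helper names (`parse_message`, `measure_latencies`, `summarize`) and the test message are illustrative and should be adjusted to your setup.

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Rasa 1.x HTTP API on the default port; change to match your deployment.
RASA_PARSE_URL = "http://localhost:5005/model/parse"


def parse_message(text, url=RASA_PARSE_URL, timeout=30):
    """Send one message to the parse endpoint and return the decoded JSON reply."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def timed(send, text):
    """Return the wall-clock seconds that a single call to send(text) takes."""
    start = time.perf_counter()
    send(text)
    return time.perf_counter() - start


def measure_latencies(send, texts, concurrency):
    """Call send() once per text, with up to `concurrency` calls in flight at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda t: timed(send, t), texts))


def summarize(latencies):
    """Collapse a list of latencies (in seconds) into a few summary numbers."""
    ordered = sorted(latencies)
    return {
        "count": len(ordered),
        "mean_s": statistics.mean(ordered),
        "median_s": statistics.median(ordered),
        "max_s": ordered[-1],
    }


# Example usage (needs a running server, so it is left commented out):
# messages = ["hello"] * 50
# print("sequential: ", summarize(measure_latencies(parse_message, messages, 1)))
# print("50 parallel:", summarize(measure_latencies(parse_message, messages, 50)))
```

As a rough yardstick: if 50 requests arriving together were served strictly one after another at 200 ms each, the mean latency would be 0.2 × (50 + 1) / 2 ≈ 5.1 s, which is in the same range as the 6 seconds reported above. Comparing the sequential and parallel summaries can help show whether requests are effectively being processed one at a time.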

I’ve seen somewhere that the new version of Rasa has a new HTTP server that handles parallel requests better, but compared with my other server, which runs Rasa 0.14, there’s no big difference.

Can someone advise how I can tune the Rasa HTTP server to get better performance?

P.S. Currently I’m running several instances behind a load balancer, and that solves the problem, but I believe serving 50-100 simultaneous requests should not be a problem for a single server either.

50-100 simultaneous requests should be fine. How big is your model?

The trained model is around 8 MB and has ~150 intents.