We run Rasa v1.1.5 with Duckling on AWS, with the HTTP API enabled. Under no or light load, a response takes around 200 milliseconds, but under load (say, 50 parallel requests) the average response time rises dramatically, to about 6 seconds.
I’ve seen somewhere that the new version of Rasa has a new HTTP server that handles parallel requests better, but compared with my other server, which runs Rasa 0.14, there’s no big difference.
Can someone advise how I can tune the Rasa HTTP server to get better performance?
P.S. Currently I’m running several instances behind a load balancer, which solves the problem, but I believe serving 50-100 simultaneous requests should not be a problem for a single server either.
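
For anyone who wants to reproduce the comparison between light load and 50 parallel requests, here is a minimal sketch of a load test using only the Python standard library. The URL, port, and request body are assumptions (a Rasa 1.x server on localhost:5005 with the `/model/parse` endpoint); `parse_message`, `measure`, and `summarize` are hypothetical helper names, so adjust them to whatever endpoint is actually being called.

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Rasa 1.x server reachable here, exposing POST /model/parse.
RASA_URL = "http://localhost:5005/model/parse"


def parse_message(text, url=RASA_URL, timeout=30):
    """Send one message to the (assumed) parse endpoint and return the decoded JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def timed_call(send, text):
    """Return the wall-clock seconds taken by a single send(text) call."""
    start = time.perf_counter()
    send(text)
    return time.perf_counter() - start


def measure(send, texts, workers):
    """Call send(text) for every text using `workers` threads; return per-request latencies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: timed_call(send, t), texts))


def summarize(latencies):
    """Collapse a list of latencies (seconds) into count, mean, and max."""
    return {
        "count": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


# Example usage against a running server (not executed here):
#   light = summarize(measure(parse_message, ["hello"] * 50, workers=1))
#   heavy = summarize(measure(parse_message, ["hello"] * 50, workers=50))
#   print(light, heavy)
```

Running the same 50 messages with `workers=1` and then `workers=50` gives two summaries that can be compared directly. Note that the measured latency includes client-side overhead, so it is only a rough indicator of server-side time.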