We deployed Rasa X with the official Helm chart on an Azure AKS cluster and stress tested it with Locust. We noticed that a single rasa-production pod facing around 100 simulated concurrent users would leave the postgresql pod non-functional:
2020-06-12 12:01:46.013 GMT [30481] FATAL: sorry, too many clients already
And the production pod would complain similarly:
2020-06-12 12:29:14 WARNING rasa.core.tracker_store - (psycopg2.OperationalError) FATAL: sorry, too many clients already
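For reference, how close the database is to its connection ceiling can be checked directly against the postgresql pod; below is a minimal sketch using psycopg2 (which the tracker store already depends on), with placeholder connection parameters:

```python
import psycopg2

# Placeholders: point these at the postgresql service/credentials deployed by the chart.
conn = psycopg2.connect(
    host="<postgresql-service>",
    dbname="<database>",
    user="<user>",
    password="<password>",
)
with conn, conn.cursor() as cur:
    # Compare the server-side connection ceiling with what is currently in use.
    cur.execute("SHOW max_connections;")
    print("max_connections:", cur.fetchone()[0])
    cur.execute("SELECT count(*) FROM pg_stat_activity;")
    print("connections in use:", cur.fetchone()[0])
conn.close()
```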
The Locust task consists of messages sent to /api/conversations/<conversation_id>/messages at intervals of 5 to 9 seconds.
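A minimal sketch of such a Locust user is below; the conversation-id scheme, message payload, and auth header are illustrative assumptions rather than our exact setup:

```python
from locust import HttpUser, task, between


class RasaXChatUser(HttpUser):
    # One message every 5-9 seconds, as described above.
    wait_time = between(5, 9)

    def on_start(self):
        # Assumption: each simulated user writes into its own conversation.
        self.conversation_id = f"locust-{id(self)}"

    @task
    def send_message(self):
        # Endpoint as in the post; payload shape and bearer token are assumptions.
        self.client.post(
            f"/api/conversations/{self.conversation_id}/messages",
            json={"text": "hello"},
            headers={"Authorization": "Bearer <api-token>"},
        )
```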
After scaling rasa-production up to 5 replicas, we got decent response times for a test case with 110 users: around 150 ms on average at the beginning, slowly climbing to 600-700 ms once the conversation length reached about 100, and to over a second as conversations grew longer. I suspect this is related to tracker serialization/deserialization, as mentioned in another post.
We also see pod crashes as the load increases, even though the CPU metrics suggest there is still some headroom.
How would you go about this? Is it purely a scaling issue, meaning more nodes and rasa-production pods would solve the problem, or is there also a bottleneck on the database side?
If we used a managed Azure Redis service as the tracker store (and lock store), combined with more pods, would that solve the problem? And how would you configure this in the Helm chart's values.yml?