I am testing how well a Rasa server handles 500 concurrent users. To do that, I deployed Rasa on a GCP VM using a docker-compose configuration. To simulate user requests, I wrote a script using Locust that posts messages to the REST channel. I want to share my results and hear your thoughts on this topic.
First of all, the VM has 6 CPUs and 16 GB of RAM. The docker-compose configuration deploys a Rasa Open Source container and an nginx container that acts as a load balancer for incoming requests and exposes the port.
My first experiment used 100 users and one Rasa container.
In this experiment, the average response time was close to 4 s, and the CPU usage of the container rose to 200%. So, in the next experiment I scaled Rasa to 5 containers and the load to 500 users, obtaining the following results.
The average response time dropped to less than 2 s, and the CPU usage decreased to around 130% per container. But the failure rate increased to 18%; the overall performance is shown in the next image.
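A quick back-of-the-envelope check of these CPU figures against the 6 cores of the VM is sketched below. It assumes the percentages follow the `docker stats` convention, where 100% equals one full core, that 130% is a rough per-container average, and it ignores the nginx container and the OS. The function name `total_cpu_demand` is just for illustration.

```python
CORES = 6  # from the VM description


def total_cpu_demand(containers, percent_per_container):
    """Sum of per-container CPU usage, in cores (assumes 100% == one core)."""
    return containers * percent_per_container / 100


# (containers, approximate CPU % per container) for the two runs above
runs = {
    "1 container, 100 users": (1, 200),
    "5 containers, 500 users": (5, 130),
}

for label, (n, pct) in runs.items():
    demand = total_cpu_demand(n, pct)
    print(f"{label}: {demand:.1f} of {CORES} cores ({demand / CORES:.0%})")
```

Under those assumptions the second run asks for about 6.5 cores on a 6-core machine, so the VM may have been CPU-saturated, which could be one reason for the failures.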
Using this information, I decided to keep the number of containers at 5 and test with 400 users, and got the following results:
If you have any suggestions or think I made a mistake, feel free to share your ideas.
Thanks for reading.