I am testing how well a Rasa server handles 500 concurrent users. To do that, I deployed Rasa on a GCP VM using a docker-compose configuration. To simulate user requests, I wrote a script using Locust that posts messages to the REST channel. I want to share my results and hear your thoughts on this topic.
First of all, the VM has 6 CPUs and 16 GB of RAM. The docker-compose setup deploys a Rasa Open Source container and an nginx container that exposes the port and handles incoming requests as a load balancer.
My first experiment was using 100 users and one rasa container.
In this experiment, the average response time was close to 4 s, and the container's CPU usage rose to 200%.
So, in the next experiment I scaled the rasa container to 5 replicas and the users to 500, obtaining the following results.
The average response time dropped to less than 2 s, and CPU usage decreased to around 130% per container. However, failures increased to 18%. The overall performance is shown in the next image.
The response time stayed below 2 s, with no failures. So I concluded that this VM can support up to 400 concurrent users. These experiments don't include a custom action server, a tracker store database, or a Redis server. Their only purpose was to determine the number of concurrent users a single VM can handle.
If you have any suggestions or think I made a mistake, feel free to share your ideas.
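The pass criterion behind that conclusion (average response time under 2 s and no failures) can be written down as a small check. This is only a sketch: `RunResult` and `max_supported_users` are hypothetical names, the 18% and 0% failure figures come from the runs above, and the 1.9 s values are placeholders standing in for "less than 2 s", not measured numbers.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    users: int             # number of simulated concurrent users
    avg_response_s: float  # average response time in seconds
    failure_ratio: float   # failed requests / total requests


def max_supported_users(runs, max_avg_response_s=2.0, max_failure_ratio=0.0):
    """Return the largest user count whose run met both thresholds, or None."""
    passing = [
        r.users
        for r in runs
        if r.avg_response_s < max_avg_response_s
        and r.failure_ratio <= max_failure_ratio
    ]
    return max(passing, default=None)


# Placeholder response times; failure ratios as reported in the post.
runs = [
    RunResult(users=500, avg_response_s=1.9, failure_ratio=0.18),
    RunResult(users=400, avg_response_s=1.9, failure_ratio=0.0),
]
print(max_supported_users(runs))  # prints 400
```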
@darich10, thanks for the helpful post. We have put our chatbot into production on a single VM, and it had no issues with fewer than 200 users. But now that the number of users has spiked to 500+, there is a long delay in the bot’s responses (>1 min), and this delay usually comes with the error below:
rasa.core.lock_store.LockError: Could not acquire lock for conversation_id ’
We use an InMemoryLockStore. Could you please provide some suggestions?
@triinity2221 Hello, and thanks for reading. Using an InMemoryLockStore is not recommended for a production environment. Instead, use a RedisLockStore, and keep conversations in a persistent tracker store such as RedisTrackerStore or SQLTrackerStore.
Also, can you share more information about your deployment, such as the number of CPUs, the amount of RAM, and the number of containers?
@darich10 Thanks a lot for sharing this. I’m also currently trying to figure out how many concurrent users my bot could handle, so I’m glad that I saw your post.
I followed the basic instructions on how to set up Rasa X on a server with docker-compose, although I also have a custom action server. As I understand it, these instructions lead to a single rasa container. I was therefore wondering what you mean by running multiple rasa containers and how you did that. I would appreciate your help!
Hello @annaf, to scale a service with docker-compose, pass the --scale flag to the up command, as follows: sudo docker-compose up --scale SERVICE=NUM, where SERVICE is the service to scale and NUM is the number of replicas. In this case the service to scale should be “rasa-production”. To check that the replicas are running, you can use the docker stats command.
Also, if you followed the Rasa X instructions for deploying with docker-compose, you may need to set up a database outside the VM.
Hi, thanks for sharing the stress test for Rasa. I also used Locust to test the QPS of a Rasa service, but I got a very low result and I don’t know the reason. Could you please share your Locust file?
Here is my result with 2 CPUs and 8 GB of RAM.