Hi,
I have a Rasa Open Source model deployed using Docker Compose on an AWS server with 32 cores.
Versions used:
- Rasa Open Source image: rasa/rasa:2.8.14-full
- Rasa Action Server image: rasa/rasa-sdk:2.8.4
I’ve also added a MongoDB tracker store to the docker-compose.yml, and everything works correctly with a simple custom connector.
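For reference, here is a trimmed-down sketch of the relevant parts of my setup. Service names, volumes, and the Mongo settings are illustrative, not my exact file; the tracker-store block lives in endpoints.yml as per the Rasa docs:

```yaml
# docker-compose.yml (sketch)
version: "3.4"
services:
  rasa:
    image: rasa/rasa:2.8.14-full
    ports:
      - "5005:5005"
    volumes:
      - ./:/app
    command: ["run", "--enable-api", "--endpoints", "endpoints.yml"]
  action-server:
    image: rasa/rasa-sdk:2.8.4
    volumes:
      - ./actions:/app/actions
  mongo:
    image: mongo:4.4

# endpoints.yml (sketch)
# tracker_store:
#   type: mongod
#   url: mongodb://mongo:27017
#   db: rasa
```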
I want to know if there is a way to run Rasa so that it utilizes all the cores. In my testing with a large user count, each user sending 20~30 messages, the average response time of my model increases significantly.
Testing scenario: 100 users simulated in parallel, each sending around 30 messages. The average response time is almost 15~16 seconds, which is far too high.
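For context, my load test is roughly the following kind of script (the URL assumes Rasa's REST channel on the default port 5005; `simulate_user` and `summarize` are just names I picked here, and the real test uses my custom connector instead):

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Rasa's REST input channel is enabled and exposed on port 5005.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"
NUM_USERS = 100
MSGS_PER_USER = 30


def simulate_user(user_id: int) -> list:
    """Send MSGS_PER_USER messages as one sender and record each round-trip time."""
    latencies = []
    for i in range(MSGS_PER_USER):
        payload = json.dumps(
            {"sender": f"user-{user_id}", "message": f"test message {i}"}
        ).encode("utf-8")
        req = urllib.request.Request(
            RASA_URL, data=payload, headers={"Content-Type": "application/json"}
        )
        start = time.monotonic()
        urllib.request.urlopen(req).read()
        latencies.append(time.monotonic() - start)
    return latencies


def summarize(latencies) -> float:
    """Average response time in seconds across all messages."""
    return statistics.mean(latencies)


if __name__ == "__main__":
    # One thread per simulated user, all sending their messages concurrently.
    with ThreadPoolExecutor(max_workers=NUM_USERS) as pool:
        per_user = list(pool.map(simulate_user, range(NUM_USERS)))
    all_latencies = [t for user in per_user for t in user]
    print(f"avg response time: {summarize(all_latencies):.2f}s")
```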
Monitoring the CPU usage of the 32 cores and the Docker container stats, I see that the bottleneck is the “rasa” container, which runs at ~100% CPU usage, i.e. only a single core is being utilized (both the action-server and MongoDB containers stay below 5% CPU usage).
Any suggestions on how I should run this so that it utilizes all the CPU cores and brings the response time down to 1~1.5 seconds? Is this not the correct deployment method? And what is the recommended Rasa Open Source-only deployment option for high-traffic chatbots?
Any help is much appreciated.