Benchmark Rasa Performance

Has anyone done any stress testing against Rasa? Would you mind sharing the results of your stress testing? I am wondering how many concurrent “active” users can talk to the chatbot through Rasa.

Hi ChiaJun,

I’ve done a small test to see how the server copes with stress and I don’t have good results to share. But regardless, here it is:

[Plot: response time per message vs. number of messages in the conversation]

You can see that the response time per message increases with the number of messages the user has sent. This is probably because of the tracker store, and it looks like it scales roughly linearly with the size of the conversation history.

For a small conversation history of 10 messages, the response time was as low as 60 ms, but it increases to about 300 ms for a history of 700+ messages.

This test was done in JupyterLab, so the tracker store used was InMemory. You can expect worse results when a database-backed tracker store is used. (In fact, I initially tested with a script, and the session timed out at 800 messages for 2 users with a connection timeout of 1 hour.) Also, almost all of the actions run on the action server, which may have added a little time here.

So, to answer your question, it can probably support between 3 and 15 active users, depending on their conversation length.

Here is the sample code that was run:

import time
from rasa.core.agent import Agent

data = [
    # messages go here
    # had 848 messages here
]
# message counts to benchmark: 10, 40, 70, ..., 790
data_counts = [i * 10 for i in range(1, 80, 3)]

for message_count in data_counts:
    # create new agent to reset tracker store
    agent = Agent.load("models/model.tar.gz")
    
    start_time = time.time()
    for message in data[:message_count]:
        try:
            # top-level await works here because this was run in a Jupyter notebook
            response = await agent.handle_message(message)
        except Exception:
            # ignore individual failures so the timing loop keeps running
            pass
    end_time = time.time()

    print("message count:", message_count)
    print("total time:", end_time - start_time)
    print("time per message:", (end_time - start_time) / message_count)
    
    del agent

Hope that helps.

Edit: It looks like performance deteriorates by more than 10x when using the Mongo tracker store compared to the InMemory tracker store. My custom actions take ~3 ms with the InMemory tracker store and ~55 ms with the Mongo tracker store.
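For anyone who wants to reproduce the comparison, the tracker store is switched in endpoints.yml. A minimal sketch, assuming a local MongoDB instance (the URL and credentials below are placeholders):

tracker_store:
  type: mongod
  url: mongodb://localhost:27017
  db: rasa
  username: <mongo username>
  password: <mongo password>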


Hi Lahsuk,

Thanks for sharing. These are my policy settings:

policies:
  - name: KerasPolicy
    epochs: 300
    max_history: 3
  - name: AugmentedMemoizationPolicy
    max_history: 3
  - name: FallbackPolicy
    nlu_threshold: 0.6
    core_threshold: 0.3
  - name: FormPolicy
  - name: MappingPolicy

I only keep a conversation history of 3. Does the tracker store keep all history even if max_history=3? If it stores all the history but won’t use it for the next action prediction, which doesn’t make sense to me, why does the performance decrease so quickly?

In my case, I simply take a file of 10,000 sentences as input and process the sentences one by one, without concurrency. After it processes about 500 sentences, the speed drops quickly. Is this similar to multiple active users talking to the bot simultaneously?

Hi @twittmin,

I don’t think the history size affects performance (it only affects prediction).
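To check whether the tracker store keeps the full history regardless of max_history, you can inspect the stored tracker directly. A small sketch, assuming the agent from the benchmark above and that messages were handled under the default sender id “default”:

tracker = agent.tracker_store.retrieve("default")
if tracker is not None:
    # the tracker holds every event of the conversation, independent of max_history
    print("events stored:", len(tracker.events))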

The number of active users it can support depends on their conversation length. For very short conversations it can support 20+ users (this depends on other factors too, including custom actions), while for long conversations it will struggle to support even a single user.

And no, bombarding it with 10,000 sentences is not similar to multiple users talking to the bot simultaneously. It is just a single user conversing with the bot.
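If you do want to approximate concurrent users, one rough approach (a sketch, not part of the original benchmark; the handle_text call and sender ids are my own assumptions) is to give each simulated user its own sender_id so that each conversation gets its own tracker:

import asyncio
from rasa.core.agent import Agent

async def simulate_users(agent, messages, n_users=5):
    # each simulated user gets its own sender_id and therefore its own tracker
    async def one_user(user_id):
        for text in messages:
            await agent.handle_text(text, sender_id=f"user_{user_id}")
    await asyncio.gather(*(one_user(i) for i in range(n_users)))

# agent = Agent.load("models/model.tar.gz")
# await simulate_users(agent, ["hello", "I want to book a table"], n_users=5)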

And it looks like other people have faced this issue too:

Hi @lahsuk, since the full history is not used for prediction, why does it slow down so much? Storage-wise, it should only increase memory usage, right? My observation is that memory usage stays almost constant even as the history grows.

Since it seems a lot of people are using Rasa to build bots, there should be a way to overcome this problem.

This event broker may be helpful:
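For reference, an event broker is also configured in endpoints.yml. A minimal sketch, assuming a local RabbitMQ instance (credentials and queue name are placeholders, and the exact keys may differ between Rasa versions):

event_broker:
  type: pika
  url: localhost
  username: <rabbitmq username>
  password: <rabbitmq password>
  queue: rasa_events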