Need details on multi-thread architecture of rasa server

rssrivast · March 17, 2021, 11:36am

Hi, I was reading about how many requests per second a rasa server can handle, and the post "Is rasa server running in multi-threaded way?" says that a rasa server can handle 20 requests per second. So I have the following questions:

Is there any official architecture documentation or diagram that I can refer to for insight into how it supports 20 requests in parallel?
Which tools can I use to generate progressive load and find its capacity in requests per second, so that I can determine its scaling needs? Any details on how to hook such a tool up to the rasa server are appreciated.

We are planning to deploy a rasa bot within my team as a starting point before rolling it out to a larger user base. Thanks, Rahul Srivastava

sipvoip · March 17, 2021, 11:55am

We also ran into this problem. It is quite surprising that rasa has such low throughput; we get around it by using Docker and Kubernetes with lots of pods. As for load tools, if you have a SIP interface you can use something open source like sipp or something commercial such as StarTrinity SipTester. We test up to 1000 sessions using audio recordings of a happy path. If you're text-only, I think you will need to build something.
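For a text-only bot, "building something" can start very small. The following is a minimal sketch of a progressive load generator using only the Python standard library. The names `post_message`, `run_step` and `ramp` are made up for illustration, and the payload shape assumes the bot exposes Rasa's REST channel; it is a starting point under those assumptions, not a replacement for a real load testing tool.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def post_message(url: str, sender: str, text: str, timeout: float = 10.0) -> int:
    """POST one user message as JSON and return the HTTP status code."""
    body = json.dumps({"sender": sender, "message": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def timed(send: Callable[[int], object], i: int) -> float:
    """Call send(i) and return how long the call took, in seconds."""
    start = time.perf_counter()
    send(i)
    return time.perf_counter() - start


def run_step(send: Callable[[int], object], users: int, requests_per_user: int) -> List[float]:
    """Issue users * requests_per_user calls from `users` threads; return latencies."""
    total = users * requests_per_user
    with ThreadPoolExecutor(max_workers=users) as pool:
        return list(pool.map(lambda i: timed(send, i), range(total)))


def ramp(
    send: Callable[[int], object],
    steps: Sequence[int] = (1, 5, 10, 20),
    requests_per_user: int = 5,
) -> Dict[int, float]:
    """Raise the concurrency step by step; map each step to its worst latency."""
    report = {}
    for users in steps:
        report[users] = max(run_step(send, users, requests_per_user))
    return report
```

Assuming the REST channel is enabled and the server listens on the default port, a run could look like `ramp(lambda i: post_message("http://localhost:5005/webhooks/rest/webhook", f"user-{i}", "hello"))`. Any HTTP error raised by a request aborts the step, which is usually what you want when looking for the point where the server starts failing.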

rudi · March 17, 2021, 12:34pm

Rasa runs on Python’s asyncio framework (see the asyncio page in the Python 3.9.2 documentation), which allows it to context switch while waiting for IO.
If you have a channel that is accessed over HTTP, the easiest approach is to use one of the many HTTP load testing tools. Here’s an example of a benchmark that I set up: turn-rasa-connector/benchmark at develop (praekeltfoundation/turn-rasa-connector on GitHub).

coredrive · March 17, 2021, 12:37pm

We used Locust-based HTTP load testing (see the example at nprovotorov/rasa-rest-api-loadtest on GitHub, a basic tool based on Locust.io for load testing a Rasa-based chatbot via the REST API). Maybe it will help you.

IRBraga · March 17, 2021, 1:16pm

Hello @rssrivast! How many workers are you using to start your server? I was having performance problems with the web server used by rasa (Sanic). It was handling 225 requests from 45 threads, each asking the bot 5 questions, and the worst response time was just below 5s.

After reading a lot about Sanic and the rasa lock_store, I started a server with 2 workers (instead of the default 1) and the performance surprised me. I was able to handle more than 5000 requests from 1000 threads with response times below 4s.

In order to configure the lock_store you need a Redis server instance, and you need to configure it in your endpoints.yml file; here are the docs.

Here is my endpoints.yml:

lock_store:
    type: redis
    url: localhost
    port: 6379
    password: 
    db: 0
    key_prefix: 
    use_ssl: False
    socket_timeout: 10

Here is what Sanic’s documentation says about workers.

To have more workers on rasa using Sanic, set an environment variable called SANIC_WORKERS (not in the rasa documentation) to the number of workers you want. Please limit the number of workers to the number of available cores, as described in the documentation (link above).

Hope this helps you!

P.S.: This only worked on Linux; on a Windows machine it raises an error.
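As a rough illustration of the advice about capping workers at the core count, here is a small sketch that prepares the environment for the server process. The helper name `worker_env` is hypothetical, and the launch command in the comment assumes the `rasa` CLI is on the path and that endpoints.yml already contains the Redis lock_store shown above. The sketch only covers the environment variable, not the lock store setup.

```python
import os
from typing import Dict, Optional


def worker_env(requested: int, cores: Optional[int] = None) -> Dict[str, str]:
    """Copy the current environment and set SANIC_WORKERS, capped at the core count."""
    available = cores if cores is not None else (os.cpu_count() or 1)
    workers = max(1, min(requested, available))
    env = dict(os.environ)
    env["SANIC_WORKERS"] = str(workers)
    return env


env = worker_env(requested=2)
print("SANIC_WORKERS =", env["SANIC_WORKERS"])

# To start the server with this environment (not executed here):
#   import subprocess
#   subprocess.run(["rasa", "run", "--endpoints", "endpoints.yml"], env=env)
```

Exporting the variable in the shell before starting the server has the same effect; the Python wrapper is only convenient when the worker count should follow the machine it runs on.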

rssrivast · March 17, 2021, 3:19pm

Thanks for all the details. This is very useful. I will post an update with my findings once I have implemented this. Rahul Srivastava

rssrivast · March 17, 2021, 9:53pm

I have one more question: even if I start multiple rasa server instances, the bottleneck is still the single-threaded call from “rasa server” to “rasa action server”. Our “rasa action server” makes a blocking call to external APIs and does not handle any other incoming calls until the request succeeds or finally times out. All the other requests have to wait until then. How can we make the “rasa action server” non-blocking for other incoming calls while a previous call is still active?

IgNoRaNt23 · March 18, 2021, 8:38am

You may also start the action-server with multiple workers via env variable ACTION_SERVER_SANIC_WORKERS

Arjaan · March 18, 2021, 3:55pm

@rssrivast ,

A common root cause of poor performance is when the action server is using a synchronous method to call external services, for example with the requests package.

These types of calls block the asyncio event loop, and all other calls will wait.

Make sure to use async logic everywhere in your custom actions; for example, replace the requests package with the aiohttp package.
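The effect described here can be reproduced without Rasa at all. The sketch below uses only the standard library: `slow_api_call` is a hypothetical stand-in for a synchronous client call, and the second handler shows a different technique from the aiohttp swap recommended above, namely pushing the blocking call into a thread pool with `run_in_executor`. That can be a stopgap when a synchronous client cannot be replaced.

```python
import asyncio
import time


def slow_api_call(delay: float = 0.2) -> str:
    """Stand-in for a synchronous client call such as requests.get()."""
    time.sleep(delay)
    return "ok"


async def blocking_handler() -> str:
    # Calls the synchronous function directly: the event loop is stuck
    # until it returns, so other handlers cannot make progress.
    return slow_api_call()


async def offloaded_handler() -> str:
    # Runs the synchronous function in the default thread pool and awaits
    # the result, leaving the event loop free for other handlers.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, slow_api_call)


async def time_three(handler) -> float:
    """Run three handlers concurrently and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(handler(), handler(), handler())
    return time.perf_counter() - start


blocking = asyncio.run(time_three(blocking_handler))
offloaded = asyncio.run(time_three(offloaded_handler))
print(f"blocking: {blocking:.2f}s, offloaded: {offloaded:.2f}s")
```

The three blocking handlers run one after another, so their delays add up, while the offloaded ones overlap. A native async client such as aiohttp gives the same overlap without needing extra threads.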

rssrivast · March 18, 2021, 7:04pm

Thanks for suggestions. I will try it out.

gagangupt16 · March 25, 2021, 10:08am

@Arjaan Quick question: if we have a custom NLU component in the pipeline which calls an external service using the requests package, should we also use the aiohttp package?

Arjaan · March 30, 2021, 11:51am

@gagangupt16 , yes, definitely use the aiohttp package, to make it async.

Else your external service calls will block everything else.

Randywreed · April 14, 2021, 2:00pm

@Arjaan I wonder if you guys have any example code that shows this. I’ve tried to do this using aiohttp from within an action and the await command still blocks.

Arjaan · April 14, 2021, 7:31pm

@Randywreed ,

Here is a code snippet that compares using the requests module with the aiohttp module:

import os
import json
import logging
from typing import Any, Text, Dict, List, Union

import requests
import aiohttp
from rasa_sdk import Action, Tracker
from rasa_sdk.forms import FormAction
from rasa_sdk.executor import CollectingDispatcher
from rasa_sdk.events import (
    EventType,
    SlotSet,
    ConversationPaused,
    AllSlotsReset,
    UserUtteranceReverted,
    FollowupAction,
)

logger = logging.getLogger(__name__)
USE_AIOHTTP = False

rasa_x_host = os.environ.get("RASA_X_HOST", "rasa-x:5002")

async def tag_convo(tracker: Tracker, label: Text) -> None:
    """Tag a conversation in Rasa X with a given label"""
    endpoint = f"http://{rasa_x_host}/api/conversations/{tracker.sender_id}/tags"

    logger.debug("Tagging a conversation at url=%s with data=%s", endpoint, label)

    if not USE_AIOHTTP:
        logger.info("using requests module")
        response = requests.post(url=endpoint, data=label)
        logger.info("Response status code: %s", response.status_code)
    else:
        logger.info("using aiohttp module")
        # https://docs.aiohttp.org/en/stable/client_quickstart.html#make-a-request
        async with aiohttp.ClientSession() as session:
            async with session.post(url=endpoint, data=label) as response:
                logger.info("Response status code %s", response.status)

Randywreed · April 14, 2021, 9:09pm

@Arjaan That’s interesting. Does that work in an action? When I use this code, which I believe is the same as yours, I get the error "AttributeError: __aexit__" (full traceback below).
Here’s the code:

async def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
        ) -> List[Dict[Text, Any]]:    
        import asyncio
        import aiohttp
        cid=tracker.sender_id
        async with aiohttp.ClientSession as session:
            import requests
            import json
            headers = {'Content-Type': 'application/json',}
            params = (('output_channel', 'latest'),)
            url="http://localhost:5005/conversations/"+cid+"/trigger_intent"
            d = {"name" : "EXTERNAL_dry_plant", "entities": {"plant": "Orchid"}}
            print(url,d)   

            async with session.post(url=url,data=d,headers=headers) as resp:
                print (resp.status)
        return []

Here’s the error:

Exception occurred while handling uri: 'http://localhost:5055/webhook'
Traceback (most recent call last):
  File "/home/randy/.cache/pypoetry/virtualenvs/reminderbot-fAAFfjM8-py3.7/lib/python3.7/site-packages/sanic/app.py", line 938, in handle_request
    response = await response
  File "/home/randy/.cache/pypoetry/virtualenvs/reminderbot-fAAFfjM8-py3.7/lib/python3.7/site-packages/rasa_sdk/endpoint.py", line 103, in webhook
    result = await executor.run(action_call)
  File "/home/randy/.cache/pypoetry/virtualenvs/reminderbot-fAAFfjM8-py3.7/lib/python3.7/site-packages/rasa_sdk/executor.py", line 398, in run
    action(dispatcher, tracker, domain)
  File "/home/randy/.cache/pypoetry/virtualenvs/reminderbot-fAAFfjM8-py3.7/lib/python3.7/site-packages/rasa_sdk/utils.py", line 230, in call_potential_coroutine
    return await coroutine_or_return_value
  File "/home/randy/Rasa/reminderbot/actions/actions.py", line 107, in run
    async with aiohttp.ClientSession as session:
AttributeError: __aexit__

If I put it in its own async function, it errors saying it needs an await. The only way I’ve been able to get this to work in an action is to use requests in a multiprocessor thread.
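The traceback points at the line `async with aiohttp.ClientSession as session:`. One likely cause is that the class is never called there: the earlier snippet in this thread uses `aiohttp.ClientSession()` with parentheses, which creates a session instance. The sketch below reproduces the same mistake with a hypothetical `FakeSession` class, so it runs without aiohttp installed. Older Python versions such as the 3.7 in the traceback report `AttributeError: __aexit__`, while newer ones raise a `TypeError` instead.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession: an async context manager."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False


async def use_class() -> str:
    # Mistake: FakeSession is the class itself, not an instance, so the
    # async context manager methods are not found on it.
    async with FakeSession as session:
        return "entered"


async def use_instance() -> str:
    # Correct: FakeSession() creates an instance that can be entered.
    async with FakeSession() as session:
        return "entered"


def try_run(coro_fn) -> str:
    """Run the coroutine function; return its result or the error type name."""
    try:
        return asyncio.run(coro_fn())
    except (AttributeError, TypeError) as err:
        return type(err).__name__


print(try_run(use_class))
print(try_run(use_instance))
```

Applied to the action above, the line would read `async with aiohttp.ClientSession() as session:`. Separately, since the request sets a JSON content type, passing the payload as `json=d` rather than `data=d` is probably what the endpoint expects, though that is an assumption about the intent of the code.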

ali207715 · November 7, 2021, 5:55pm

Lock stores are used to store conversation history so I presume you are utilizing more than just the NLU? I have a rasa server running where I only access the NLU and tried using the SANIC_WORKER environment variable, but after running locust, there was absolutely no change in performance. Could you share how exactly you were able to set the environment variable to get this performance?

tejabhat · April 10, 2023, 10:50am

Hey, I was trying to use this, but I cannot see this variable in the rasa 3.2.8 code base. Has it been removed by any chance?

tejabhat · April 10, 2023, 11:15am

I realized that this env variable is in the rasa_sdk code, where the custom action endpoint spins up the given number of threads. Thank you for the info on this variable.

francisford · January 16, 2024, 4:40pm

Hi, we are facing the same problem in my team. Basically, the point is that each request is executed sequentially so we are not getting 20 satisfied requests per second (you can see the response time increasing over time):

We are making a call to an external data server but we tested that solution and it is distributed.

We have tried changing the number of SANIC_WORKERS for both Core and Action servers but the performance doesn’t change.

We even tried to implement the redis solution suggested by @IRBraga (we just started a redis server and we configured the endpoint.yml as indicated in the answer above), but the stress tests don’t show any change in response time.

Do we have to add a configuration file for the redis server or do you suggest something else? Thanks in advance!

Shreetej · July 18, 2024, 6:24am

Hi, can you help me with this? How did you get it to improve? Even with this change we are not able to handle more than 20 req/sec. Our rasa open source bot is deployed in Kubernetes with no limit on resources. CPU utilization does not go above 50%, so nothing triggers scaling, but the requests are failing.
