Rasa + AWS Lambda

Hello everyone,

I was wondering if anyone has any suggestion on how to deploy Rasa in AWS. More specifically:

  1. What are the best practices to use Rasa in AWS cloud services?
  2. Can we deploy Rasa in AWS Lambda? I’ve noticed other people mentioning that Rasa may currently be larger than what fits into a Lambda function. So I was wondering if anyone had any experience/workaround/suggestion on deploying Rasa in AWS Lambda.
  3. Is there any light version of Rasa that could fit in a Lambda function, or any guidance on building one?


Best, Behnam

I think Lambda is a great idea. Lambdas are stateless, which makes them very cost-effective, but you’d also need one of the persistent data stores for your Tracker implementation — almost certainly a custom one connecting to DynamoDB. I haven’t seen any code around that does this, but it would be very useful.
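For what it’s worth, recent versions of Rasa Open Source ship a DynamoDB tracker store that can be switched on from `endpoints.yml`. A rough sketch — the key names follow the Rasa docs, but the table name and region below are placeholders, so double-check against your Rasa version:

```yaml
# endpoints.yml (sketch)
tracker_store:
  type: dynamo
  tablename: rasa_conversations   # placeholder table name
  region: us-east-1               # placeholder region
```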

Also, the default Lambda behaviour is that the first request boots up a VM temporarily and keeps it around for about 10 minutes. So the first user of your service (be it human or a bot) will face a 6+ second cold start. The most efficient solution is almost certainly provisioned concurrency, which keeps things warm for you at a slight increase in price. Very cool stuff.
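Provisioned concurrency can be enabled with a single AWS CLI call — the function name and alias below are placeholders:

```shell
# Keep 2 execution environments warm for the "prod" alias of a hypothetical
# "rasa-bot" function (billed even when idle, but no more cold starts)
aws lambda put-provisioned-concurrency-config \
  --function-name rasa-bot \
  --qualifier prod \
  --provisioned-concurrent-executions 2
```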

Thanks for the reply. I agree that Lambda is usually a great choice and we can keep it warm to make it faster. However, the problem with deploying Rasa on a Lambda is the storage limits: the deployment package is capped well below a gigabyte, and ephemeral /tmp storage is limited to 512 MB, while Rasa (at least the default install with no optimization) is way larger than that (~2 GB). Therefore, I think Lambda may not be a good choice for Rasa deployment. Please correct me if anyone has other experiences.


Is there any way to cache some of the common conversation patterns so you don’t need an actual Rasa instance running? For example, you could have zero Rasa containers running at any given time; when a request comes in, you start a container, and during its warm-up time you reply through a caching layer running on Lambda.
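The idea above could look roughly like this: a tiny Lambda that serves canned replies for the most common intents and tells the caller to retry for anything else while the real Rasa container spins up. Everything here (the intent names, the keyword matcher, the response shape) is a hypothetical sketch, not Rasa code:

```python
import json
from typing import Optional

# Hypothetical static cache of canned replies for the most common intents,
# served while the real Rasa container is still warming up.
CANNED_REPLIES = {
    "greet": "Hello! How can I help you today?",
    "goodbye": "Goodbye! Have a great day.",
    "thanks": "You're welcome!",
}

def classify(message: str) -> Optional[str]:
    """Naive keyword matcher standing in for a real intent classifier."""
    text = message.lower()
    if any(w in text for w in ("hi", "hello", "hey")):
        return "greet"
    if any(w in text for w in ("bye", "goodbye")):
        return "goodbye"
    if "thank" in text:
        return "thanks"
    return None

def lambda_handler(event, context):
    message = json.loads(event["body"])["message"]
    intent = classify(message)
    if intent in CANNED_REPLIES:
        return {"statusCode": 200,
                "body": json.dumps({"text": CANNED_REPLIES[intent]})}
    # Cache miss: signal the caller to retry once the Rasa container is up.
    return {"statusCode": 503,
            "body": json.dumps({"text": "Warming up, please retry shortly."})}
```

The 503 branch is where you would also trigger the container scale-up (e.g. bumping an ECS service’s desired count).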

Hi, would it be possible to deploy Rasa (Rasa NLU) in AWS Lambda as a Docker container?

The container image code package size limit is 10 GB:

https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-limits.html
https://docs.aws.amazon.com/lambda/latest/dg/lambda-images.html

We deploy Rasa on AWS for clients using AWS ECS with Fargate instances. We have also built pieces like an AWS Batch training pipeline, an S3 Tracker Store and an SQS Event Broker, so the Rasa instances are deployed in a fully managed fashion.

The only advantage we get from deploying on Lambda is cost, but I think a 1 vCPU / 2 GB Fargate task is also manageable in terms of cost if the bot is not that intensive to run, and the service will be much more responsive.

However, I spent some time today reading through the source code of Rasa and the Lambda Runtime API, and I think it is doable, but it will take some hacking.

The main problem is that AWS Lambda has its own Runtime API, so we essentially need to write our own Lambda runtime wrapped around Rasa. When Rasa Core starts, the agent is loaded and a Sanic server is started. The Sanic instance is of little use inside a Lambda function, so we need to bootstrap just the Rasa agent for the Lambda runtime. The agent has to be warmed up when the runtime starts, so we should expect quite a bit of warm-up time.
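A minimal sketch of that bootstrap pattern: load the agent once, cache it in a module-level variable so warm invocations skip the load, and bridge Rasa’s async API into the synchronous handler with `asyncio`. Since this is only a sketch, a stub stands in for `rasa.core.agent.Agent` — the real `Agent.load(...)` / `handle_text(...)` calls would go where the stub is:

```python
import asyncio
import json

_agent = None  # cached across warm invocations; Lambda reuses the process

def _load_agent():
    """Stand-in for rasa.core.agent.Agent.load(model_path). Loading the real
    model is the slow, cold-start step this caching pattern amortises."""
    class StubAgent:
        async def handle_text(self, text, sender_id="default"):
            return [{"recipient_id": sender_id, "text": f"echo: {text}"}]
    return StubAgent()

def lambda_handler(event, context):
    global _agent
    if _agent is None:  # cold start: pay the load cost exactly once
        _agent = _load_agent()
    body = json.loads(event["body"])
    # The agent API is async while Lambda handlers are sync, so bridge here.
    replies = asyncio.run(
        _agent.handle_text(body["message"],
                           sender_id=body.get("sender", "default")))
    return {"statusCode": 200, "body": json.dumps(replies)}
```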

AWS has built a lambda runtime client using Python so I will likely hack around both the lambda runtime client and spin up Rasa agent directly.

We are also investigating other serverless offerings, I think if we want to save cost, Google Cloud Run would work since they offer a request time pricing similar to lambda, and it seems relatively easier to set up. I don’t have much experience with GCP so I cannot comment any further.

Our team has been doing open source work around Rasa and we will be doing this lambda experiment later this month and early next month. And hopefully I will get back with some findings then.


I am new to Rasa and thinking about the scalability aspects. The problem statement is: we have multiple clients in different domains. As of now, it looks like I have to create a separate Rasa setup for each client. With that in mind, we are thinking of creating a Lambda function for each client, as you were trying.

Can you please help me with your findings? I think the AWS Lambda maximum allowed package size is only 250 MB. How are you managing it? It would also be helpful if you could guide me further on this!

thanks in advance.

Our team is moving away from Rasa, but before that we did spend time on packaging it into a Lambda function. The best way is to package it as a Docker container (Lambda allows images up to 10 GB) and deploy that.

There is an AWS package that helps convert a web server into a lambda function: GitHub - awslabs/aws-lambda-web-adapter: Run web applications on AWS Lambda
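Following that adapter’s README, the integration is roughly a one-line addition to the image — the base image tag and adapter version below are assumptions, check the README for current ones:

```dockerfile
# Sketch: run the Rasa HTTP server behind aws-lambda-web-adapter
FROM rasa/rasa:3.6.0

# The adapter runs as a Lambda extension and forwards invocations to the
# web server listening on $PORT
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.4 /lambda-adapter /opt/extensions/lambda-adapter
ENV PORT=5005

CMD ["run", "--enable-api", "--port", "5005"]
```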

There are a few things to be aware of, mostly around Rasa’s cache folder configuration. You need to make sure Rasa writes its files to the /tmp folder.
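A minimal sketch of that /tmp redirection, done before importing Rasa. The exact set of cache variables you need depends on which libraries your pipeline pulls in, so treat the list below as illustrative:

```python
import os

# Lambda's filesystem is read-only except /tmp, so point anything that
# writes caches (Rasa itself, matplotlib, HuggingFace models) at /tmp
# BEFORE importing rasa.
os.environ["HOME"] = "/tmp"
os.environ["XDG_CACHE_HOME"] = "/tmp/.cache"
os.environ["MPLCONFIGDIR"] = "/tmp/matplotlib"       # only if matplotlib is present
os.environ["TRANSFORMERS_CACHE"] = "/tmp/transformers"  # only if you use HF models

# import rasa  # import only after the environment is redirected
```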

Hope this helps with anyone who has a need to deploy Rasa onto AWS cost efficiently.

Hi Simon, Can you please share your reason for moving away from Rasa, if possible? I have recently got interested in Rasa and have been trying to learn. Thanks, Piyush

Hey @mindseye, the main reason we are moving off Rasa is that we outgrew the framework. As we were using Rasa for our clients, we didn’t find Rasa’s story approach flexible enough, and for the NLU part, it is no longer difficult to fine-tune a transformer-based model to achieve the same results, if not better.

Rasa is still very good if the chatbot you are building is relatively small and serves a straightforward use case. The NLU is still easy enough to use and deploy. We still prototype with Rasa NLU and ship it to clients with a smaller project scope. We combine Rasa NLU with Voiceflow so we can quickly demo and test with clients on real data and improve over time.

The chatbot landscape has changed very dramatically and very fast recently, and even the Rasa team is looking into LLMs to help their paying customers. I would suggest looking into those and seeing how they can help with your business needs.

Thanks, Simon, for taking the time and providing detailed information. Can you share whether you used Rasa Platform and Rasa Enterprise extensively, or Rasa Open Source for most clients? I agree that depending on the use case, one has to keep options open. It will be interesting to see what LLMs deliver, as currently it is all hype around them. In the end, all that matters is total cost of ownership versus business value. Conversational AI will be a key investment for the next couple of years by many companies, for internal purposes too and not just external, as they deal with the economy, remote work and quiet quitting.