We deploy Rasa on AWS for clients using ECS with Fargate tasks. We have also built pieces like an AWS Batch training pipeline, an S3 tracker store, and an SQS event broker, so the Rasa instances are deployed in a fully managed fashion.
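For reference, custom tracker stores and event brokers plug into Rasa via endpoints.yml, where `type:` can point at a custom class by module path. A sketch of how pieces like ours might be wired (the module paths, bucket, and queue URL below are hypothetical placeholders, not our actual code):

```yaml
# endpoints.yml -- all names below are illustrative placeholders
tracker_store:
  type: addons.s3_tracker_store.S3TrackerStore    # hypothetical custom class
  bucket: example-rasa-trackers                   # assumed example bucket

event_broker:
  type: addons.sqs_event_broker.SQSEventBroker    # hypothetical custom class
  queue_url: https://sqs.us-east-1.amazonaws.com/123456789012/rasa-events
```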
The only advantage we get from deploying on Lambda is cost, but I think a 1 vCPU / 2 GB Fargate task is also manageable cost-wise if the bot is not that compute-intensive, and the service will be much more responsive.
However, I spent some time today reading through the Rasa source code and the Lambda Runtime API, and I think it is doable, but it will take some hacking.
The main problem is that AWS Lambda has its own runtime API, so we essentially need to write our own Lambda runtime that wraps Rasa. When Rasa Core starts, the agent is loaded and a Sanic server is started. The Sanic instance is not much use inside a Lambda function, so we need to bootstrap just the Rasa agent for the Lambda runtime. The agent has to be warmed up when the runtime starts, so we should expect quite a bit of cold-start time.
AWS has built a Lambda runtime interface client for Python, so I will likely hack around that client and spin up the Rasa agent directly.
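To make the plan concrete, here is a minimal sketch of the handler pattern I have in mind: load the agent once at module import time, so it is only paid for on cold starts, and bridge Lambda's synchronous handler to the agent's async message handling. The `StubAgent` below is a stand-in for the real Rasa agent (which would be loaded with something like `Agent.load(...)` and queried with its async `handle_text`); everything here is illustrative, not a working integration:

```python
import asyncio
import json

# Stand-in for the real Rasa agent. In the actual experiment this would be
# replaced by loading a trained model and calling its async message handler.
class StubAgent:
    load_count = 0

    @classmethod
    def load(cls, model_path):
        cls.load_count += 1  # counts cold starts, for illustration only
        return cls()

    async def handle_text(self, text):
        # The real agent returns a list of bot responses; we just echo.
        return [{"recipient_id": "user", "text": f"echo: {text}"}]

# Loaded at module import: this runs once per Lambda container (cold start),
# so warm invocations reuse the already-loaded agent.
AGENT = StubAgent.load("models/")

def handler(event, context):
    """Lambda entry point: bridge the sync handler to the async agent."""
    message = json.loads(event["body"])["message"]
    responses = asyncio.run(AGENT.handle_text(message))
    return {"statusCode": 200, "body": json.dumps(responses)}
```

The key design choice is keeping the agent at module scope rather than inside `handler`, which is the standard Lambda trick for amortizing expensive initialization across warm invocations.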
We are also investigating other serverless offerings. If the goal is to save cost, I think Google Cloud Run could work, since it offers request-based pricing similar to Lambda and seems relatively easier to set up. I don't have much experience with GCP, so I cannot comment further.
Our team has been doing open source work around Rasa, and we will run this Lambda experiment later this month and early next month. Hopefully I will get back with some findings then.