What is the recommended setup for production deployment?

I am trying to figure out the recommended way to deploy Rasa in production.

Is it a good idea to deploy rasa_nlu and rasa_core with their API interfaces on two different servers, assuming I also need Duckling for some entity resolution? Or should I go with rasa_nlu and rasa_core together on one server, plus one Duckling server?

Which architecture is more scalable and fail-safe? I learned that Rasa has moved from Flask to Klein; which application server is recommended for this in production?

Rasa is using Flask, as far as I can see.

Does your Rasa NLU have another purpose apart from the chatbot? If, say, you would like to do entity extraction from arbitrary text, or email or tweet classification (these are more general NLP tasks), then it is better to keep Rasa NLU on a separate server. Duckling always runs separately as a server in any case.
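For example, a standalone Duckling server is queried over plain HTTP; here is a minimal sketch (the host and port are assumptions about your deployment):

```python
import requests

# Duckling's HTTP server accepts form-encoded text plus a locale;
# the host/port below are placeholders for your own setup.
resp = requests.post(
    "http://duckling.internal:8000/parse",
    data={"text": "meet me tomorrow at 3pm", "locale": "en_US"},
)
print(resp.json())  # resolved entities, e.g. a "time" dimension
```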

Do you have a more diverse ecosystem with multiple chatbots? If you are deploying more than one chatbot, I would advise containerisation and technologies such as Kubernetes to manage your deployment and resources efficiently.

Do you have different model lifecycles for your Rasa NLU and Rasa Core?


Thanks for the response.

No, it's only used for the chatbot. I am new to Rasa, so I am trying to understand how to deploy it; basically, what is the recommended way from the community.

Please elaborate: what do you mean by different lifecycles of models? Basically, there should not be a dependency between NLU and Core, since the two are independently deployable.

There are two ways to train the NLU classifier and the Core classifier: together, or independently.

If you train them independently, meaning with two different lifecycles (for example, one retrained daily and one ad hoc), you might need to deploy them on two separate servers.

This is mostly how it is done in production.
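To illustrate, here is a minimal sketch of two independent training jobs, assuming the legacy rasa_nlu / rasa_core Python APIs from the time of this thread (paths and schedules are placeholders):

```python
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

from rasa_core.agent import Agent
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy

def train_nlu():
    # e.g. scheduled daily, output deployed to the NLU server
    trainer = Trainer(config.load("nlu_config.yml"))
    trainer.train(load_data("data/nlu.md"))
    return trainer.persist("models/nlu/")

def train_core():
    # e.g. run ad hoc, output deployed to the Core server
    agent = Agent("domain.yml",
                  policies=[MemoizationPolicy(), KerasPolicy()])
    agent.train(agent.load_data("data/stories.md"))
    agent.persist("models/dialogue")
```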

Based on your case, you can bundle up your Rasa chatbot on one server; however, if you have custom actions, they should run on another server or as a lambda function behind a webhook, as sketched below.
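For example, with the rasa_core_sdk action server (the action name and slot below are hypothetical):

```python
from rasa_core_sdk import Action

class ActionCheckOrder(Action):
    """Runs on the separate action server, not inside the bot process."""

    def name(self):
        return "action_check_order"  # hypothetical action name

    def run(self, dispatcher, tracker, domain):
        order_id = tracker.get_slot("order_id")  # hypothetical slot
        dispatcher.utter_message("Looking up order {}".format(order_id))
        return []
```

You would start it with `python -m rasa_core_sdk.endpoint --actions actions` and point the `action_endpoint` in your endpoints configuration at its webhook URL (port 5055 by default).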

You can also externalise your template engine to manage bot responses better.
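A sketch of what that could look like, assuming Jinja2 and an external template store (the store and keys here are illustrative, not a Rasa API):

```python
from jinja2 import Template

# In practice this dict could live in a database or config service,
# so bot responses can change without redeploying the bot itself.
RESPONSE_TEMPLATES = {
    "utter_greet": "Hello {{ name }}, how can I help you today?",
}

def render_response(template_key, **context):
    return Template(RESPONSE_TEMPLATES[template_key]).render(**context)

print(render_response("utter_greet", name="Alex"))
```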

Ideally, separate your ML inference logic from your functional code logic.
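Illustratively, the functional layer calls the inference layer over HTTP instead of importing the model, so each side can be scaled and redeployed on its own (the endpoint and intents below are hypothetical):

```python
import requests

NLU_ENDPOINT = "http://nlu.internal:5000/parse"  # hypothetical host

def classify(text):
    # ML inference lives behind its own service boundary
    return requests.post(NLU_ENDPOINT, json={"q": text}).json()

def route_message(text):
    # purely functional logic: no model code or ML dependencies here
    intent = classify(text)["intent"]["name"]
    if intent == "greet":
        return "Hello!"
    return "Sorry, I didn't get that."
```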


Thanks, that's very useful information.

I was referring to this issue. Does this mean Flask will be removed in future versions?

The issue you are referring to seems really old. For the moment, I noticed it is a Flask server, but I'm not sure how that's relevant to your question.

If you have to use any other framework, you can do so as well by using Rasa's Python API and wrapping your framework around it to serve HTTP.
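As a sketch, assuming the legacy rasa_core Agent, with Flask purely as the example wrapper (model paths are placeholders):

```python
from flask import Flask, jsonify, request
from rasa_core.agent import Agent
from rasa_core.interpreter import RasaNLUInterpreter

app = Flask(__name__)
agent = Agent.load(
    "models/dialogue",
    interpreter=RasaNLUInterpreter("models/nlu/default/current"),
)

@app.route("/chat", methods=["POST"])
def chat():
    payload = request.get_json()
    # handle_text runs NLU + Core and returns the bot's replies
    replies = agent.handle_text(payload["message"],
                                sender_id=payload.get("sender", "default"))
    return jsonify(replies)

if __name__ == "__main__":
    app.run(port=5005)
```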

I will retract my statement above; it seems Rasa NLU is running on Klein. You are right.

I was referring more to Rasa Core.

I also saw in the post a recommendation for containerisation, which is really a good way to manage scaling. We scale with Kubernetes.
