I am trying to figure out the recommended way to deploy Rasa in production.
Is it a good idea to deploy rasa_nlu and rasa_core, with their API interfaces, on two different servers, assuming I also need duckling for some entity resolution?
Or should I go with rasa_nlu and rasa_core together on one server, plus one duckling server?
Which architecture is more scalable and fail-safe? I learned that Rasa has moved from Flask to Klein; which application server is recommended for this in production?
Does your Rasa NLU serve another purpose apart from the chatbot? - If you would, say, like to do entity extraction from arbitrary text, or email or tweet classification, which are more general NLP tasks, then it is better to keep Rasa NLU on a separate server. Duckling always runs separately as a server (see the sketch after these questions).
Do you have a more diverse ecosystem of multiple chatbots? - If you are deploying more than one chatbot, I would advise containerisation and technologies such as Kubernetes to manage your deployment and resources efficiently.
Do you have different model lifecycles for your Rasa NLU and Rasa Core?
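For context on the Duckling point: Duckling exposes its own small HTTP API, which is why it naturally runs as a separate server. A minimal sketch of querying it directly, assuming a Duckling container on its default port 8000:

```python
import requests

# Ask a separately deployed Duckling server to parse entities out of text.
# Host and port assume a local Duckling container on its default port 8000.
response = requests.post(
    "http://localhost:8000/parse",
    data={"locale": "en_US", "text": "remind me tomorrow at 8am"},
)
for entity in response.json():
    print(entity["dim"], "->", entity["value"])  # e.g. time -> resolved datetime
```

Inside a Rasa NLU pipeline you normally would not call it yourself; you would point the ner_duckling_http component at the same URL.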
No, it's only used for the chatbot. I am new to Rasa, so I am trying to understand how to deploy it - basically, what the recommended way is from the community.
Could you elaborate on what you mean by different model lifecycles? Basically there should not be a dependency between NLU and Core, since the two are independently deployable.
There are two ways to train the NLU classifier and the Core classifier: together or independently. If you train them independently - meaning with two different lifecycles, e.g. one retrained daily and one ad hoc - you might need to deploy them on two separate servers.
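To make the two lifecycles concrete, here is a sketch of training each model independently with the pre-1.0 rasa_nlu / rasa_core Python APIs; file names like nlu_config.yml, data/nlu.md and data/stories.md are placeholders for your own project files:

```python
# Train the NLU model on its own schedule (e.g. daily).
from rasa_nlu import config
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

nlu_data = load_data("data/nlu.md")
trainer = Trainer(config.load("nlu_config.yml"))
trainer.train(nlu_data)
trainer.persist("models/nlu")

# Train the Core dialogue model separately (e.g. ad hoc).
from rasa_core.agent import Agent
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy

agent = Agent("domain.yml", policies=[MemoizationPolicy(), KerasPolicy()])
stories = agent.load_data("data/stories.md")
agent.train(stories)
agent.persist("models/dialogue")
```

Since each half produces its own model artifact, either one can be retrained and redeployed without touching the other.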
This mostly applies to production. In your case, you can bundle your Rasa chatbot on one server; however, if you have custom actions, they should run on another server or a lambda function behind a webhook.
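A custom action server is just a small Python service built with rasa_core_sdk. A sketch, where the action name and slot are hypothetical:

```python
from rasa_core_sdk import Action

class ActionCheckOrderStatus(Action):
    """A hypothetical custom action that reads a slot and replies."""

    def name(self):
        # Must match the action name listed in your domain file.
        return "action_check_order_status"

    def run(self, dispatcher, tracker, domain):
        order_id = tracker.get_slot("order_id")
        dispatcher.utter_message("Looking up order {}...".format(order_id))
        return []
```

You would run this with python -m rasa_core_sdk.endpoint --actions actions on its own server (or wrap it in a lambda) and point Core's action endpoint at it.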
You can also externalise your template engine to manage bot responses better.
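One way to externalise templates is to have custom actions fetch response text from a CMS or template service instead of hard-coding it in the domain; the service URL and response shape below are entirely hypothetical:

```python
import requests
from rasa_core_sdk import Action

class ActionGreet(Action):
    def name(self):
        return "action_greet"

    def run(self, dispatcher, tracker, domain):
        # Fetch the response copy from an external template service so it
        # can be edited without retraining or redeploying the bot.
        # The URL and JSON shape here are hypothetical placeholders.
        template = requests.get(
            "http://cms.internal/templates/greet",
            params={"lang": "en"},
        ).json()["text"]
        dispatcher.utter_message(template)
        return []
```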
Ideally, separate your ML inference logic from your functional code logic.
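With pre-1.0 rasa_core this separation is wired up through the agent: Core loads its dialogue model locally while delegating NLU parsing and custom actions to remote servers. A sketch, assuming an NLU server on port 5000 and an action server on port 5055 (the interpreter arguments varied a bit across rasa_core 0.x releases, so treat this as approximate):

```python
from rasa_core.agent import Agent
from rasa_core.interpreter import RasaNLUHttpInterpreter
from rasa_core.utils import EndpointConfig

# ML inference (NLU parsing) happens on a separate server...
interpreter = RasaNLUHttpInterpreter(
    model_name="current", endpoint=EndpointConfig("http://localhost:5000")
)
# ...and so does the functional (custom action) code.
action_endpoint = EndpointConfig("http://localhost:5055/webhook")

agent = Agent.load(
    "models/dialogue", interpreter=interpreter, action_endpoint=action_endpoint
)
```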