A robust solution for production
- Separate the LLM from the action server:
  - Create a Flask application dedicated to LLM calls.
  - Run the Flask application in its own container and expose its own endpoint.
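A minimal sketch of such a standalone LLM service, assuming Flask is installed in the LLM container. The `/generate` route, the JSON field names, and the `call_llm` helper are hypothetical placeholders for illustration, not part of Rasa or of any particular LLM library:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def call_llm(prompt):
    """Hypothetical placeholder: swap in the real LLM client call here."""
    return f"(stub reply to: {prompt})"


# Assumed route name; the action server would POST {"prompt": "..."} here.
@app.route("/generate", methods=["POST"])
def generate():
    payload = request.get_json(silent=True)
    prompt = payload.get("prompt") if isinstance(payload, dict) else None
    if not isinstance(prompt, str) or not prompt.strip():
        # Reject requests that carry no usable prompt.
        return jsonify({"error": "missing 'prompt'"}), 400
    return jsonify({"reply": call_llm(prompt.strip())})
```

A Rasa custom action could then call this endpoint over HTTP instead of importing the LLM libraries directly, so the action server image would not need them installed.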
- Handle user input for RAG:
  - If your LLM setup involves a Retrieval-Augmented Generation (RAG) system that requires user input, define a function within the LLM Flask app to handle this input.
  - Create an endpoint in your Flask app that the UI code can call to send user input, so the Flask app can receive and process it.
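One way the input-handling function and its endpoint might look, as a sketch only. The `/user-input` route, the `session_id` and `text` fields, and the in-memory `user_inputs` store are assumptions made for illustration; a real deployment would likely pass the text on to the retriever or keep it in a shared store:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store keyed by session id (assumption for this sketch only;
# it is lost on restart and not shared between worker processes).
user_inputs = {}


def handle_user_input(session_id, text):
    """Record the user's input so a later RAG step can use it as the query."""
    user_inputs.setdefault(session_id, []).append(text)
    return len(user_inputs[session_id])


# Assumed route name; the UI would POST {"session_id": "...", "text": "..."}.
@app.route("/user-input", methods=["POST"])
def receive_user_input():
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify({"error": "expected a JSON object"}), 400
    session_id = payload.get("session_id")
    text = payload.get("text")
    if not isinstance(session_id, str) or not isinstance(text, str) or not text.strip():
        return jsonify({"error": "need 'session_id' and non-empty 'text'"}), 400
    count = handle_user_input(session_id, text.strip())
    return jsonify({"session_id": session_id, "stored": count}), 201
```

Keeping the handling logic in a plain function, separate from the route, means the same function can be reused if the input later arrives through another channel.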
- Resolve version conflicts:
  - Because each service runs in its own container with its own dependencies, the Rasa environment, which might be built on older packages, no longer has to share an environment with the LLM application, which uses the latest packages. This keeps dependency management separate for each component and avoids conflicts between them.