Local LLM with RASA CALM

Hi, for this specific example I didn't configure embeddings, so the default OpenAI embeddings are used. You don't need to configure the endpoint for Ollama; the default localhost endpoint will be used. Check out the documentation here to see how to configure embeddings.
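Roughly, the relevant part of the config would look like the sketch below. This assumes a recent Rasa Pro / CALM version where the `llm` block accepts `provider` and `model` keys (check the docs for the exact key names in your version); the model name is just illustrative.

```yaml
# config.yml -- minimal sketch, not a drop-in config
pipeline:
  - name: SingleStepLLMCommandGenerator
    llm:
      provider: ollama
      model: llama3.1   # whichever model you have pulled locally
      # no endpoint needed: the default http://localhost:11434 is used
    # no `embeddings:` block here, so the default OpenAI embeddings apply
```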

I took a closer look at your config and there is one issue with it:

  • you shouldn’t configure `llm` and `embeddings` for FlowPolicy; it doesn’t use an LLM at all (see the sketch below).

The `llm` configuration for EnterpriseSearchPolicy should be the same as the one for the CommandGenerator.
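Putting both points together, the policies section would look roughly like this (same assumptions about key names as in the sketch above):

```yaml
policies:
  - name: FlowPolicy              # no llm/embeddings here; it doesn't call an LLM
  - name: EnterpriseSearchPolicy
    llm:
      provider: ollama            # mirror the CommandGenerator's llm settings
      model: llama3.1
```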

Check out vllm as an alternative to Ollama: GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
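If you go the vLLM route, it exposes an OpenAI-compatible API, so you would point the same `llm` block at that server instead. This is only a sketch: it assumes your Rasa version supports a self-hosted / OpenAI-compatible provider with an `api_base` key, so verify the exact keys in the docs for your release.

```yaml
# Start the server first, e.g.:  vllm serve meta-llama/Llama-3.1-8B-Instruct
llm:
  provider: self-hosted
  model: meta-llama/Llama-3.1-8B-Instruct
  api_base: http://localhost:8000/v1   # vLLM's OpenAI-compatible endpoint
```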