Local LLM with RASA CALM

Hello Community!! I have a perfectly working RASA Pro CALM (v3.8.7) chatbot which depends on OpenAI for components such as the Enterprise Search Policy, Intentless Policy, Command Generator, Contextual Response Rephraser, etc. Now that I need to make the chatbot completely independent, I need to use a local LLM (preferably via Ollama or a similar setup). I tried doing that but wasn’t successful. I will share my previous config setup, which I used with GPT, and the new setup in which I’m trying to use the OpenAI entry point to reach the local Ollama model. Kindly help me with the right configuration and a clear procedure for integrating a local LLM with RASA Pro CALM so that it can replace OpenAI in every way. Any help regarding this is highly appreciated.

Old Config setup

recipe: default.v1
language: en
pipeline:
- name: LLMCommandGenerator
  llm:
    model_name: gpt-4-turbo

policies:
  - name: FlowPolicy
  - name: EnterpriseSearchPolicy
  - name: IntentlessPolicy
    nlu_abstention_threshold: 0.6

assistant_id: 20240408-145223-unsorted-callable

The Config setup that I’m trying

recipe: default.v1
language: en

pipeline:
- name: LLMCommandGenerator
  llm:
    model: 'llama2'
    type: openai
    openai_api_base: http://localhost:11434/v1
  flow_retrieval:
    embeddings:
      type: "spacy"

policies:
  - name: FlowPolicy
    llm:
      model: 'llama2'
      type: openai
      openai_api_base: http://localhost:11434/v1
    embeddings:
      type: "spacy"

  - name: EnterpriseSearchPolicy
    llm:
      model: 'llama2'
      type: openai
      openai_api_base: http://localhost:11434/v1
    embeddings:
      type: "spacy"

  - name: IntentlessPolicy
    nlu_abstention_threshold: 0.6

Hi @Shashanka,

It seems like you are running Ollama locally. Could you please try the following change:

- name: LLMCommandGenerator
  llm:
    type: ollama # defaults to locally running ollama server
    model: llama2

I recommend trying that with the newest rasa-pro version (3.9.2). That’s what worked for me :slight_smile:

Be aware that the flow retrieval and rephraser components also use OpenAI GPT models by default. You can change that in their configurations; a sketch is below.
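For reference, here is a rough sketch of how those defaults could be overridden: the flow retrieval embeddings live under the command generator in config.yml, and the rephraser is configured in endpoints.yml. The exact keys under nlg (and whether your embeddings provider is supported) should be double-checked against the Rasa Pro docs for your version:

# config.yml (sketch): command generator on a local Ollama LLM,
# flow retrieval on a non-OpenAI embeddings provider
pipeline:
- name: LLMCommandGenerator
  llm:
    type: ollama          # locally running Ollama server
    model: llama2
  flow_retrieval:
    embeddings:
      type: spacy         # any supported non-OpenAI embeddings provider

# endpoints.yml (sketch): ContextualResponseRephraser pointed at the same local model
nlg:
  type: rephrase
  llm:
    type: ollama
    model: llama2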


Hey @Balowen,

Thank you for your response. I will try it and get back to you about this. Meanwhile, could you please let me know the embeddings provider that you are using? If possible, please share the config and endpoint configuration here. Many thanks in advance.

Hi, for this specific example I didn’t configure embeddings, so the default OpenAI embeddings were used. You don’t need to configure an endpoint for Ollama – the default localhost address will be used. Check out the documentation here to see how to configure embeddings.

I looked at your config a little more closely and there is one issue with it:

  • you shouldn’t configure llm and embeddings for FlowPolicy. It doesn’t use an LLM at all.

The llm configuration for EnterpriseSearchPolicy should look similar to the one for the command generator; see the sketch below.
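As a rough sketch (same assumptions as above: a locally running Ollama server, and an embeddings provider that Rasa actually supports), that part of the config could look like:

policies:
  - name: EnterpriseSearchPolicy
    llm:
      type: ollama        # locally running Ollama server
      model: llama2
    embeddings:
      type: spacy         # must be one of the supported embeddings providers
  - name: FlowPolicy       # note: no llm/embeddings keys here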

Check out vllm as an alternative to Ollama: GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
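If you go the vLLM route, it exposes an OpenAI-compatible API, so the openai provider type with a custom base URL should work. A sketch, where the port and the model name are assumptions you would adjust to whatever your vLLM server actually serves:

pipeline:
- name: LLMCommandGenerator
  llm:
    type: openai                                   # vLLM speaks the OpenAI API
    model: meta-llama/Meta-Llama-3-8B-Instruct     # example model name served by vLLM
    openai_api_base: http://localhost:8000/v1      # default vLLM server address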

Hello Shashanka,

I am looking for a tutor to help me with my Rasa X and Rasa CALM studies, working online from home. Contact me at omar@mathseven.com. Thanks

Hello Omar,

Thank you for reaching out. Currently, I am still in the process of learning and exploring Rasa myself, so I wouldn’t be able to do that.

Best of luck with your search and studies!

Thank you

Hello @Balowen, I don’t have access to OpenAI and I want to use a local LLM. I am struggling to find a way to implement this. My data source is essentially a database, for which I believe flows should work perfectly, but I am not able to run the bot since OpenAI is not allowed and is blocked. Thanks in advance for the help.

Hey @Balowen,

My bot uses the following components: LLMCommandGenerator, FlowPolicy, IntentlessPolicy, EnterpriseSearchPolicy, and rasa.core.ContextualResponseRephraser. Could you tell me which of these require LLM configuration and which of them require embeddings configuration?

I have been trying ever since. Ollama did not seem to work for me with the configuration you mentioned. By the way, can you confirm whether RASA expects embeddings of a fixed length or format? Thanks a lot!!

@Sanjukta.bs If you have access to a powerful machine, you could try CALM with a local LLM like Llama 3. The easiest way to run it is with Ollama. Be aware, though, that models like the 8B Llama variants aren’t as powerful as GPT-3.5 or GPT-4 and won’t perform as well at command generation, which is essential for understanding the user and triggering the correct flows.

The components in Rasa Pro that require LLM configuration:

  • LLMCommandGenerator
  • LLMIntentClassifier
  • ContextualResponseRephraser
  • EnterpriseSearchPolicy
  • IntentlessPolicy

Components that use embeddings:

  • LLMCommandGenerator (for flow retrieval)
  • IntentlessPolicy
  • EnterpriseSearchPolicy

Refer to this page to check the supported embeddings providers: LLM Providers.

For EnterpriseSearch, if Rasa doesn’t natively support a particular embedding model that you want to use, custom information retrieval comes to the rescue. You can integrate local or fine-tuned embedding models of your choice to generate embeddings for search queries and documents; a rough sketch of wiring that up follows below.
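As far as I understand it, the custom retrieval is plugged in through the policy’s vector_store section; the module path below is just a placeholder for your own class, so treat this as a sketch and verify the exact interface on the Custom Information Retrieval page of the docs:

policies:
  - name: EnterpriseSearchPolicy
    vector_store:
      type: "addons.my_retrieval.MyLocalRetrieval"   # placeholder: your custom retrieval class
    llm:
      type: ollama
      model: llama2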

Could you share your whole config, how you are running Ollama, and any error logs from running rasa with --debug?


Thanks so much for the reply. My config.yml file looks like this-

I am not able to include embeddings, as I am getting errors like:

2024-08-06 01:10:35 ERROR rasa.dialogue_understanding.generator.llm_based_command_generator - [error ] Flow retrieval store is inaccessible. error=ValueError("Unsupported embeddings type 'ollama'") event_key=llm_based_command_generator.train.failed

And with flow retrieval set to false, the model trains, but I am not able to use flows; the bot does not reply with anything.

I want to use a local LLM, so could you recommend which provider and embeddings would be a good fit?

If possible, is there any GitHub project available for Rasa Pro with local LLM models? Thanks so much for the help.

Hey guys, same config, same issue

2024-08-21 20:51:39 ERROR rasa.dialogue_understanding.generator.llm_based_command_generator - [error ] Flow retrieval store is inaccessible. error=ValueError("Unsupported embeddings type 'ollama'") event_key=llm_based_command_generator.train.failed


This is true. I’m not sure whether there has been a solution to this yet. Do you have any input on this, please, @Balowen?