Local LLM with RASA CALM

Hello Community!! I have a perfectly working RASA Pro CALM (v3.8.7) chatbot which depends on OpenAI for components such as the Enterprise Search Policy, Intentless Policy, Command Generator, Contextual Response Rephraser, etc. Now that I need to make the chatbot completely independent, I need to use a local LLM (preferably via Ollama or a similar setup). I tried doing that but wasn’t successful. I will share my previous config setup, which I used with GPT, and the new setup in which I’m trying to use the OpenAI entry point to reach the local Ollama model. Kindly help me with the right configuration and a clear procedure for integrating a local LLM with RASA Pro CALM so that it can replace OpenAI in every way. Any help regarding this is highly appreciated.

Old Config setup

recipe: default.v1
language: en
pipeline:
- name: LLMCommandGenerator
  llm:
    model_name: gpt-4-turbo

policies:
  - name: FlowPolicy
  - name: EnterpriseSearchPolicy
  - name: IntentlessPolicy
    nlu_abstention_threshold: 0.6

assistant_id: 20240408-145223-unsorted-callable

The Config setup that I’m trying

recipe: default.v1
language: en

pipeline:
- name: LLMCommandGenerator
  llm:
    model: 'llama2'
    type: openai
    openai_api_base: http://localhost:11434/v1
  flow_retrieval:
    embeddings:
      type: "spacy"

policies:
  - name: FlowPolicy
    llm:
      model: 'llama2'
      type: openai
      openai_api_base: http://localhost:11434/v1
    embeddings:
      type: "spacy"

  - name: EnterpriseSearchPolicy
    llm:
      model: 'llama2'
      type: openai
      openai_api_base: http://localhost:11434/v1
    embeddings:
      type: "spacy"

  - name: IntentlessPolicy
    nlu_abstention_threshold: 0.6

Hi @Shashanka,

It seems like you are running Ollama locally. Could you please try the following change:

- name: LLMCommandGenerator
  llm:
    type: ollama # defaults to locally running ollama server
    model: llama2

I recommend trying that with the newest rasa-pro version (3.9.2). That’s what worked for me :slight_smile:

Be aware that the flow retrieval and rephraser components also use OpenAI GPT models by default. You can change that in their configurations; a sketch is below.
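For reference, here is a rough sketch of how those defaults could be overridden: the flow retrieval embeddings live under the command generator in config.yml, and the rephraser is configured in endpoints.yml. The exact keys under nlg (and whether your embeddings provider is supported) should be double-checked against the Rasa Pro docs for your version:

# config.yml (sketch): command generator on a local Ollama LLM,
# flow retrieval on a non-OpenAI embeddings provider
pipeline:
- name: LLMCommandGenerator
  llm:
    type: ollama          # locally running Ollama server
    model: llama2
  flow_retrieval:
    embeddings:
      type: spacy         # any supported non-OpenAI embeddings provider

# endpoints.yml (sketch): ContextualResponseRephraser pointed at the same local model
nlg:
  type: rephrase
  llm:
    type: ollama
    model: llama2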


Hey @Balowen,

Thank you for your response. I will try it and get back to you about this. Meanwhile, could you please let me know the embeddings provider that you are using? If possible, please share the config and endpoint configuration here. Many thanks in advance.

Hi, for this specific example I didn’t configure embeddings, so the default OpenAI embeddings were used. You don’t need to configure an endpoint for Ollama – the default localhost address will be used. Check out the documentation here to see how to configure embeddings.

I looked at your config a little more closely and there is one issue with it:

  • you shouldn’t configure llm and embeddings for FlowPolicy. It doesn’t use an LLM at all.

The llm configuration for EnterpriseSearchPolicy should look similar to the one for the command generator; see the sketch below.
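As a rough sketch (same assumptions as above: a locally running Ollama server, and an embeddings provider that Rasa actually supports), that part of the config could look like:

policies:
  - name: EnterpriseSearchPolicy
    llm:
      type: ollama        # locally running Ollama server
      model: llama2
    embeddings:
      type: spacy         # must be one of the supported embeddings providers
  - name: FlowPolicy       # note: no llm/embeddings keys here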

Check out vllm as an alternative to Ollama: GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
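If you go the vLLM route, it exposes an OpenAI-compatible API, so the openai provider type with a custom base URL should work. A sketch, where the port and the model name are assumptions you would adjust to whatever your vLLM server actually serves:

pipeline:
- name: LLMCommandGenerator
  llm:
    type: openai                                   # vLLM speaks the OpenAI API
    model: meta-llama/Meta-Llama-3-8B-Instruct     # example model name served by vLLM
    openai_api_base: http://localhost:8000/v1      # default vLLM server address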

Hello Shashanka,

I am looking for a tutor to help me with my Rasa X and Rasa CALM studies, working online from home. Contact me at omar@mathseven.com. Thanks

Hello Omar,

Thank you for reaching out. Currently, I am still in the process of learning and exploring Rasa myself, so I wouldn’t be able to do that.

Best of luck with your search and studies!

Thank you

Hello @Balowen, I don’t have access to OpenAI and I want to use a local LLM. I am struggling to find a way to implement this. My data source is essentially a database, for which I believe flows should work perfectly, but I am not able to run the bot since OpenAI is not allowed and is blocked. Thanks in advance for the help.

Hey @Balowen,

My bot uses the following components: LLMCommandGenerator, FlowPolicy, IntentlessPolicy, EnterpriseSearchPolicy, and rasa.core.ContextualResponseRephraser. Could you tell me which of these require LLM configuration and which of them require embeddings configuration?

I have been trying ever since. Ollama did not seem to work for me with the configuration you mentioned. By the way, can you confirm whether RASA expects embeddings of a fixed length or format? Thanks a lot!!

@Sanjukta.bs If you have access to a powerful machine, you could try CALM with a local LLM like Llama 3. The easiest way to run it is with Ollama. Be aware, though, that models like the 8B Llama variants aren’t as powerful as GPT-3.5 or GPT-4 and won’t perform as well at command generation, which is essential for understanding the user and triggering the correct flows.

The components in Rasa Pro that require LLM configuration:

  • LLMCommandGenerator
  • LLMIntentClassifier
  • ContextualResponseRephraser
  • EnterpriseSearchPolicy
  • IntentlessPolicy

Components that use embeddings:

  • LLMCommandGenerator (for flow retrieval)
  • IntentlessPolicy
  • EnterpriseSearchPolicy

Refer to this page to check the supported embeddings providers: LLM Providers.

For EnterpriseSearch, if Rasa doesn’t natively support a particular embedding model that you want to use, custom information retrieval comes to the rescue. You can integrate local or fine-tuned embedding models of your choice to generate embeddings for search queries and documents; a rough sketch of wiring that up follows below.
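As far as I understand it, the custom retrieval is plugged in through the policy’s vector_store section; the module path below is just a placeholder for your own class, so treat this as a sketch and verify the exact interface on the Custom Information Retrieval page of the docs:

policies:
  - name: EnterpriseSearchPolicy
    vector_store:
      type: "addons.my_retrieval.MyLocalRetrieval"   # placeholder: your custom retrieval class
    llm:
      type: ollama
      model: llama2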

Could you share your whole config, how you are running Ollama, and any error logs from running rasa with --debug?


Thanks so much for the reply. My config.yml file looks like this-

I am not able to include embeddings, as I am getting errors like:

2024-08-06 01:10:35 ERROR rasa.dialogue_understanding.generator.llm_based_command_generator - [error ] Flow retrieval store is inaccessible. error=ValueError("Unsupported embeddings type 'ollama'") event_key=llm_based_command_generator.train.failed

And with flow retrieval set to false, the model trains, but I am not able to use flows; the bot does not reply with anything.

I want to use a local LLM, so could you recommend which provider and embeddings would be a good fit?

If possible, is there any GitHub project available for Rasa Pro with local LLM models? Thanks so much for the help.

Hey guys, same config, same issue

2024-08-21 20:51:39 ERROR rasa.dialogue_understanding.generator.llm_based_command_generator - [error ] Flow retrieval store is inaccessible. error=ValueError("Unsupported embeddings type 'ollama'") event_key=llm_based_command_generator.train.failed


This is true. I’m not sure whether there has been a solution to this yet. Do you have any input on this, please, @Balowen?