Local LLM and Embedding Configuration Issues in Rasa Pro

dauvannam1804 · August 22, 2024, 6:24pm

I’m trying to use a local LLM and embeddings from Ollama with Rasa Pro. When I comment out the OPENAI_API_KEY in the .env file, then train and run the model using the tutorial, I encounter the following error, but the bot still responds:

Additionally, when I train and test with a fake OPENAI_API_KEY set to “@@@”, I also get an error, but the bot still responds:

I’m not sure why these errors are occurring and whether the local embedding is actually being used, or if it’s defaulting to the OpenAI embedding as mentioned in the documentation. Here’s the config I’m using:

Despite the recommendation in this post not to configure both LLM and embedding for FlowPolicy, should I reconfigure as follows?

Sanjukta.bs · September 11, 2024, 10:33am

Is there any solution to this?

Sanjukta.bs · September 11, 2024, 8:11pm

Hello since you are also using ollama to run models locally, are you getting errors like asynchronous calls while using ollama? I am facing this particular issue and my responses are also very late about 1minute. Is there anything that we can do to fix it?

Shashanka · September 12, 2024, 6:06am

Hey Sanjukta. Whether you were successful connecting the local LLM at least ? I was not able to do that. Can you share your configurations files please and the method through which you are launching the LLM server.

Sanjukta.bs · September 12, 2024, 6:41am

It is giving me errors while running, but atleast training is successful with this config.

recipe: default.v1
language: en
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    type: ollama
    model: llma3model
    api_base: http://localhost:11434
  prompt_template: prompt_templates/time_aware_prompt.jinja2
  flow_retrieval:
    active: false

Shashanka · September 12, 2024, 6:59am

Whether it can predict the actions and flow steps with your current config file ?

dauvannam1804 · September 12, 2024, 7:44am

With the current config, my bot is running normally. As for the issue mentioned, I still don’t know how to solve it yet. Regarding running the LLM locally, I just downloaded and followed the source code from GitHub - ollama/ollama: Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models..

Sanjukta.bs · September 12, 2024, 8:59am

Hello, are you using ollama inside docker? I am trying it with docker my config is

recipe: default.v1
language: en
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    type: ollama
    model: llma3model
    base_url: http://localhost:11434

  prompt_template: prompt_templates/time_aware_prompt.jinja2
  flow_retrieval:
    active: false




policies:
- name: FlowPolicy
assistant_id: 20240911-121521-recursive-jersey

but I keep getting errors like

(venv) D:\Sanjukta_rasa>rasa inspect
2024-09-12 13:30:49 INFO     rasa.tracing.config  - No endpoint for tracing type available in endpoints.yml,tracing will not be configured.
2024-09-12 13:31:01 INFO     root  - Connecting to channel 'rasa.core.channels.development_inspector.DevelopmentInspectInput' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2024-09-12 13:31:02 INFO     root  - Starting Rasa server on http://0.0.0.0:5005
2024-09-12 13:31:23 INFO     rasa.core.processor  - Loading model models\20240912-132942-delicious-panel.tar.gz...
2024-09-12 13:31:23 WARNING  rasa.dialogue_understanding.generator.llm_based_command_generator  - [warning  ] Disabling flow retrieval can cause issues when there are a large number of flows to be included in the prompt. For moreinformation see:
https://rasa.com/docs/rasa-pro/concepts/dialogue-understanding#how-the-llmcommandgenerator-works event_key=llm_based_command_generator.flow_retrieval.disabled
2024-09-12 13:31:25 INFO     root  - Rasa server is up and running.
[2024-09-12 13:31:26 +0530] [12816] [INFO] Starting worker [12816]
2024-09-12 13:31:26 INFO     sanic.server  - Starting worker [12816]
D:\Sanjukta_rasa\venv\lib\site-packages\langchain\llms\ollama.py:164: RuntimeWarning: coroutine 'AsyncCallbackManagerForLLMRun.on_llm_new_token' was never awaited
  run_manager.on_llm_new_token(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Does that happened to you, or is there something we can do to solve that?

Shashanka · September 12, 2024, 9:28am

I’m not using docker. The errors i had received were because of the embedding model and flow retrieval related.

Rhignome · October 1, 2024, 10:41am

He understands your frustration with Ollama. He’s experienced similar issues, like asynchronous call errors and delayed responses. It’s a known problem when running models locally. While there’s no perfect fix, you can try updating Ollama or adjusting your system’s resource allocation to see if it helps. Sometimes, reducing the model size or running fewer simultaneous processes can also improve performance. If the problem persists, reaching out to Ollama’s support or checking their forums might provide more tailored solutions.

Topic		Replies	Views
Local LLM with RASA CALM Rasa CALM	14	966	May 15, 2025
Rasa 3.10.0 provider for local llm is giving error Rasa CALM	0	95	September 16, 2024
How to use Ollama models in Rasa CALM? Rasa CALM	8	1073	September 27, 2024
Issue with Ollama LLM Integration - Port Binding and Quota Exceeded - RASA CALM Rasa CALM	11	366	March 31, 2025
RASA not working with ollama Rasa CALM	4	127	September 27, 2024

Local LLM and Embedding Configuration Issues in Rasa Pro

Related topics