Local LLM and Embedding Configuration Issues in Rasa Pro

I’m trying to use a local LLM and embeddings from Ollama with Rasa Pro. When I comment out the OPENAI_API_KEY in the .env file and then train and run the model from the tutorial, I get the following error, but the bot still responds:

Additionally, when I train and test with a fake OPENAI_API_KEY set to “@@@”, I also get an error, but the bot still responds:

I’m not sure why these errors are occurring and whether the local embedding is actually being used, or if it’s defaulting to the OpenAI embedding as mentioned in the documentation. Here’s the config I’m using:

Despite the recommendation in this post not to configure both LLM and embedding for FlowPolicy, should I reconfigure as follows?
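(Purely as an illustration, not the exact config from the original post: leaving FlowPolicy without any llm or embeddings settings would look like the policies block quoted later in this thread:)

policies:
- name: FlowPolicy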

Is there any solution to this?

Hello, since you are also using Ollama to run models locally, are you getting errors about asynchronous calls? I am facing this particular issue, and my responses are also very slow, taking about a minute. Is there anything we can do to fix it?

Hey Sanjukta. Were you at least successful in connecting the local LLM? I was not able to do that. Could you please share your configuration files and the method you use to launch the LLM server?

It gives me errors while running, but at least training is successful with this config.

recipe: default.v1
language: en
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    type: ollama
    model: llma3model
    api_base: http://localhost:11434
  prompt_template: prompt_templates/time_aware_prompt.jinja2
  flow_retrieval:
    active: false
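(One quick sanity check, assuming a default local Ollama install listening on port 11434: confirm that the name under model: matches a model Ollama is actually serving.)

ollama list
curl http://localhost:11434/api/tags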

Can it predict the actions and flow steps with your current config file?

With the current config, my bot is running normally. As for the issue mentioned, I still don’t know how to solve it yet. For running the LLM locally, I just downloaded and followed the source code from GitHub - ollama/ollama: Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.
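(For anyone following along, a rough sketch of getting a model served with a standard Ollama install and smoke-testing the endpoint that Rasa will call; the llama3 tag is just an example:)

ollama serve          # only needed if the Ollama server is not already running
ollama pull llama3    # download an example model
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Hello", "stream": false}'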

Hello, are you using Ollama inside Docker? I am trying it with Docker; my config is:

recipe: default.v1
language: en
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    type: ollama
    model: llma3model
    base_url: http://localhost:11434
  prompt_template: prompt_templates/time_aware_prompt.jinja2
  flow_retrieval:
    active: false

policies:
- name: FlowPolicy
assistant_id: 20240911-121521-recursive-jersey
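(An aside, and an assumption rather than something confirmed in this thread: if Ollama itself runs inside a container while Rasa runs on the host, the container’s port 11434 has to be published for http://localhost:11434 to be reachable, e.g. with the official image:)

docker run -d -p 11434:11434 --name ollama ollama/ollama

If instead Rasa runs inside Docker and Ollama on the host, localhost inside the Rasa container will not reach the host’s Ollama.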

but I keep getting errors like

(venv) D:\Sanjukta_rasa>rasa inspect
2024-09-12 13:30:49 INFO     rasa.tracing.config  - No endpoint for tracing type available in endpoints.yml,tracing will not be configured.
2024-09-12 13:31:01 INFO     root  - Connecting to channel 'rasa.core.channels.development_inspector.DevelopmentInspectInput' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2024-09-12 13:31:02 INFO     root  - Starting Rasa server on http://0.0.0.0:5005
2024-09-12 13:31:23 INFO     rasa.core.processor  - Loading model models\20240912-132942-delicious-panel.tar.gz...
2024-09-12 13:31:23 WARNING  rasa.dialogue_understanding.generator.llm_based_command_generator  - [warning  ] Disabling flow retrieval can cause issues when there are a large number of flows to be included in the prompt. For more information see:
https://rasa.com/docs/rasa-pro/concepts/dialogue-understanding#how-the-llmcommandgenerator-works event_key=llm_based_command_generator.flow_retrieval.disabled
2024-09-12 13:31:25 INFO     root  - Rasa server is up and running.
[2024-09-12 13:31:26 +0530] [12816] [INFO] Starting worker [12816]
2024-09-12 13:31:26 INFO     sanic.server  - Starting worker [12816]
D:\Sanjukta_rasa\venv\lib\site-packages\langchain\llms\ollama.py:164: RuntimeWarning: coroutine 'AsyncCallbackManagerForLLMRun.on_llm_new_token' was never awaited
  run_manager.on_llm_new_token(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
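(As that last RuntimeWarning suggests, tracemalloc can be enabled before launching to get the allocation traceback; on Windows cmd, for example:)

set PYTHONTRACEMALLOC=1
rasa inspect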

Did that happen to you, or is there something we can do to solve it?

I’m not using Docker. The errors I received were related to the embedding model and flow retrieval.

I understand your frustration with Ollama. I’ve experienced similar issues, like asynchronous call errors and delayed responses; it’s a known problem when running models locally. While there’s no perfect fix, you can try updating Ollama or adjusting your system’s resource allocation to see if that helps. Sometimes reducing the model size or running fewer simultaneous processes can also improve performance, as in the example below. If the problem persists, reaching out to Ollama’s support or checking their forums might provide more tailored solutions.
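(For example, pulling a smaller tag is one way to try reducing load; these tags are illustrative, not a specific recommendation:)

ollama pull llama3:8b
ollama pull phi3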