I’m trying to use a local LLM and embeddings from Ollama with Rasa Pro. When I comment out the OPENAI_API_KEY in the .env file, then train and run the model using the tutorial, I encounter the following error, but the bot still responds:
I’m not sure why these errors are occurring and whether the local embedding is actually being used, or if it’s defaulting to the OpenAI embedding as mentioned in the documentation. Here’s the config I’m using:
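Roughly, the relevant part has the shape below. The component name, model name, base URL, and exact keys are placeholders from memory and may differ between Rasa Pro versions, so treat this as a sketch of my setup rather than the literal file:

pipeline:
  - name: LLMCommandGenerator        # called SingleStepLLMCommandGenerator in newer Rasa Pro versions
    llm:
      type: "ollama"                 # assumption: selects the langchain Ollama wrapper
      model: "llama3"                # placeholder for whatever model has been pulled into Ollama
      base_url: "http://localhost:11434"   # default local Ollama endpoint
    flow_retrieval:
      active: false                  # disabled so no embedding provider is required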
Hello, since you are also using Ollama to run models locally, are you getting errors about asynchronous calls as well?
I am facing this particular issue, and my responses are also very slow, around 1 minute each. Is there anything we can do to fix it?
Hey Sanjukta, were you at least successful in connecting the local LLM? I was not able to do that. Could you please share your configuration files and the method you are using to launch the LLM server?
(venv) D:\Sanjukta_rasa>rasa inspect
2024-09-12 13:30:49 INFO rasa.tracing.config - No endpoint for tracing type available in endpoints.yml, tracing will not be configured.
2024-09-12 13:31:01 INFO root - Connecting to channel 'rasa.core.channels.development_inspector.DevelopmentInspectInput' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2024-09-12 13:31:02 INFO root - Starting Rasa server on http://0.0.0.0:5005
2024-09-12 13:31:23 INFO rasa.core.processor - Loading model models\20240912-132942-delicious-panel.tar.gz...
2024-09-12 13:31:23 WARNING rasa.dialogue_understanding.generator.llm_based_command_generator - [warning ] Disabling flow retrieval can cause issues when there are a large number of flows to be included in the prompt. For more information see:
https://rasa.com/docs/rasa-pro/concepts/dialogue-understanding#how-the-llmcommandgenerator-works event_key=llm_based_command_generator.flow_retrieval.disabled
2024-09-12 13:31:25 INFO root - Rasa server is up and running.
[2024-09-12 13:31:26 +0530] [12816] [INFO] Starting worker [12816]
2024-09-12 13:31:26 INFO sanic.server - Starting worker [12816]
D:\Sanjukta_rasa\venv\lib\site-packages\langchain\llms\ollama.py:164: RuntimeWarning: coroutine 'AsyncCallbackManagerForLLMRun.on_llm_new_token' was never awaited
run_manager.on_llm_new_token(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
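For what it's worth, the fuller allocation traceback that the RuntimeWarning mentions can be obtained by enabling tracemalloc before launching; PYTHONTRACEMALLOC is a standard CPython setting, not anything Rasa-specific:

set PYTHONTRACEMALLOC=1
rasa inspect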
Did that happen to you, or is there something we can do to solve it?
I understand your frustration with Ollama. I've experienced similar issues, like asynchronous call errors and delayed responses; it's a known problem when running models locally. While there's no perfect fix, you can try updating Ollama or adjusting your system's resource allocation to see if that helps. Sometimes reducing the model size or running fewer simultaneous processes can also improve performance. If the problem persists, reaching out to Ollama's support or checking their forums might provide more tailored solutions.
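For example, something along these lines can be tried before starting the Rasa server: limit how many requests Ollama handles in parallel, keep only one model loaded, and pull a smaller model variant. The environment variables and the model tag below are only examples and depend on the Ollama version, so check the Ollama documentation for your setup:

set OLLAMA_NUM_PARALLEL=1
set OLLAMA_MAX_LOADED_MODELS=1
ollama pull llama3:8b
ollama serve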