Rasa not working with Ollama

This is my config file. When I use the provider key it asks for an OpenAI API key, so I am using type instead.

recipe: default.v1
language: en
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    type: ollama
    model: llma3model
    base_url: http://localhost:11434
  prompt_template: prompt_templates/time_aware_prompt.jinja2
  flow_retrieval:
    active: false

policies:
- name: FlowPolicy
assistant_id: 20240911-121521-recursive-jersey
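
For comparison, newer Rasa Pro releases (3.10 and later) moved to a litellm-based configuration where Ollama is selected with a provider key instead of type. The sketch below is only illustrative under that assumption; the exact key names (provider, api_base) depend on your Rasa Pro version, so check the docs for your release:

recipe: default.v1
language: en
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    provider: ollama                  # litellm-style key; older releases expect type
    model: llma3model                 # must match a model name shown by ollama list
    api_base: http://localhost:11434  # assumption: litellm expects api_base, not base_url
  prompt_template: prompt_templates/time_aware_prompt.jinja2
  flow_retrieval:
    active: false
policies:
- name: FlowPolicy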

It does generate a response, but it takes a very long time, and I keep getting this warning:

(venv) D:\Sanjukta_rasa>rasa inspect
2024-09-12 13:30:49 INFO     rasa.tracing.config  - No endpoint for tracing type available in endpoints.yml, tracing will not be configured.
2024-09-12 13:31:01 INFO     root  - Connecting to channel 'rasa.core.channels.development_inspector.DevelopmentInspectInput' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2024-09-12 13:31:02 INFO     root  - Starting Rasa server on http://0.0.0.0:5005
2024-09-12 13:31:23 INFO     rasa.core.processor  - Loading model models\20240912-132942-delicious-panel.tar.gz...
2024-09-12 13:31:23 WARNING  rasa.dialogue_understanding.generator.llm_based_command_generator  - [warning  ] Disabling flow retrieval can cause issues when there are a large number of flows to be included in the prompt. For more information see:
https://rasa.com/docs/rasa-pro/concepts/dialogue-understanding#how-the-llmcommandgenerator-works event_key=llm_based_command_generator.flow_retrieval.disabled
2024-09-12 13:31:25 INFO     root  - Rasa server is up and running.
[2024-09-12 13:31:26 +0530] [12816] [INFO] Starting worker [12816]
2024-09-12 13:31:26 INFO     sanic.server  - Starting worker [12816]
D:\Sanjukta_rasa\venv\lib\site-packages\langchain\llms\ollama.py:164: RuntimeWarning: coroutine 'AsyncCallbackManagerForLLMRun.on_llm_new_token' was never awaited
  run_manager.on_llm_new_token(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Can anyone explain how to work around this? I have tried various configurations, but none of them seems to work properly. Any responses are appreciated. Thanks.

Hi, did you find any solution to your problem? I'm facing the same issue.

@Sanjukta.bs did you find a workaround for this?

Try enabling flow retrieval again, as it can help with response times. Also, double-check that your API key is set correctly.
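
For what it's worth, flow retrieval itself needs an embeddings model, which defaults to OpenAI, and that alone can trigger the API key prompt. Below is a rough sketch of re-enabling it with a local embeddings model instead; the embeddings block and Ollama embeddings support are assumptions to verify against your Rasa version, and nomic-embed-text is just a hypothetical example you would first pull with Ollama:

pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    type: ollama
    model: llma3model
    base_url: http://localhost:11434
  flow_retrieval:
    active: true                # re-enabled so only relevant flows go into the prompt
    embeddings:                 # assumption: flow_retrieval accepts an embeddings block
      type: ollama              # assumption: Ollama embeddings instead of the OpenAI default
      model: nomic-embed-text   # hypothetical model; pull it first with: ollama pull nomic-embed-text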

I get the same warning with Ollama, and the response times are so bad that it's quicker to use GPT-4. But which API key do you mean here, exactly?