Issue with Ollama LLM Integration - Port Binding and Quota Exceeded - RASA CALM

Hi everyone,

I’m currently integrating Ollama as the LLM provider in RASA CALM. While my configuration seems to be correct, I’m encountering an issue where RASA still makes calls to OpenAI’s API and gives the following error:

"ProviderClientAPIException: RateLimitError: OpenAIException - Error code: 429 - {‘error’: {‘message’: ‘You exceeded your current quota, please check your plan and billing details.’} "

This happens despite specifying Ollama as the LLM provider in my config.yml. Here’s my configuration:

" pipeline:

"

It seems RASA still relies on OpenAI for some requests, and I want to ensure it only uses Ollama locally. Has anyone else experienced this, and do you know how to fully disconnect OpenAI while running a local LLM?

Thanks for your help!

Hey @alaa-sayed-ai-expert-flow, try this:

llm:
  model: llama3.1
  type: openai
  openai_api_base: http://localhost:11434/v1
  openai_api_key: foobar

This let me manually override the OpenAI base URL so it points at Ollama instead.
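As a quick sanity check that the override points at something live (assuming Ollama's default port; /v1 is its OpenAI-compatible prefix), you can hit the server directly:

curl http://localhost:11434/api/tags
# native Ollama endpoint: lists the models you have pulled

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1", "messages": [{"role": "user", "content": "Say hi"}]}'
# should return an OpenAI-style chat completion if the override target is correct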

Here is my config.yml; I have done what you showed me, but I still get that error. I trained with "rasa train":

pipeline:
- name: NLUCommandAdapter
- name: MultiStepLLMCommandGenerator
  llm:
    model: "llama3"
    request_timeout: 10
    type: openai
    openai_api_base: http://localhost:11434/v1
    openai_api_key: foobar
  flow_retrieval:
    active: false
    embeddings:
      type: "huggingface"
      model_name: "sentence-transformers/all-mpnet-base-v2"
      task: "feature-extraction"

Try this, and also make sure you are running the same model with Ollama locally; a quick check is shown below.
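For reference, a minimal way to verify that (assuming the ollama CLI is on your path):

ollama pull llama3          # download the model if it is not present yet
ollama list                 # the "model" value in config.yml must match a name listed here
ollama run llama3 "hello"   # one-shot smoke test outside of Rasa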


I am using this config file:

recipe: default.v1
language: en
pipeline:
- name: SingleStepLLMCommandGenerator
  llm:
    model: "llma3model"
    request_timeout: 10
    type: openai
    openai_api_base: http://localhost:11434
    openai_api_key: foobar
  prompt_template: prompt_templates/time_aware_prompt.jinja2
  flow_retrieval:
    active: false
    embeddings:
      type: "huggingface"
      model_name: "sentence-transformers/all-mpnet-base-v2"
      task: "feature-extraction"

but I am getting errors like:

2024-09-12 12:03:21 WARNING langchain.llms.base - Retrying langchain.llms.openai.acompletion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: Invalid response object from API: '404 page not found' (HTTP response code was 404).

My Ollama is running in Docker and is hosted at http://localhost:11434.
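A likely cause of the 404, assuming the config above is used as-is: with type: openai the client appends /chat/completions to openai_api_base, and Ollama only serves its OpenAI-compatible API under the /v1 prefix, so http://localhost:11434/chat/completions hits Ollama's router and gets its plain "404 page not found". A sketch of the corrected line:

    openai_api_base: http://localhost:11434/v1

Also make sure "llma3model" matches a model name shown by ollama list.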

It could be the response rephraser, which uses OpenAI gpt-3.5 by default. It's documented here; I would disable it for now in your endpoints.yml.

If that’s not the issue, post the full logfile someplace and I will review.

I am getting this error:

ERROR rasa.dialogue_understanding.generator.llm_based_command_generator - [error ] llm_based_command_generator.llm.error error=ProviderClientAPIException("\nOriginal error: litellm.APIError: APIError: OpenAIException - Error code: 500 - {'error': {'message': 'llama runner process no longer running: -1', 'type': 'api_error', 'param': None, 'code': None}}")
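That 500 comes from the Ollama server itself: "llama runner process no longer running" means the model runner crashed, often from running out of memory. A way to check, assuming Ollama runs in a Docker container named ollama (hypothetical name):

docker logs ollama           # look for runner crash / out-of-memory messages
ollama run llama3 "hello"    # does the model survive a one-shot prompt outside Rasa?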

Let me first share the structure of my files (the screenshots did not survive): the project contains config.yml, flows.yml, nlu.yml, patterns.yml, rules.yml, stories.yml, and domain.yml.

To run it I use the commands below. It needs an OPENAI_API_KEY just to start and won't run without it (I didn't pay for OpenAI; the key is only there to make the RASA CALM project run):

export OPENAI_API_KEY=****************************************************
export RASA_PRO_LICENSE=<Rasa Pro license JWT, redacted>

rasa train
rasa shell

But I am still getting the above error.

@sk1382 @Sanjukta.bs @stephens

Hey @alaa-sayed-ai-expert-flow, I just faced the same issue. Disabling or commenting out the nlg section in endpoints.yml resolved the "quota exceeded" error for me.
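For anyone landing here, a sketch of what that can look like in endpoints.yml. The key shapes mirror the overrides earlier in this thread and are worth double-checking against your Rasa version:

# Option 1: disable the rephraser entirely by commenting it out
# nlg:
#   type: rephrase

# Option 2: keep rephrasing but point it at the local Ollama model
nlg:
  type: rephrase
  llm:
    model: llama3
    type: openai
    openai_api_base: http://localhost:11434/v1
    openai_api_key: foobar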

Hi, did you manage to configure Rasa CALM with Ollama so that it is totally OpenAI-independent, even with the Enterprise Search, rephrasing, etc. functionalities activated?

If I try Hugging Face as the embedding model, an API key is required even though the documentation presents this config as an "in-memory" solution:

Environment variables: ['HUGGINGFACE_API_KEY'] not set. Required for API calls

CALM also provides an option to load lightweight embedding models in-memory without needing them to be exposed over an API.
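If I read the docs correctly, the in-memory path is a separate provider from the API-backed huggingface one. A sketch of the flow_retrieval block under that assumption (provider name huggingface_local per the Rasa embeddings docs; verify against your version):

flow_retrieval:
  embeddings:
    provider: huggingface_local    # loads sentence-transformers in-process, no API key needed
    model: sentence-transformers/all-mpnet-base-v2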

If I try one of the Ollama embedding models mentioned on the Ollama website, I get the following errors:

ProviderClientAPIException: Failed to embed documents

RuntimeError: asyncio.run() cannot be called from a running event loop

Thanks for your help.

My current config.yml:

recipe: default.v1
language: en
pipeline:
- name: CompactLLMCommandGenerator
  llm:
    model_group: ollama-gemma3-1b
  flow_retrieval:
    embeddings:
      model_group: text_embedding_model #huggingface_embedding_model
policies:
- name: RulePolicy    # replaces FlowPolicy if there are problems
- name: MemoizationPolicy
assistant_id: 20250328-161232-caramelized-continent

and endpoints.yml:

model_groups:
  - id: ollama-gemma3-1b 
    models:
      - provider: ollama
        api_base: "http://localhost:11434"
        model: gemma3:1b
  - id: text_embedding_model
    models:
      - provider: ollama
        api_base: "http://localhost:11434"
        model: mxbai-embed-large
  # - id: huggingface_embedding_model
  #   models:
  #     - provider: huggingface
  #       model: BAAI/bge-small-en-v1.5
  #       model_kwargs: # used during instantiation
  #         device: "cpu"
  #       encode_kwargs: # used during inference
  #         normalize_embeddings: true

If I run with the Hugging Face embedding model instead, with HUGGINGFACE_API_KEY set, I now get another error:

2025-03-31 10:16:16 ERROR rasa.dialogue_understanding.generator.flow_retrieval - [error ] Failed to populate the FAISS store with the provided flows. error=ProviderClientAPIException('Failed to embed documents
Original error: litellm.APIConnectionError: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\site-packages\litellm\main.py", line 3548, in embedding
    response = huggingface.embedding(
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\site-packages\litellm\llms\huggingface_restapi.py", line 1151, in embedding
    data = self._transform_input(
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\site-packages\litellm\llms\huggingface_restapi.py", line 956, in _transform_input
    hf_task = get_hf_task_embedding_for_model(
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\site-packages\litellm\llms\huggingface_restapi.py", line 289, in get_hf_task_embedding_for_model
    model_info_dict = model_info.json()
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\site-packages\httpx\_models.py", line 832, in json
    return jsonlib.loads(self.content, **kwargs)
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\jl3.PRT-063\anaconda3\envs\ollama_env\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
') error_type=ProviderClientAPIException event_key=flow_retrieval.populate_vector_store.not_populated

2025-03-31 10:16:16 ERROR rasa.dialogue_understanding.generator.llm_based_command_generator - [error ] Flow retrieval store is inaccessible. error=ProviderClientAPIException(same traceback as above) event_key=llm_based_command_generator.train.failed
2025-03-31 10:16:16 ERROR rasa.engine.graph - [error ] graph.node.error_running_component node_name=train_CompactLLMCommandGenerator0
2025-03-31 10:16:17 ERROR rasa.main - [error ] ProviderClientAPIException: Failed to embed documents (same traceback as above) event_key=cli.exception.rasa_exception