How to use Ollama models in Rasa CALM?

Hello, I am trying to use models served through Ollama with Rasa CALM. I could not find anything about it in the documentation; any leads or tips would be helpful.

Hi Vishni,

The easiest way I’ve found to use Ollama (assuming you’re running the Ollama server locally) is to use its OpenAI-compatible endpoint.

Your config would look similar to:

pipeline:
- name: LLMCommandGenerator
  llm:
    model: "wizardlm2:7b"
    max_tokens: 20
    type: openai
    openai_api_base: http://localhost:11434/v1  # Ollama's OpenAI-compatible endpoint
    openai_api_key: foobar  # any placeholder value; Ollama does not check the key

And, of course, you can replace the value for model with whichever model you have downloaded to your Ollama installation.
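
If you haven’t pulled the model yet, you can fetch it and see what’s available with the Ollama CLI (the model name here is just an example):

ollama pull wizardlm2:7b
ollama list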

Hope this helps!

Hi Chris, thanks for the quick response! This is of great help; I will try it out.

2024-05-13 17:28:19 WARNING  langchain.llms.base  - Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 8.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=7).
2024-05-13 17:28:34 WARNING  langchain.llms.base  - Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 10.0 seconds as it raised Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=7).
2024-05-13 17:28:51 ERROR    rasa.utils.log_utils  - [error    ] llm_command_generator.llm.error error=Timeout(message="Request timed out: HTTPConnectionPool(host='localhost', port=11434): Read timed out. (read timeout=7)", http_status=None, request_id=None)

I tried to use Ollama as instructed, and it threw the error above. I suspect it’s the same issue discussed here, and judging by the status of that issue, it doesn’t seem to be getting a fix anytime soon. Can anyone suggest alternatives for running local models?

Hi Vivek,

Are you able to access your locally running Ollama server with curl? e.g.

curl http://localhost:11434/api/generate -d '{ "model": "phi3", "prompt": "Why is the sky blue?", "stream": false }'
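
Since the Rasa config points at the OpenAI-compatible /v1 path rather than Ollama’s native API, it may also be worth checking that endpoint directly (the model name below is just a placeholder for whatever you have pulled):

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "llama3:latest", "messages": [{ "role": "user", "content": "Why is the sky blue?" }] }'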

Yes, Ollama is accessible via curl and via the Python OpenAI API client, but it doesn’t seem to work when called from Rasa. The config is:

pipeline:
- name: LLMCommandGenerator
  llm:
    model: "llama3:latest"
    type: openai
    openai_api_base: http://localhost:11434/v1
    openai_api_key: foobar
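
Looking at the logs, the request is failing on a 7-second read timeout, and a local model can easily take longer than that on the first request while Ollama loads it into memory. One thing worth trying, assuming the llm block forwards extra parameters to the underlying LangChain OpenAI client (where request_timeout is a standard option), is to raise the timeout and cap the completion length so the model can answer within the window. A sketch:

pipeline:
- name: LLMCommandGenerator
  llm:
    model: "llama3:latest"
    type: openai
    openai_api_base: http://localhost:11434/v1
    openai_api_key: foobar
    request_timeout: 60  # assumption: passed through as the client's read timeout
    max_tokens: 20       # shorter completions return faster on local hardware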