Local LLM with text-generation-webui steps

Hello everyone, I was trying to deploy a local LLM with Rasa Pro and finally found a solution. Here are the details in case anyone needs them:

I have installed text-generation-webui → link

Then:

  1. I started the server with ./start_linux.sh
  2. Loaded the model through the “Model” tab
  3. In the “Session” tab I selected openai, api and listen, and pressed “Apply flags” (a quick check that the API is up is sketched below)
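
To confirm the flags took effect before touching Rasa, here is a minimal check (my own sketch, not part of the original post; it assumes the default text-generation-webui API port 5000, the same one used in endpoints.yml below). The OpenAI-compatible extension normally exposes a /v1/models route:

import requests

# Ask the local OpenAI-compatible API which models it is serving.
resp = requests.get("http://127.0.0.1:5000/v1/models", timeout=10)
resp.raise_for_status()
print(resp.json())  # the model loaded in the "Model" tab should be listed

If this prints a model list, the server side is ready.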

In Rasa’s endpoints.yml:

nlg:
  type: rephrase
  rephrase_all: true
  llm:
    model: 'model_gemma_27b_it'
    model_name: 'model_gemma_27b_it'
    type: "openai"
    openai_api_key: "NULL"
    openai_api_base: "http://127.0.0.1:5000/v1"
    request_timeout: 800

If you get the error

AttributeError: module 'openai' has no attribute 'error'

you have to install this version:

  pip install openai==0.28.1
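
As a rough sanity check (my addition, assuming the setup above), the snippet below uses the pinned openai==0.28.1 client to make the same kind of chat-completion call that Rasa’s rephraser will make, with the model name and base URL taken from the endpoints.yml above:

import openai

openai.api_key = "NULL"                       # the local server ignores the key
openai.api_base = "http://127.0.0.1:5000/v1"  # text-generation-webui's OpenAI API

response = openai.ChatCompletion.create(
    model="model_gemma_27b_it",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=50,
)
print(response["choices"][0]["message"]["content"])

If this returns text, the version pin and the local endpoint are working together.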

Hey, what changes should I be making to my config for this to work? Also, I don’t have access to an OpenAI API key; how can I bypass the OpenAI API key requirement? It keeps popping up. Any help will be appreciated. Thanks

Try using a Hugging Face model (Mixtral would be fine).

I want to use local models. I was trying Ollama, but it takes a lot of time to generate a reply, so I am stuck.

That’s the only sticking point with HF. Try and see if you can use vLLM.

Hello, if the model takes too much time to generate, I believe the problem is that your system is struggling to load the model. Maybe you should try to improve it by reducing the number of characters generated or adjusting other configuration variables.

More than taking time, it is predicting the wrong flows! Is there a good demo we can follow that uses local LLMs instead of OpenAI?