I’m working on a Rasa CALM assistant and want to leverage my GPU for faster inference. Ideally I’d like to use GPTQ models, but as I understand it, llama-cpp-python currently only supports the GGUF format.
Does anyone have suggestions for running GPTQ models with Rasa CALM on the GPU?