How to use GPTQ models

I’m working on a Rasa CALM model and want to leverage my GPU for faster inference. Ideally, I’d like to use GPTQ models, but I understand the current llama-cpp-python library only supports the GGUF format.

Anyone have suggestions for using GPTQ models with Rasa CALM on the GPU?

You can configure any LLM supported by LangChain. Most of the recent discussions I’ve seen about running local models use Ollama and LangChain. Here’s one that may help.
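
For reference, here is a minimal sketch of talking to a local Ollama server through LangChain. This is an assumption-laden example, not a Rasa-specific recipe: the model name and `base_url` are placeholders for whatever you have pulled and wherever Ollama is listening.

```python
# Minimal sketch: querying a local Ollama server via LangChain.
# Assumes Ollama is installed and running, and a model has been pulled,
# e.g. `ollama pull llama2`. Model name and base_url are placeholders.
from langchain_community.llms import Ollama

llm = Ollama(
    model="llama2",                     # any model pulled with `ollama pull`
    base_url="http://localhost:11434",  # Ollama's default endpoint
)

print(llm.invoke("Say hello in one sentence."))
```

Ollama handles GPU offloading for you, so once this works you can point Rasa’s LLM configuration at the same endpoint; check the Rasa docs for the exact config keys for your version.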