Multi-GPU not supported for local models

I am using Ollama as my platform for running local models. I have a 4x16GB GPU cluster, which comfortably fits a llama3 model of around 40GB. When I run the Ollama server on its own and send a request with curl, the model is loaded across all 4 GPUs and everything works. But when Rasa sends a request to the same server with the same configuration, Ollama tries to load the entire model onto a single GPU for some reason, the server fails, and Rasa never receives a response. How do I tell it to use all of the available GPUs?
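
For reference, the direct request that works is roughly equivalent to this minimal Python sketch (assuming Ollama's default port 11434; the model tag "llama3:70b" is just illustrative, substitute whatever tag you actually pulled):

```python
import requests

# Direct request to the local Ollama server (default port 11434).
# This is the path that correctly spreads the model across all 4 GPUs.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3:70b",  # illustrative tag, replace with your own
        "prompt": "Hello",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```

It is only when the same server receives the request from Rasa that the single-GPU load happens.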