Multi-GPU not supported for local models

I am using Ollama as my platform for running local models. I have a 4x16GB GPU cluster, which comfortably fits a llama3 model of around 40GB. When I run the Ollama server on its own and send a request with curl, the model is loaded across all 4 GPUs and everything works. But when Rasa sends a request to the same server with the same configuration, Ollama tries to load the entire model onto a single GPU for some reason, the server fails, and Rasa never receives a response. How do I tell it to use all of the available GPUs?
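
For reference, the direct request that works is roughly equivalent to this minimal Python sketch (assuming Ollama's default port 11434; the model tag "llama3:70b" is just illustrative, substitute whatever tag you actually pulled):

```python
import requests

# Direct request to the local Ollama server (default port 11434).
# This is the path that correctly spreads the model across all 4 GPUs.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3:70b",  # illustrative tag, replace with your own
        "prompt": "Hello",
        "stream": False,
    },
    timeout=300,
)
print(resp.json()["response"])
```

It is only when the same server receives the request from Rasa that the single-GPU load happens.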