GPU utilization is always 0

I am trying to train a model on a GPU instance. I can see a process spun up for Rasa once training starts, but GPU utilization remains at 0 until training completes. There is also no improvement in training time compared to CPU.

Here is some information about the setup:

NVIDIA-SMI: 470.141.03
Driver Version: 470.141.03
CUDA Version: 11.4
Rasa Version: 1.10.23

Is there a way to troubleshoot why my GPU is not being utilized?

@siriusraja I assume you’ve used these command line tools already:

lspci | grep -i nvidia  #  List CUDA-capable GPUs
nvidia-smi  #  Nvidia drivers check

One thing I have done in the past is make sure the libraries load and the GPU is detected inside Python, using your environment. One way to do this is to open a Python prompt on the command line (i.e. type python), import TensorFlow, and use its functions to report how many GPUs it can see.
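For example (a minimal sketch; the exact function names can differ slightly across TensorFlow versions, so treat it as a starting point):

import tensorflow as tf

print(tf.__version__)                          # the TensorFlow version your environment imports
print(tf.test.is_built_with_cuda())            # True only if this TensorFlow build has CUDA support
print(tf.config.list_physical_devices("GPU"))  # should list at least one GPU device

If the last line prints an empty list, TensorFlow cannot see the GPU at all, which usually points to a library or version problem rather than a Rasa problem.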

In the specific GPU problems I’ve had in the past, it has been either a version incompatibility between my versions of Python, TensorFlow, the GPU libraries, etc., or my inability to install (and keep installed) the GPU libraries on my VM / compute instance.

Where is your VM hosted?


Hi @tomp

Thank you, and I appreciate your help. I finally figured it out, and as you said, it’s all down to finding the right versions of the different components.

Here are the versions that matched in my case, for the sake of others (a rough environment setup follows the list).

  1. Rasa Version : 1.10.23
  2. TensorFlow Version : 2.1.3
  3. GPU : Nvidia Tesla K80
  4. Nvidia Driver Version : 418.226.00
  5. cudatoolkit=10.1
  6. cudnn=7.6.5
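
For reference, here is roughly how such an environment could be recreated with conda. The environment name, Python version, and explicit TensorFlow pin are my assumptions rather than something tested in this thread, and the NVIDIA driver itself still has to be installed at the system level:

conda create -n rasa-gpu python=3.7 cudatoolkit=10.1 cudnn=7.6.5   # env name and python=3.7 are assumptions
conda activate rasa-gpu
pip install rasa==1.10.23 tensorflow==2.1.3   # Rasa 1.10.x may already pull in a compatible TensorFlow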