Rasa assistant trains faster on CPU than on GPU

Hello! My assistant model trains faster on the CPU than on the GPU. For now I have turned off CUDA support in the OS (removed the cudatoolkit and cudnn modules). What could be the problem, and how can I fix it? Please help me.

OS: Linux Manjaro KDE
GPU: GeForce 750 Ti
CPU: AMD FX-6300 (six cores)
Rasa: 2.8
Python: 3.8
TensorFlow: 2.6

Training times:

| Run | NLU | Initialize | Core | Total |
|---|---|---|---|---|
| CPU, 1st train | 00:03:14 | 00:01:00 | 00:02:24 | 00:06:38 |
| CPU, 2nd train | 00:03:17 | 00:01:00 | 00:02:22 | 00:06:39 |
| GPU, 1st train | 00:01:30 | 00:01:00 | 00:17:31 | 00:20:01 |
| GPU, 2nd train | 00:01:23 | 00:01:00 | 00:16:55 | 00:19:18 |

Hey @faupotron. My first guess is that it’s caused by TensorFlow. By default, TensorFlow allocates all of the available GPU memory for running processes. You can try changing this behavior by setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH to TRUE.
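If you happen to drive TensorFlow from Python yourself rather than only through the Rasa CLI, a minimal sketch of the same idea looks like this (TF_FORCE_GPU_ALLOW_GROWTH and tf.config.experimental.set_memory_growth are standard TensorFlow 2.x mechanisms; whether this helps your training time depends on your setup):

```python
import os

# Must be set before TensorFlow initializes the GPU, otherwise it has no effect.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf

# Equivalent per-GPU API: allocate memory on demand instead of grabbing it all upfront.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```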

Just a quick question on top of that - how many training examples do you have for your NLU model? If your project is on the smaller side, a GPU might not even be necessary for your project.

Thanks, @Juste. I have about 350 NLU examples in total, covering all of my stories, and about 20 intents used in the stories.

How many examples do we need per intent or story?

@Juste, where can I set this variable?

@Juste, how can I set this variable in my assistant’s config.yml?

Great. For an assistant this size I doubt you really need a GPU, unless you plan to scale the application further.

The variable should be set in your environment, not in config.yml. Are you training your assistant locally? You should be able to set the environment variable by executing the following in your terminal:

export TF_FORCE_GPU_ALLOW_GROWTH=true
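If you don’t want to export it permanently, you can also set it just for a single training run by prefixing the command:

TF_FORCE_GPU_ALLOW_GROWTH=true rasa train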

@Juste OK. When I train my assistant, I can see in the process manager that the GPU is in use and under load.

The question is: why does my model train faster on the CPU than on the GPU?

As an experiment, I downloaded and installed the “Rasa-demo” model from GitHub. It trains faster on the GPU, and I didn’t change any settings.

I think the problem is with my model’s settings or something else…

It is most likely due to the fact that your assistant is much smaller than rasa-demo. As I mentioned previously, for smaller applications GPUs are completely unnecessary, and in some cases you might see little to no improvement in training time compared to using only the CPU. That’s because when you use a GPU for small models, your system incurs a lot of overhead invoking GPU kernels and copying data between host and device.
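You can see this effect for yourself with a rough sketch like the one below (illustrative only, not a rigorous benchmark; the device names and matrix sizes are arbitrary choices): for a tiny matrix the launch-and-copy overhead dominates, while for a large one the GPU’s raw throughput wins.

```python
import time
import tensorflow as tf

def time_matmul(device, size, repeats=100):
    """Time repeated matmuls of a (size x size) matrix on the given device."""
    with tf.device(device):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        _ = tf.matmul(a, b).numpy()  # warm-up, so one-off setup isn't counted
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        _ = c.numpy()  # force the device work to finish before stopping the clock
        return time.perf_counter() - start

# For small matrices the GPU often loses: kernel launches and host<->device
# copies dominate, while the actual arithmetic is trivial.
for size in (32, 2048):
    print(f"size={size}: CPU {time_matmul('/CPU:0', size):.3f}s, "
          f"GPU {time_matmul('/GPU:0', size):.3f}s")
```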

OK, thanks. I’ll keep an eye on this as my assistant gets bigger.

This was addressed with the addition of a use_gpu flag to TEDPolicy.
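If your Rasa version includes that change, the flag would presumably be set on the policy itself in config.yml, along these lines (a hypothetical sketch of the syntax; check the changelog and docs of your exact version for the flag name and default before relying on it):

```yaml
policies:
  - name: TEDPolicy
    # Hypothetical usage of the flag mentioned above; verify it exists
    # in your Rasa version before using it.
    use_gpu: false
```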