[WSL 2] High memory usage on GPU and high latency

Hello,

I’m using RASA on WSL 2 with Ubuntu (Windows build 22000.71). For a while I ran it on my CPU, but I decided to switch to my GPU for the obvious performance reasons (note that my models are not very big, by the way).

I have a GTX 1080 Ti, and I installed all the needed CUDA libraries. However, I’m facing a couple of issues. First, I get this warning each time I run rasa train or rasa run:

2021-07-22 12:08:31.392488: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:968] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node
Your kernel may have been built without NUMA support.

However, I read somewhere that this is a benign warning which can be safely ignored. Please let me know if that’s wrong.

Second, I didn’t notice any speed gain. A call to the standard RASA webhook endpoint can take anywhere between 150 ms and 1.5 to 2 s. I would understand if I had big models running, but I just have a dozen intents with 5 to 15 examples each, the corresponding responses, a bunch of additional rules, and 2 basic stories. Nothing exceptional here, with a default config. Do you have any idea why it takes so long and why the GPU doesn’t speed anything up?

Third, and this is the most annoying/frustrating part, RASA uses 9 GB of VRAM (not RAM). 9 out of my 11 GB are dedicated to the rasa server while it’s running, and I didn’t find anything on how to limit it. Is there any way to reduce the memory consumption? Moreover, why does it use the VRAM if there isn’t any speedup? A TensorFlow GPU device is created in the background, but it’s no faster than using my CPU.

Note that my computer is quite old: it still uses DDR3, the motherboard is outdated, and my i7-4960X does not support the latest instruction sets (such as AVX2). Still, I’m confused about what’s happening here.

I can provide more information if needed, of course.

Thank you in advance for your help.

Best regards,

Armillus

Hi there!

First off – have you made sure that your TensorFlow and CUDA versions are compatible? You can try something like the snippet below to verify that TensorFlow actually sees the GPU.

import tensorflow as tf

# Should print 1 (or more) if TensorFlow can see your GPU.
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Secondly, what does your configuration look like? I see you said default config, but can you give more details? Only some components, such as the TensorFlow-based DIETClassifier and TEDPolicy, will benefit from a GPU.
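If you want to check whether work is actually being placed on the GPU, one generic TensorFlow diagnostic (not a Rasa-specific switch, so you’d have to enable it in the same Python process that runs the model) is device-placement logging:

import tensorflow as tf

# Print the device (CPU/GPU) each TensorFlow op is assigned to.
tf.debugging.set_log_device_placement(True)

# Any op created afterwards logs its placement, e.g.:
x = tf.random.uniform((2, 2))
print(x)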

TensorFlow will pre-allocate nearly all of the GPU’s memory by default, which is why you see 9 out of 11 GB taken; it doesn’t mean your model actually needs that much. If you want to prevent this, you can enable memory growth, as sketched below.
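A minimal sketch in plain TensorFlow (this is the standard TF 2.x API, not a Rasa-specific setting, and it must run before anything touches the GPU):

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing almost all
# of it up front; must run before any op uses the GPU.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

When you launch the rasa CLI directly you can’t easily inject that code, but TensorFlow also honors the TF_FORCE_GPU_ALLOW_GROWTH=true environment variable, which has the same effect; if I remember correctly, the Rasa docs additionally describe a TF_GPU_MEMORY_ALLOC variable for putting a hard cap on the allocation per GPU.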

I will say that it doesn’t sound like you have enough training data to really benefit from a GPU: with models this small, per-request overhead dominates, and GPU acceleration mostly pays off for large, batched training runs. What does your Core training data look like? (meaning: rules and stories)