I’m using Rasa on WSL 2 with Ubuntu (Windows build 22000.71). For a while I used it with my CPU, but I decided to switch to my GPU for the obvious performance benefits (note that my models are not very big, by the way).
I have a GTX 1080 Ti, and I installed all the required CUDA libraries. However, I’m facing a few issues.
First, I get this warning each time I run rasa train (among other commands):
2021-07-22 12:08:31.392488: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:968] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node Your kernel may have been built without NUMA support.
However, I have read somewhere that this is a benign warning which can be safely ignored; please let me know if that’s wrong.
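(For anyone hitting the same warning: the workaround I’ve seen suggested is to write a NUMA node into sysfs manually. This assumes the GPU is the device at 0000:02:00.0, as in the log above; it has to be redone after every reboot, and it only silences the warning rather than changing any scheduling behaviour.)

```shell
# Check what the kernel reports for the GPU's NUMA node (-1 means "unknown")
cat /sys/bus/pci/devices/0000:02:00.0/numa_node

# Pretend the device is on NUMA node 0; this only silences the TensorFlow warning
echo 0 | sudo tee /sys/bus/pci/devices/0000:02:00.0/numa_node
```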
Second, I didn’t notice any speed gain. A call to the classic Rasa webhook endpoint can take anywhere between 150 ms and 1.5–2 s. I would understand if I had big models running, but I just have a dozen intents with 5 to 15 examples each, the corresponding responses, a handful of additional rules, and 2 basic stories. Nothing exceptional here, with a default config. Do you have any idea why it takes so long, and why the GPU doesn’t speed anything up?
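For reference, here is roughly how I’m measuring those response times (a minimal sketch; the port is the Rasa default, and the sender id and message are just placeholders):

```shell
# Time a single request against the default REST channel
time curl -s -X POST http://localhost:5005/webhooks/rest/webhook \
     -H "Content-Type: application/json" \
     -d '{"sender": "timing-test", "message": "hello"}'
```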
Third, and this is the most annoying/frustrating part: Rasa uses 9 GB of VRAM (not RAM). 9 of my 11 GB are dedicated to the Rasa server while it’s running, and I couldn’t find anything on how to limit that. Is there any way to reduce the memory consumption? Moreover, why does it use all that VRAM if there isn’t any speedup? A TensorFlow GPU device is created in the background, but it is no faster than my CPU.
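From what I’ve read, this is TensorFlow’s default behaviour rather than something Rasa-specific: TensorFlow reserves nearly all VRAM up front unless told to allocate on demand. The sketch below sets the standard TensorFlow environment variable before starting the server; I believe Rasa also reads a TF_GPU_MEMORY_ALLOC variable for a hard cap, but I haven’t verified that on my version:

```shell
# Ask TensorFlow to allocate GPU memory on demand instead of reserving
# almost all of it at startup
export TF_FORCE_GPU_ALLOW_GROWTH=true

# Optionally cap GPU 0 at 2048 MB (Rasa-specific variable, if your version supports it)
# export TF_GPU_MEMORY_ALLOC="0:2048"

rasa run --enable-api
```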
Note that my computer is quite old: it still uses DDR3, the motherboard is outdated, and my i7-4960X does not support the latest instruction sets. Still, I’m confused about what’s happening here.
I can provide more information if needed, of course.
Thank you in advance for your help.