Error while using Tensorflow GPU 1.14.0

xames3 · September 15, 2019, 4:15pm

Hello,

Could anyone here help me with an error that I’m facing after upgrading rasa to 1.3.3?

The error is most likely caused by TF-GPU, because I don’t get any errors when I uninstall tf-gpu and run the ‘rasa train’ command…

I’ve re-installed TF-GPU 1.14.0, CUDA and cuDNN libraries.

CUDA - 10.0
cuDNN - 7.6.0 for CUDA 10.0
Python version - 3.7.4
Windows 10

2019-09-15 21:26:34 INFO     rasa.nlu.model  - Starting to train component WhitespaceTokenizer
2019-09-15 21:26:34 INFO     rasa.nlu.model  - Finished training component.
2019-09-15 21:26:34 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2019-09-15 21:27:01 INFO     rasa.nlu.model  - Finished training component.
2019-09-15 21:27:01 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2019-09-15 21:27:12 INFO     rasa.nlu.model  - Finished training component.
2019-09-15 21:27:12 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2019-09-15 21:27:12 INFO     rasa.nlu.model  - Finished training component.
2019-09-15 21:27:12 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2019-09-15 21:27:12 INFO     rasa.nlu.model  - Finished training component.
2019-09-15 21:27:12 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2019-09-15 21:27:13 INFO     rasa.nlu.model  - Finished training component.
2019-09-15 21:27:13 INFO     rasa.nlu.model  - Starting to train component EmbeddingIntentClassifier
2019-09-15 21:27:14.710581: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

Tensorflow-GPU works for other ML-related operations; it doesn’t throw any exceptions in those cases like it does here.

Please help… I’ve been stuck on this for the past 3-4 days now.

akelad · September 19, 2019, 12:19am

@xames3 are you sure tensorflow-gpu is properly configured on your machine?

xames3 · September 19, 2019, 11:49am

Hello and thanks @akelad for looking into this issue.

I’m quite sure that tensorflow-gpu is configured correctly on my system. If I just run rasa train nlu --fixed-model-name <my-model-name> it generates the NLU model correctly without any hiccups using the GPU.

Similarly, if I run training for only the core model, it works too (TypeError: Object of type MaxHistoryTrackerFeaturizer is not JSON serializable is worked around with a temporary solution provided here.).

But when I run rasa train --fixed-model-name <my-model-name>, the training starts correctly (core training works fine), and NLU training works fine until it reaches the EmbeddingIntentClassifier.

After EmbeddingIntentClassifier, it throws this error:
Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error

I looked into this error here and here, but there are no fixes yet. I’m not sure whether any other members are facing this issue, as I haven’t seen anyone report it either in the forums or in GitHub issues.

@Juste and @akelad, hope you could help me with this one…

Ghostvv · September 20, 2019, 8:56am

it looks like it cannot handle two different tf sessions (nlu and core) in one process
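Since training NLU and Core on their own was reported to work above, one possible workaround is to run each training step as its own OS process, so that each one gets a fresh TensorFlow session. The following is only a rough sketch: the helper names and the "-nlu"/"-core" name suffixes are made up for illustration, and it assumes that rasa train core accepts --fixed-model-name the same way rasa train nlu does. It most likely leaves you with two separate model archives instead of one combined model.

```python
import subprocess


def build_train_commands(model_name):
    """Return one `rasa train` command per model type (NLU first, then Core).

    The "<model_name>-nlu" / "<model_name>-core" naming is a hypothetical
    convention chosen for this sketch, not something Rasa requires.
    """
    return [
        ["rasa", "train", kind, "--fixed-model-name", f"{model_name}-{kind}"]
        for kind in ("nlu", "core")
    ]


def train_separately(model_name, dry_run=True):
    """Run each training command in its own process, one after the other.

    With dry_run=True the commands are only printed, which is useful for
    checking them before starting a long training run.
    """
    commands = build_train_commands(model_name)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a training step fails.
            subprocess.run(cmd, check=True)
    return commands
```

Calling train_separately("my-model", dry_run=False) would start the two trainings back to back; whether this avoids the CUDA error on a given machine is not something this thread confirms.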

xames3 · September 22, 2019, 4:40pm

Yep, that’s correct. Sorry @Ghostvv, I was not able to respond to your comment as I was out for a while. Glad to see a newer version of Rasa (1.3.6) is out.

Hope that fixes this bug.

xames3 · September 22, 2019, 6:15pm

Nah… the bug/issue still persists.

Ghostvv · September 23, 2019, 9:35am

this bug seems to be a TensorFlow problem, not a rasa one

Ghostvv · September 23, 2019, 9:36am

How much training data do you have? How much is cpu speed up?

Ghostvv · September 23, 2019, 1:39pm

we found some related GitHub issues: https://github.com/tensorflow/tensorflow/issues/28582 and "Call tf.Session() twice causes fatal error: failed to get device attribute 13 for device 0" (tensorflow/tensorflow issue #31795)

xames3 · September 23, 2019, 2:23pm

Yes, that’s correct. Rasa works flawlessly with the CPU version of TF 1.14.0.

Just a question, if you don’t mind answering:

Do you guys use the CPU or GPU version of TF for development? Similarly, do you use Anaconda or the normal Python interpreter for testing, and which is preferable?
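If you want to keep tensorflow-gpu installed for other work but train Rasa on the CPU, a common approach is to hide the GPU from the training process by setting the CUDA_VISIBLE_DEVICES environment variable to -1. This is a minimal sketch, not something tested in this thread; cpu_only_env is a hypothetical helper name.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden.

    Setting CUDA_VISIBLE_DEVICES to "-1" makes CUDA report no usable GPUs,
    so TensorFlow falls back to the CPU in the process that uses this env.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching rasa train, so only that one process loses access to the GPU.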

xames3 · September 23, 2019, 2:26pm

To be very honest @Ghostvv, it is not that much.

A maximum of 20-odd intents, and only 2-3 of them have more than 100 examples; the rest have roughly 50-60 examples each.

The CPU can handle that much load for now. Also, by speed up, did you mean the clock speed? It’s an i7 7th gen processor running at 2.8 GHz.

Ghostvv · September 24, 2019, 10:07am

we use cpu because, with the amount of data people usually have, and because our algorithms are not very deep, gpu doesn’t provide any faster training

bidya · February 6, 2020, 4:19am

Any solution to this problem?

We are building a bot for an enterprise product. In testing, we see that a maximum of 50 concurrent users makes Rasa very slow. As we support a standalone setup, we want Rasa to handle more concurrent users so that customers need to create fewer Rasa nodes (a manual effort).

So we tried tensorflow-gpu 1.14 and ended up with this error.

Ghostvv · February 6, 2020, 10:59am

the error is related to cuda. Did you install the cuda drivers correctly?

samscudder · February 6, 2020, 1:41pm

To test if your GPU is correctly configured, try this python script:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

If it prints 0, something’s wrong. Usually when this happens, the error message will give you a clue.

bidya · February 6, 2020, 2:57pm

Thank you for the input. I was thinking the same: it is probably a tensorflow + gpu setup issue. The above code reports 1 GPU device for me.

The code below also returns the GPU device details. Maybe I will try reinstalling (cuda + nvidia drivers) once more to see if it works.

tf.config.experimental.list_physical_devices('GPU')
2020-02-06 20:23:22.831000: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2020-02-06 20:23:23.653171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Quadro P2000 major: 6 minor: 1 memoryClockRate(GHz): 1.468
pciBusID: 0000:01:00.0
2020-02-06 20:23:23.688631: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-02-06 20:23:23.700465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
