Could anyone here help me out with an error I’m facing after upgrading Rasa to 1.3.3?
The error is most likely related to using TF-GPU, because I don’t get any errors when I uninstall tf-gpu and run the ‘rasa train’ command…
I’ve re-installed TF-GPU 1.14.0 and the CUDA and cuDNN libraries:
CUDA - 10.0
cuDNN - 7.6.0 for CUDA 10.0
Python version - 3.7.4
Windows 10
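For context, a minimal standalone check along these lines (TF 1.14 API, outside Rasa) is what I mean by TF-GPU being set up correctly on this machine; the exact device names printed are of course specific to my setup:

```python
# Quick sanity check that the GPU build of TensorFlow 1.14 can see and
# initialize the GPU outside of Rasa.
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                       # expect 1.14.0
print(tf.test.is_gpu_available())           # True if CUDA/cuDNN load correctly
print([d.name for d in device_lib.list_local_devices()])  # should list /device:GPU:0
```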
2019-09-15 21:26:34 INFO rasa.nlu.model - Starting to train component WhitespaceTokenizer
2019-09-15 21:26:34 INFO rasa.nlu.model - Finished training component.
2019-09-15 21:26:34 INFO rasa.nlu.model - Starting to train component RegexFeaturizer
2019-09-15 21:27:01 INFO rasa.nlu.model - Finished training component.
2019-09-15 21:27:01 INFO rasa.nlu.model - Starting to train component CRFEntityExtractor
2019-09-15 21:27:12 INFO rasa.nlu.model - Finished training component.
2019-09-15 21:27:12 INFO rasa.nlu.model - Starting to train component EntitySynonymMapper
2019-09-15 21:27:12 INFO rasa.nlu.model - Finished training component.
2019-09-15 21:27:12 INFO rasa.nlu.model - Starting to train component CountVectorsFeaturizer
2019-09-15 21:27:12 INFO rasa.nlu.model - Finished training component.
2019-09-15 21:27:12 INFO rasa.nlu.model - Starting to train component CountVectorsFeaturizer
2019-09-15 21:27:13 INFO rasa.nlu.model - Finished training component.
2019-09-15 21:27:13 INFO rasa.nlu.model - Starting to train component EmbeddingIntentClassifier
2019-09-15 21:27:14.710581: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error
TensorFlow-GPU works for other ML-related workloads; it doesn’t throw any exceptions in those cases the way it is throwing here.
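For example, a standalone session sketched like this (TF 1.14 API, with GPU memory growth enabled, which several CUDA_ERROR_UNKNOWN reports suggest trying) runs fine on this machine; I have no direct way to pass such a config through `rasa train`, so this is only to illustrate what works outside Rasa:

```python
# Standalone TF 1.14 session with GPU memory growth enabled.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True      # allocate GPU memory on demand

with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0]])
    b = tf.constant([[3.0], [4.0]])
    print(sess.run(tf.matmul(a, b)))        # simple op placed on the GPU
```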
Hello and thanks @akelad for looking into this issue.
I’m quite sure that tensorflow-gpu is configured correctly on my system. If I run just rasa train nlu --fixed-model-name <my-model-name>, it generates the NLU model on the GPU without any hiccups.
Similarly, if I run training for only the Core model, it works too (the TypeError: Object of type MaxHistoryTrackerFeaturizer is not JSON serializable is worked around with a temporary solution provided here).
But when I run rasa train --fixed-model-name <my-model-name>, training starts correctly (Core training works fine) and NLU training works fine up to the point of training the EmbeddingIntentClassifier.
Once it reaches the EmbeddingIntentClassifier, it throws this error: Attempting to fetch value instead of handling error Internal: failed to get device attribute 13 for device 0: CUDA_ERROR_UNKNOWN: unknown error
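As a quick diagnostic sketch (just an idea on my side, not a fix), masking the GPU for a single run lets me confirm the crash is GPU-specific without uninstalling tensorflow-gpu, since TensorFlow honours CUDA_VISIBLE_DEVICES:

```python
# Hide the GPU so TensorFlow falls back to CPU; must be set before TensorFlow
# initializes CUDA. The equivalent for a `rasa train` run is setting the
# variable in the shell first (e.g. `set CUDA_VISIBLE_DEVICES=-1` on Windows).
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.constant("CPU fallback works")))
```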
I checked this error here and here but found no fixes yet. I’m not sure whether other members are facing this issue, as I haven’t seen anyone reporting it either in the forums or in GitHub issues.
@Juste and @akelad, I hope you can help me with this one…
Yep, that’s correct. Sorry @Ghostvv, I was not able to respond to your comment as I was out for a while. Glad to see a newer version of Rasa (1.3.6) is out.
Yes, that’s correct. Rasa works flawlessly with the CPU version of TF 1.14.0.
Just a question, if you don’t mind answering:
Do you use the CPU or GPU version of TF for development? Similarly, do you use Anaconda or a plain Python interpreter for testing, and which is preferable?
We are building a bot for an enterprise product.
In testing, we see that a maximum of 50 concurrent users makes Rasa very slow.
As we support a standalone setup, we want Rasa to handle more concurrent users so that customers need to create fewer Rasa nodes (manual effort).
So we tried tensorflow-gpu 1.14 and ended up with this error.