Hey everyone!
I'm training my NLU model with the DIETClassifier locally on a GPU, and I've started getting a ResourceExhaustedError (full traceback below) after adding a few more training examples (3319 examples in total, across 29 intents).
It worked fine until today, when I added something like 50 more examples. Any idea what I can tweak to avoid this?
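In case it helps to show what I'm thinking of trying: my first guess is to shrink the DIETClassifier batch size in config.yml to reduce peak GPU memory, roughly like this (just a sketch, the [32, 128] values are made up and not what I'm currently running):

pipeline:
  # ... other pipeline components unchanged ...
  - name: DIETClassifier
    epochs: 50                # same 50 epochs as in the run below
    batch_size: [32, 128]     # guessing a smaller range than the defaults to lower peak GPU memory
    # I've also seen the TF_FORCE_GPU_ALLOW_GROWTH=true environment variable mentioned,
    # which makes TensorFlow allocate GPU memory on demand instead of up front,
    # but I'm not sure it helps once the model genuinely doesn't fit.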
Thanks for your help! Nicolas
2020-12-08 13:27:47 INFO rasa.shared.nlu.training_data.training_data - Number of intent examples: 3319 (29 distinct intents)
...
2020-12-08 13:27:53 INFO rasa.nlu.model - Starting to train component DIETClassifier
...
Epochs: 26%|███████████████████████████████████                  | 13/50 [01:05<02:55, 4.76s/it, t_loss=24.248, i_acc=0.895, e_f1=0.770, r_f1=0.000]
Traceback (most recent call last):
File "/home/nicolas/anaconda3/envs/chatbot2.0/bin/rasa", line 8, in <module>
sys.exit(main())
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/__main__.py", line 116, in main
cmdline_arguments.func(cmdline_arguments)
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/cli/train.py", line 159, in train_nlu
domain=args.domain,
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/train.py", line 470, in train_nlu
domain=domain,
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/utils/common.py", line 308, in run_in_loop
result = loop.run_until_complete(f)
File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/train.py", line 512, in _train_nlu_async
additional_arguments=additional_arguments,
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/train.py", line 547, in _train_nlu_with_validated_data
**additional_arguments,
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/nlu/train.py", line 114, in train
interpreter = trainer.train(training_data, **kwargs)
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/nlu/model.py", line 204, in train
updates = component.train(working_data, self.config, **context)
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/nlu/classifiers/diet_classifier.py", line 777, in train
self.component_config[BATCH_STRATEGY],
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 206, in fit
self.train_summary_writer,
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 381, in _batch_loop
call_model_function(batch_in)
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 807, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
cancellation_manager=cancellation_manager)
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
ctx=ctx)
File "/home/nicolas/anaconda3/envs/chatbot2.0/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[114,137] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node cond_1/else/_115/cond_1/scan/while/body/_1133/cond_1/scan/while/ReduceLogSumExp/Max}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_on_batch_15705]
Function call stack:
train_on_batch
Cheers, Nicolas