Hey folks, I was upgrading my Rasa version from 1.4.6 to 1.10.0 and trying out the new `TEDPolicy` that I now have access to, but I'm running into a memory issue when trying to train a new model. I have 16 GB of memory and haven't had issues training with the `KerasPolicy` previously. I've noticed that once the core training kicks in (after tracker processing), memory fills up and then the training session crashes.
This is my current configuration:
config.yml
```yaml
language: "en"

pipeline:
- name: ConveRTTokenizer
- name: ConveRTFeaturizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
- name: DucklingHTTPExtractor
  url: http://duckling:8000
  dimensions:
  - time
  - number
  - phone-number
  locale: en_US
  timezone: America/New_York

policies:
- name: TwoStageFallbackPolicy
  nlu_threshold: 0.5
  ambiguity_threshold: 0.01
  core_threshold: 0.01
  fallback_core_action_name: action_default_fallback
  fallback_nlu_action_name: flag_conversation_for_review
  deny_suggestion_intent_name: incorrect_intent
- name: AugmentedMemoizationPolicy
  max_history: 10
- name: MappingPolicy
- name: TEDPolicy
  epochs: 300
  max_history: 5
  batch_size: 8
  featurizer:
  - name: MaxHistoryTrackerFeaturizer
    state_featurizer:
    - name: LabelTokenizerSingleStateFeaturizer
```
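For completeness, here's what the `TEDPolicy` block looked like when I tried a linear batch size schedule instead of a fixed value (more on that below):

```yaml
- name: TEDPolicy
  epochs: 300
  max_history: 5
  batch_size: [64, 128]   # list form: batch size increases linearly over epochs
  featurizer:
  - name: MaxHistoryTrackerFeaturizer
    state_featurizer:
    - name: LabelTokenizerSingleStateFeaturizer
```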
I've messed around with `batch_size`, going from that linear schedule back to a single value. When `batch_size` was set to `[64, 128]` I got the error message below, which I'm guessing is the same issue I'm running into even with a `batch_size` of 8, though in that case the process is simply killed. I don't know the internals of how this stuff works, but I'm guessing it's because it's trying to load the entire training data set into memory as a single array? A rough sanity check of the numbers is below.
The command I was using to train was:

```
rasa train --data data/interactive data/nlu data/stories --augmentation 0
```
64-128.log
```
2020-04-29 18:08:19 INFO rasa.model - Data (core-config) for Core model section changed.
2020-04-29 18:08:19 INFO rasa.model - Data (nlu-config) for NLU model section changed.
2020-04-29 18:08:19 INFO rasa.model - Data (nlg) for NLG templates section changed.
Training Core model...
Processed Story Blocks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 625/625 [00:00<00:00, 1148.89it/s, # trackers=1]
Processed trackers: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 915/915 [00:00<00:00, 945.76it/s, # actions=3530]
Processed actions: 3530it [00:00, 12246.45it/s, # examples=3448]
Processed trackers: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 915/915 [00:00<00:00, 1132.06it/s, # actions=3530]
2020-04-29 18:08:44.184014: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
Traceback (most recent call last):
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py", line 1897, in execution_mode
yield
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 659, in _next_internal
output_shapes=self._flat_output_shapes)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2479, in iterator_get_next_sync
_ops.raise_from_not_ok_status(e, name)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: MemoryError: Unable to allocate 17.6 GiB for an array with shape (531559, 1, 10, 889) and data type int32
Traceback (most recent call last):
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in __call__
ret = func(*args)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 789, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/utils/tensorflow/model_data.py", line 402, in _gen_batch
data = self._balanced_data(data, batch_size, shuffle)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/utils/tensorflow/model_data.py", line 386, in _balanced_data
final_data[k].append(np.concatenate(np.array(v)))
MemoryError: Unable to allocate 17.6 GiB for an array with shape (531559, 1, 10, 889) and data type int32
[[{{node PyFunc}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:IteratorGetNextSync]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/bin/rasa", line 8, in <module>
sys.exit(main())
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/__main__.py", line 91, in main
cmdline_arguments.func(cmdline_arguments)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/cli/train.py", line 76, in train
additional_arguments=extract_additional_arguments(args),
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/train.py", line 50, in train
additional_arguments=additional_arguments,
File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/train.py", line 101, in train_async
additional_arguments,
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/train.py", line 188, in _train_async_internal
additional_arguments=additional_arguments,
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/train.py", line 223, in _do_training
additional_arguments=additional_arguments,
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/train.py", line 361, in _train_core_with_validated_data
additional_arguments=additional_arguments,
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/core/train.py", line 66, in train
agent.train(training_data, **additional_arguments)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/core/agent.py", line 707, in train
self.policy_ensemble.train(training_trackers, self.domain, **kwargs)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/core/policies/ensemble.py", line 124, in train
policy.train(training_trackers, domain, **kwargs)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/core/policies/ted_policy.py", line 325, in train
batch_strategy=self.config[BATCH_STRATEGY],
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 126, in fit
) = self._get_tf_train_functions(eager, model_data, batch_strategy)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 342, in _get_tf_train_functions
train_dataset_function, self.train_on_batch, eager, "train"
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/utils/tensorflow/models.py", line 324, in _get_tf_call_model_function
tf_call_model_function(next(iter(init_dataset)))
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 630, in __next__
return self.next()
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 674, in next
return self._next_internal()
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 665, in _next_internal
return structure.from_compatible_tensor_list(self._element_spec, ret)
File "/usr/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py", line 1900, in execution_mode
executor_new.wait()
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/eager/executor.py", line 67, in wait
pywrap_tensorflow.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: MemoryError: Unable to allocate 17.6 GiB for an array with shape (531559, 1, 10, 889) and data type int32
Traceback (most recent call last):
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/ops/script_ops.py", line 236, in __call__
ret = func(*args)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 789, in generator_py_func
values = next(generator_state.get_iterator(iterator_id))
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/utils/tensorflow/model_data.py", line 402, in _gen_batch
data = self._balanced_data(data, batch_size, shuffle)
File "/home/kevin/.cache/pypoetry/virtualenvs/venus-jw1ULBKI-py3.7/lib/python3.7/site-packages/rasa/utils/tensorflow/model_data.py", line 386, in _balanced_data
final_data[k].append(np.concatenate(np.array(v)))
MemoryError: Unable to allocate 17.6 GiB for an array with shape (531559, 1, 10, 889) and data type int32
[[{{node PyFunc}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
8.log
```
2020-04-29 18:43:05 INFO rasa.model - Data (core-config) for Core model section changed.
2020-04-29 18:43:05 INFO rasa.model - Data (nlu-config) for NLU model section changed.
2020-04-29 18:43:05 INFO rasa.model - Data (nlg) for NLG templates section changed.
Training Core model...
Processed Story Blocks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 625/625 [00:00<00:00, 1079.35it/s, # trackers=1]
Processed trackers: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 915/915 [00:00<00:00, 941.63it/s, # actions=3530]
Processed actions: 3530it [00:00, 12003.19it/s, # examples=3448]
Processed trackers: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 915/915 [00:01<00:00, 848.67it/s, # actions=2800]
2020-04-29 18:43:31.413820: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
Killed
```