Hi there, I’ve been working on an FAQ model recently, and I have a very large dataset to train on. I’ve tried two solutions. (Edit: I’m not training on the whole dataset yet, just a subset of about 10,000 examples, and I still hit an OOM.)
The first solution creates a separate mapping from each intent to its utterance, like this:
In `domain.yml`:

```yaml
intents:
- 26a4f255-ecdc-311f-8bee-d1445315b941:
    triggers: action_faq
templates:
  utter_26a4f255-ecdc-311f-8bee-d1445315b941:
  - text: <utter>
actions:
- utter_26a4f255-ecdc-311f-8bee-d1445315b941
```
In `nlu.md`:

```md
## intent:26a4f255-ecdc-311f-8bee-d1445315b941
- <intent>
```
`action_faq` is my custom action, which maps an intent to its corresponding utterance, e.g. `26a4f255-ecdc-311f-8bee-d1445315b941` to `utter_26a4f255-ecdc-311f-8bee-d1445315b941`.
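For reference, the mapping logic itself is just prefixing the intent name with `utter_`. A minimal sketch (the SDK wrapper in the comments is illustrative, assuming the Rasa 1.x SDK):

```python
# Core mapping used by action_faq: intent name -> response template name.
def intent_to_template(intent_name: str) -> str:
    """Map an intent name to its corresponding utter template name."""
    return f"utter_{intent_name}"


# In the actual custom action (actions.py) this would be wrapped roughly as:
#
# from rasa_sdk import Action, Tracker
# from rasa_sdk.executor import CollectingDispatcher
#
# class ActionFAQ(Action):
#     def name(self):
#         return "action_faq"
#
#     def run(self, dispatcher, tracker, domain):
#         intent = tracker.latest_message["intent"]["name"]
#         dispatcher.utter_template(intent_to_template(intent), tracker)
#         return []

print(intent_to_template("26a4f255-ecdc-311f-8bee-d1445315b941"))
```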
Training with this solution is very slow: about 5 hours per epoch, probably because my training data is large (140,000+ intent/utterance pairs). What’s worse, memory overflowed after two days. So I tried a second solution I found on the forum.
The second solution uses a ResponseSelector, so the training data looks like this:
In `nlu.md`:

```md
## intent:faq/26a4f255-ecdc-311f-8bee-d1445315b941
- <intent>
```
In `domain.yml`:

```yaml
actions:
- respond_faq
intents:
- faq:
    triggers: respond_faq
```
In `nlg_nlu.md`:

```md
##
* faq/26a4f255-ecdc-311f-8bee-d1445315b941
  - <utter>
```
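For completeness, my NLU pipeline also registers the selector. A sketch of `config.yml` (assuming Rasa 1.x component names, which may differ in other versions):

```yaml
pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: EmbeddingIntentClassifier
- name: ResponseSelector
  # restrict the selector to the faq retrieval intent (optional)
  retrieval_intent: faq
```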
The second solution also runs out of memory; the log looks like this:
```text
MemoryError: Unable to allocate array with shape (156368, 13826) and data type int64
	 [[{{node PyFunc}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
	 [[IteratorGetNext]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
	 [[softmax_cross_entropy_loss/num_present/broadcast_weights/assert_broadcastable/AssertGuard/Assert/data_5/_165]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: MemoryError: Unable to allocate array with shape (156368, 13826) and data type int64
Traceback (most recent call last):
  File "/env/miniconda3/envs/rasa/lib/python3.6/site-packages/tensorflow_core/python/ops/script_ops.py", line 235, in __call__
    ret = func(*args)
  File "/env/miniconda3/envs/rasa/lib/python3.6/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 594, in generator_py_func
    values = next(generator_state.get_iterator(iterator_id))
  File "/env/miniconda3/envs/rasa/lib/python3.6/site-packages/rasa/utils/train_utils.py", line 202, in gen_batch
    session_data = balance_session_data(session_data, batch_size, shuffle)
  File "/env/miniconda3/envs/rasa/lib/python3.6/site-packages/rasa/utils/train_utils.py", line 184, in balance_session_data
    X=np.concatenate(new_X),
  File "<__array_function__ internals>", line 6, in concatenate
MemoryError: Unable to allocate array with shape (156368, 13826) and data type int64
	 [[{{node PyFunc}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
	 [[IteratorGetNext]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored.
```
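For what it’s worth, the failing allocation alone is huge: a dense `(156368, 13826)` `int64` array needs roughly 16 GiB, which matches the `MemoryError`:

```python
# Size of the dense array NumPy failed to allocate in balance_session_data.
rows, cols = 156368, 13826      # shape from the MemoryError above
bytes_needed = rows * cols * 8  # int64 = 8 bytes per element

print(bytes_needed)             # total bytes required for the dense array
print(bytes_needed / 2**30)     # same amount expressed in GiB (~16)
```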