For the new EmbeddingPolicy you still need to train on stories that contain those chitchats and corrections. So where is the advantage/improvement over a normal LSTM? Is it that you need to write far fewer such uncooperative stories, because the attention layer learns not to attend to those parts and therefore generalises to stories it was not trained on?
Did you benchmark the training time? I'm currently training a model with this policy on a GTX 1080 Ti (12 GB) + 32 GB RAM + a 32-core CPU, and each epoch takes about 10 minutes...
I'm using:
policies:
  - name: EmbeddingPolicy
    epochs: 2000
    attn_shift_range: 5
EDIT:
I'll answer my own question: the EmbeddingPolicy should be used with --augmentation 0
Yes! The attention mechanisms definitely require more compute power to train. You can also switch off one (or both) of the attentions to trade a bit of generalization power for training time.
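As a sketch, switching one of them off would look something like this in the policy config (assuming the attn_before_rnn / attn_after_rnn hyperparameter names; check the EmbeddingPolicy docs for your Rasa Core release):

policies:
  - name: EmbeddingPolicy
    epochs: 2000
    attn_before_rnn: false  # assumed flag: disable attention over the user input
    attn_after_rnn: true    # keep attention over previous bot actions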
@adrianhumphrey111 has already given the answer.
the EmbeddingPolicy should be used with --augmentation 0
Add --augmentation 0 to your command while training.
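For example (a sketch only; the exact CLI depends on your Rasa Core version, and the domain/stories/output paths here are placeholders for your own):

python -m rasa_core.train -d domain.yml -s data/stories.md -o models/dialogue --augmentation 0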
/usr/local/lib/python3.6/site-packages/pykwalify/core.py:99: UnsafeLoaderWarning:
The default 'Loader' for 'load(stream)' without further arguments can be unsafe.
Use 'load(stream, Loader=ruamel.yaml.Loader)' explicitly if that is OK.
Alternatively include the following in your code:
import warnings
warnings.simplefilter('ignore', ruamel.yaml.error.UnsafeLoaderWarning)
In most other cases you should consider using 'safe_load(stream)'
data = yaml.load(stream)
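Side note: that UnsafeLoaderWarning comes from pykwalify's YAML loading (see the path in the log) and appears harmless for training. To silence it, the snippet from the warning text itself can go at the top of your training script:

import warnings
import ruamel.yaml.error

# Ignore only the UnsafeLoaderWarning raised by ruamel.yaml during schema validation
warnings.simplefilter('ignore', ruamel.yaml.error.UnsafeLoaderWarning)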
Processed Story Blocks: 100%|██████████████████████████████████████████████████████████████████████████████| 26/26 [00:00<00:00, 2419.24it/s, # trackers=16]
2019-01-07 08:22:41 INFO rasa_core.agent - Model directory models/dialogue/ exists and contains old model files. All files will be overwritten.
2019-01-07 08:22:41 INFO rasa_core.agent - Persisted model to '/app/kiddiecommute 2/models/dialogue'
I do not see it going over any epochs. Inside my models/dialogue folder, I only have the files: