Thanks, Brian. I tried your pipeline and got the same failure with the EmbeddingIntentClassifier. I have no problems using 1.3.9 with a smaller bot (19 unique intents), but with this larger bot (279 intents) it fails.
I’m stuck on 1.2.11 until I can figure this out. It may be time for a GitHub issue.
I’ve opened issue #4616. I also think this could be related to the hyperparameter changes between 1.2 and 1.3 discussed in #4540.
The NLU training data loader reports the story file format as 'unk' in 1.3.9 vs. 'md' in 1.2.11.
1.2.11 messages related to story format:

```
2019-10-16 14:37:06 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/training/xcen.md' is 'md'.
2019-10-16 14:37:06 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/training/axl.md' is 'md'.
2019-10-16 14:37:06 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/training/main.md' is 'md'.
2019-10-16 14:37:06 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/training/glossary.md' is 'md'.
2019-10-16 14:37:06 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/training/axl_faq.md' is 'md'.
2019-10-16 14:37:06 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/training/faq.md' is 'md'.
```
1.3.9 messages:

```
2019-10-16 14:19:31 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/stories/glossary.md' is 'unk'.
2019-10-16 14:19:31 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/stories/faq.md' is 'unk'.
2019-10-16 14:19:31 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/stories/main.md' is 'unk'.
2019-10-16 14:19:31 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/stories/axl_faq.md' is 'unk'.
2019-10-16 14:19:31 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/stories/xcen.md' is 'unk'.
2019-10-16 14:19:31 DEBUG rasa.nlu.training_data.loading - Training data format of '/app/data/stories/axl.md' is 'unk'.
```
I don’t see any error message related to memory. My machine has 12 GB of RAM, and my Docker engine had its default memory allocation of 2 GB. I increased that to 6 GB, restarted the Docker engine, and re-ran the training, and it worked. Thanks!
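For anyone trying to reproduce this, here is a minimal docker-compose sketch that makes the container's memory ceiling explicit; the image tag, volume path, and limit are assumptions for illustration, not my exact setup:

```yaml
# docker-compose.yml (v2.x syntax) - illustrative only.
# mem_limit caps the container, so an undersized CI/CD runner
# fails fast instead of swapping during training.
version: "2.4"
services:
  rasa:
    image: rasa/rasa:1.3.9   # assumed tag
    mem_limit: 6g            # the allocation that worked for me locally
    volumes:
      - ./:/app              # project mounted at the image's working directory
    command: train           # the image's entrypoint is `rasa`
```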
I also train on a separate CI/CD system that doesn’t have this much memory. Is there a set of parameters that will make the EmbeddingIntentClassifier behave the way it did in 1.2.11? I’ve tried the batch_strategy: sequence option, but training still requires more memory than it did before.
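For reference, this is the shape of config I mean; batch_strategy: sequence is the option I tried, while the batch_size values are just an illustrative guess at a lower-memory setting (if I remember right, the 1.3 default is [64, 256]), not something I've validated:

```yaml
pipeline:
  # ... tokenizer / featurizers as before ...
  - name: "EmbeddingIntentClassifier"
    batch_strategy: "sequence"  # the option I tried; the default is "balanced"
    batch_size: [32, 64]        # illustrative guess, not a validated setting
```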
I saw your post under issue #4540 about the char-level count vectorizer. Is there a hyperparameter to disable or configure it?
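My understanding (an assumption on my part, not something I've confirmed) is that the char-level featurization comes from the 1.3 supervised_embeddings template adding a second CountVectorsFeaturizer with analyzer: char_wb, so spelling out the pipeline and omitting that entry should disable it:

```yaml
language: "en"
pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  # word-level featurizer only; the second CountVectorsFeaturizer
  # (analyzer: "char_wb", min_ngram: 1, max_ngram: 4) that I believe
  # the 1.3 template adds is deliberately left out here
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"
```

If that assumption is right, this should be close to what the 1.2 template produced.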