1.3 Training Crash during EmbeddingIntentClassifier

Model training is crashing on me under 1.3.9 during the EmbeddingIntentClassifier phase. Here are the last couple of lines:

2019-10-14 15:40:44 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2019-10-14 15:40:45 INFO     rasa.nlu.model  - Finished training component.
2019-10-14 15:40:45 INFO     rasa.nlu.model  - Starting to train component EmbeddingIntentClassifier
Epochs:   0%|          | 1/300 [00:14<1:12:17, 14.51s/it, loss=62.778, acc=0.237]kyc-rasax >

If I run the training under 1.2.11, there is no failure.

I changed the config from:

pipeline: supervised_embeddings

to:

pipeline:
 - name: "WhitespaceTokenizer"
 - name: "RegexFeaturizer"
 - name: "CRFEntityExtractor"
 - name: "EntitySynonymMapper"
 - name: "CountVectorsFeaturizer"
 - name: "EmbeddingIntentClassifier"
   num_neg: 300

But the training still crashes.


I’ll try to reproduce this. I’m assuming there’s nothing too crazy in the training set either, right?

So this basic config from the Rasa Demo bot seems to be OK; I’m going to adjust it to your settings now and see what I get.

language: en
pipeline:
- name: WhitespaceTokenizer
- name: CRFEntityExtractor
- name: CountVectorsFeaturizer
  OOV_token: oov
  token_pattern: (?u)\b\w+\b
- name: EmbeddingIntentClassifier
  epochs: 50
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - email
  - number
  - amount-of-money
- name: EntitySynonymMapper

policies:
- epochs: 50
  max_history: 6
  name: KerasPolicy
- max_history: 6
  name: AugmentedMemoizationPolicy
- core_threshold: 0.3
  name: TwoStageFallbackPolicy
  nlu_threshold: 0.8
- name: FormPolicy
- name: MappingPolicy

Thanks, Brian. I tried your pipeline and got the same failure with the EmbeddingIntentClassifier. I have no problems using 1.3.9 with a smaller bot of mine (19 unique intents), but with this larger bot (279 intents) it fails.

I’m stuck on 1.2.11 until I can figure this out. It may be time for a GitHub issue.

I’ve opened issue #4616. I also think this could be related to the hyperparameter changes between 1.2 and 1.3 discussed in #4540.

What is the error?

No actual error. It sits at the 0% message for 20 seconds and stops.

I’ve played around with the hyperparameters (num_neg and batch_strategy: sequence) a little bit and got it to 2% once, but no further; the overrides I tried are sketched below.

Epochs:   0%|          | 0/300 [00:00<?, ?it/s]

Here’s a pastebin of the full training session with verbose logging.
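
Roughly what those overrides looked like in config.yml (the exact values varied between runs, so treat this as an illustration rather than a recommendation):

pipeline:
 # ...other components unchanged from the pipeline above...
 - name: "EmbeddingIntentClassifier"
   # fewer negative examples per training example than the 300 I started with
   num_neg: 20
   # feed batches in sequence order instead of the default balanced strategy
   batch_strategy: sequence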

Looking through the logs, I just noticed a difference between the 1.2.11 and 1.3.9 output.

Here is my training command:

#export RASA_VERS=1.2.11-full
export RASA_VERS=1.3.9-full
docker run -v $(pwd):/app rasa/rasa:${RASA_VERS} train --config /app/data/config.yml --out /app/models --domain /app/data/domain.yml --data /app/data/training /app/data/stories -vv

The NLU training data loader reports the format as 'unk' in 1.3.9 vs. 'md' in 1.2.11.

1.2.11 messages related to the training data format:

2019-10-16 14:37:06 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/training/xcen.md' is 'md'.
2019-10-16 14:37:06 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/training/axl.md' is 'md'.
2019-10-16 14:37:06 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/training/main.md' is 'md'.
2019-10-16 14:37:06 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/training/glossary.md' is 'md'.
2019-10-16 14:37:06 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/training/axl_faq.md' is 'md'.
2019-10-16 14:37:06 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/training/faq.md' is 'md'.

1.3.9 messages:

2019-10-16 14:19:31 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/stories/glossary.md' is 'unk'.
2019-10-16 14:19:31 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/stories/faq.md' is 'unk'.
2019-10-16 14:19:31 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/stories/main.md' is 'unk'.
2019-10-16 14:19:31 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/stories/axl_faq.md' is 'unk'.
2019-10-16 14:19:31 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/stories/xcen.md' is 'unk'.
2019-10-16 14:19:31 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/app/data/stories/axl.md' is 'unk'.

By "stops", do you mean it runs out of memory?

I don’t see any error message related to memory. My machine has 12 GB of memory, and my Docker engine had a default allocation of 2 GB. I increased that to 6 GB, restarted the Docker engine, ran the training again, and it worked. Thanks!

I also do training on a separate CI/CD system that doesn’t have this much memory. Is there a set of parameters that will make the EmbeddingIntentClassifier behave the same as it does in 1.2.11? I’ve tried the batch_strategy: sequence option, but it still requires more memory.
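
To make the question concrete, this is the kind of override I’ve been experimenting with to keep memory down (illustrative values only; none of these have reproduced the 1.2.11 memory profile for me yet):

pipeline:
 # ...other components unchanged...
 - name: "EmbeddingIntentClassifier"
   # smaller batch sizes to lower peak memory
   batch_size: [32, 64]
   # fewer negative samples per example
   num_neg: 10
   batch_strategy: sequence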

I saw your post under issue #4540 about the char-level count vectorizer. Is there a hyperparameter to disable or configure it?

Yes, please take a look at the NLU pipeline docs (Choosing a Pipeline): you need to remove the char-level count vectorizer from the config.
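
For reference, as far as I can tell from the docs, the 1.3 supervised_embeddings shortcut expands to roughly the pipeline below (double-check against your exact Rasa version). The second CountVectorsFeaturizer, the char_wb one, is the component to drop if memory is the issue:

language: "en"
pipeline:
- name: "WhitespaceTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
# word-level count vectors
- name: "CountVectorsFeaturizer"
# char-level count vectors: this is the one to remove
- name: "CountVectorsFeaturizer"
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: "EmbeddingIntentClassifier"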
