What changed in the DIETClassifier implementation or defaults? Significant drop in performance for rare intents

Hello there!

During my migration to Rasa 2.x, I realized that rare intents are now classified very poorly. My data is quite imbalanced: some intents have a dozen samples while others (like the ones that use the ResponseSelector) have a few hundred. This didn’t cause major problems for most intents in Rasa 1.9.

Even testing on the training data yields terrible results for intents with low support. For example, I have an intent happy containing a dozen examples such as "you're great!" and "I love you", with a support of only 12 samples. In Rasa 1.9, we had a training recall of ~83%, while in Rasa 2.1 I get a recall of 40%.

I am using the same pipeline as before, using Spacy tokenizers/featurizers and the DIETClassifier. From my understanding of the documentation, the DIETClassifier uses a balanced batching which should handle an imbalanced dataset. I have copy-pasted my NLU pipeline at the end of this message.
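(For context, the balanced batching mentioned above is controlled by the classifier's batch strategy; the sketch below shows how it could be set explicitly, assuming the Rasa 2.x option name batch_strategy and its "balanced" value apply here:)

```yaml
- name: "DIETClassifier"
  epochs: 50
  # "balanced" batching oversamples under-represented intents within each
  # batch; it is the documented default, shown here only for illustration.
  batch_strategy: "balanced"
```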

The only difference I can see is in the settings for the RegexFeaturizer and the SpacyFeaturizer, which no longer have the return_sequence: true option. A quick glance at the code showed that at least the SpacyFeaturizer returns both sequence and sentence features. The spaCy versions are also identical (2.1.9).

Any idea of where I could look? What has changed between the two versions?

Thanks a lot for the pointers! Cheers, Nicolas

config.yml (Rasa 2.x)

language: "en"
pipeline:
  - name: "DucklingEntityExtractor"
    url: "http://duckling.alpaca.casa"
    dimensions: ["time", "duration", "amount-of-money", "number", "email", "phone-number", "ordinal", "url"]
    timezone: "America/New_York"
  - name: "SpacyNLP"
    case_sensitive: true
  - name: "SpacyTokenizer"
  - name: "SpacyEntityExtractor"
    dimensions: ["PERSON", "MONEY"]
  - name: "RegexFeaturizer"
  - name: "SpacyFeaturizer"
  - name: LexicalSyntacticFeaturizer
  - name: "DIETClassifier"
    epochs: 50
    entity_recognition: true
    use_masked_language_model: false
  - ... # response selectors

config.yml (Rasa 1.9)

language: "en"
pipeline:
  - name: "DucklingHTTPExtractor"
    url: "http://duckling.alpaca.casa"
    dimensions: ["time", "duration", "amount-of-money", "number", "email", "phone-number", "ordinal", "url"]
    timezone: "America/New_York"
  - name: "SpacyNLP"
    case_sensitive: true
  - name: "SpacyTokenizer"
  - name: "SpacyEntityExtractor"
    dimensions: ["PERSON", "MONEY"]
  - name: "RegexFeaturizer"
    return_sequence: True  # <-- option not available anymore
  - name: "SpacyFeaturizer"
    return_sequence: True  # <-- option not available anymore
  - name: LexicalSyntacticFeaturizer
  - name: "DIETClassifier"
    epochs: 50
    entity_recognition: true
    use_masked_language_model: false

Maybe @Tanja would know? It might actually be a change between 1.9 and 1.10; I’ll test that.

I just tested with Rasa 1.10 and I have the same issue as with 2.1.

Any recommendations of a configuration to get results as I had back in 1.9?

Thanks! Nicolas

Oh, never mind. I just increased the number of epochs and that was sufficient. Still curious to know what changed conceptually! :slight_smile:
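(For anyone landing here later: the fix was simply raising epochs in the DIETClassifier config. The thread doesn't state the exact number, so the value below is purely illustrative:)

```yaml
- name: "DIETClassifier"
  epochs: 200   # raised from 50; exact value not given in the thread
  entity_recognition: true
  use_masked_language_model: false
```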

@nbeuchat Most likely this change here led to the performance change you are seeing. We did not see any performance drop when we tested this change back then; however, the datasets we used for testing were not very imbalanced, and we always used the same number of epochs. Can you maybe try setting scale_loss = True for the DIETClassifier and check if you get the same performance as before?
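(In config.yml, the suggested setting would look like this; a minimal sketch based on the pipeline posted above:)

```yaml
- name: "DIETClassifier"
  epochs: 50
  entity_recognition: true
  use_masked_language_model: false
  scale_loss: true   # scales the loss for hard examples, as suggested above
```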


Hi Tanja! Thanks a lot for your response, and my apologies for forgetting to give feedback after testing your suggestion. Setting scale_loss = True seems to give the same performance for the same number of epochs as I originally had (I only ran one quick test).