DIET total loss goes up if I include CountVectorsFeaturizers in the pipeline

Problem: If I include the CountVectorsFeaturizer components (both the word and the char_wb analyzers), the loss shoots up from 0.7 to 5.5. The accuracy stays the same. I am not sure why this is happening. I tried changing the dense_dimension parameter, but it did not help.

If I remove the sparse featurizers completely and only use dense features the loss stays below 1.

I have 35 intents, e.g. affirm and deny.

Is this happening because of data?

I might have introduced slight ambiguity while creating the intents.

E.g., in the deny intent I have a few examples like "yes I am not interested" and "yes no na dont this".

Or can this be solved using hyperparameter settings?

This is my config.

language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  # - name: gloveFer.GLoVeFeaturizer
  #   path: glove_100d.kv
  # This is a custom component to use the paraphrase model weights.
  - name: customlanguageFR.CustomLanguageModelFeaturizer
    model_name: sentence-transformers/paraphrase-MiniLM-L6-v2
    base_model: bert
  - name: RegexEntityExtractor
  - name: DIETClassifier
    epochs: 100
    random_seed: 307
    # embedding_dimension: 120
    constrain_similarities: True
    # connection_density: 0.7
    # scale_loss: true
    # dense_dimension:
    #   text: 256
    # hidden_layers_sizes:
    #   text: [256]
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: faq
    scale_loss: False
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
    scale_loss: False
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: inform
    scale_loss: False
  - name: FallbackClassifier
    threshold: 0.67
    ambiguity_threshold: 0.1

policies:
  - name: RulePolicy
  - name: MemoizationPolicy
    max_history: 3
  - name: TEDPolicy
    max_history: 4
    epochs: 300
    embedding_dimension: 120
    number_of_transformer_layers: 4
    connection_density: 0.6
    drop_rate: 0.25
    constrain_similarities: True

Has anyone else experienced this?

Pointers to any mistakes, or tips on my config, will be much appreciated.


Is the ambiguity in your dataset reflective of what your end users might say? If so, I could argue that it’s good to keep.

One thing just to check. When you say:

If I include the countvectorizer both word and char_wb the loss shoots up from 0.7 to 5.5.

Are you talking about validation loss or the training loss here?

You also seem to have a lot of custom components here. There's nothing wrong with that, but over at the rasa-nlu-examples repo we already host gensim featurizers, and Rasa also natively supports Hugging Face models via the LanguageModelFeaturizer. The components also make me curious: are you sure you need them? I'm asking because of this phenomenon.


I was talking about the training loss. I am not using gloveFer; it is commented out and I just forgot to remove it. The customlanguageFR component is the same as LanguageModelFeaturizer but lets me use PyTorch model weights. Thanks for the reply.

I just wanted to know why the loss was so high. I mean, if I remove the CountVectorsFeaturizers the loss decreases to less than 1.


I cannot come up with a good reason. Theoretically, you're adding more information to the system, so the loss should decrease. The only thing I can come up with is that DIET is stuck in a local optimum. Did you try a different seed value?
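
As a toy illustration of the local-optimum idea (this is not DIET, just plain gradient descent on a made-up one-dimensional function), the sketch below shows how the seed that picks the starting point can decide which minimum the optimizer ends in. The function `f`, the learning rate, and the step count are all assumptions chosen for illustration.

```python
import random


def f(x):
    # Made-up non-convex "loss" with two minima (near x = -1.30 and x = 1.13).
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1


def final_loss(seed, lr=0.01, steps=2000):
    # The seed only controls the starting point here, as a stand-in
    # for random weight initialisation in a real model.
    x = random.Random(seed).uniform(-2.0, 2.0)
    for _ in range(steps):
        x -= lr * grad_f(x)  # plain gradient descent step
    return f(x)


# Compare the loss reached from two different seeds.
for seed in (0, 1):
    print(seed, round(final_loss(seed), 2))
```

In Rasa the equivalent experiment would be to retrain with a few different random_seed values in the DIETClassifier config and compare the final losses.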


I had actually set it to a constant. OK, I will try that.

Yes, I forgot to mention this, but the accuracy is 99% for both intent and entity, so maybe the loss doesn't matter in this case?

Is it possible for me to message you somewhere privately, please? @koaning
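
On whether the loss matters when accuracy is already high: the two measure different things. Accuracy only checks whether the top-scoring class is the right one, while a cross-entropy-style loss also depends on how confident the model is. The sketch below is a generic illustration with made-up probabilities; it is not DIET's actual loss computation and does not by itself explain the jump from 0.7 to 5.5.

```python
import math


def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true class.
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(probs, labels):
    # Fraction of examples whose highest-probability class is the true class.
    hits = sum(
        1
        for p, y in zip(probs, labels)
        if max(range(len(p)), key=p.__getitem__) == y
    )
    return hits / len(labels)


labels = [0, 1, 2]

# Hypothetical model A: correct and very confident.
confident = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]

# Hypothetical model B: also correct on every example, but only barely.
hesitant = [[0.40, 0.30, 0.30], [0.30, 0.40, 0.30], [0.30, 0.30, 0.40]]

for name, probs in (("confident", confident), ("hesitant", hesitant)):
    print(name, accuracy(probs, labels), round(cross_entropy(probs, labels), 3))
```

Both hypothetical models get every example right, yet their losses differ a lot, so a higher training loss with unchanged accuracy is not automatically a sign of worse predictions.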

You can reach me here on the forum.

Just so I understand, why are you concerned with the training loss? A validation error is typically more interpretable.