DIET total loss goes up if I include CountVectorsFeaturizer in the pipeline

Problem: If I include the CountVectorsFeaturizer (both word and char_wb), the loss shoots up from 0.7 to 5.5. The accuracy stays the same. I am not sure why this is happening. I tried changing the dense_dimension parameter, but it did not help.

If I remove the sparse featurizers completely and use only dense features, the loss stays below 1.

I have 35 intents, e.g. affirm and deny.

Is this happening because of the data?

I might have introduced slight ambiguity while creating the intents.

E.g. in the deny intent I have a few examples like "yes I am not interested" and "yes no na dont this".
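To make the ambiguity concrete, this is roughly how those examples sit in my NLU training data (a simplified sketch in Rasa's NLU YAML format; the affirm examples here are placeholders, not my real data):

nlu:
- intent: affirm
  examples: |
    - yes
    - yes please
- intent: deny
  examples: |
    - yes I am not interested
    - yes no na dont this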

Or can this be solved with hyperparameter settings?

This is my config:

language: en

pipeline:
- name: WhitespaceTokenizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
# - name: gloveFer.GLoVeFeaturizer
#   path: glove_100d.kv
# custom component to use the paraphrase model weights
- name: customlanguageFR.CustomLanguageModelFeaturizer
  model_name: sentence-transformers/paraphrase-MiniLM-L6-v2
  base_model: bert
- name: RegexEntityExtractor
- name: DIETClassifier
  epochs: 100
  random_seed: 307
  # embedding_dimension: 120
  constrain_similarities: True
  # connection_density: 0.7
  # scale_loss: true
  # dense_dimension:
  #   text: 256
  # hidden_layers_sizes:
  #   text: [256]
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  retrieval_intent: faq
  scale_loss: False
- name: ResponseSelector
  epochs: 100
  retrieval_intent: chitchat
  scale_loss: False
- name: ResponseSelector
  epochs: 100
  retrieval_intent: inform
  scale_loss: False
- name: FallbackClassifier
  threshold: 0.67
  ambiguity_threshold: 0.1

policies:
  - name: RulePolicy
  - name: MemoizationPolicy
    max_history: 3
  - name: TEDPolicy
    max_history: 4
    epochs: 300
    embedding_dimension: 120
    number_of_transformer_layers: 4
    connection_density: 0.6
    drop_rate: 0.25
    constrain_similarities: True

Has anyone else experienced this?

Any corrections or tips on my config would be much appreciated.

@koaning

Is the ambiguity in your dataset reflective of what your end users might say? If so, I could argue that it’s good to keep.

One thing just to check. When you say:

If I include the CountVectorsFeaturizer (both word and char_wb), the loss shoots up from 0.7 to 5.5.

Are you talking about validation loss or the training loss here?

You also seem to have a lot of custom components here. There's nothing wrong with that, but over at the rasa nlu examples repo we already host gensim featurizers, and Rasa also natively supports huggingface via the LanguageModelFeaturizer. The components also make me curious: are you sure you need them? I'm asking because of this phenomenon.
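For what it's worth, the native component can usually load huggingface weights directly, along these lines (a sketch; I haven't verified that these exact sentence-transformers weights load cleanly under the bert model_name):

- name: LanguageModelFeaturizer
  model_name: bert
  model_weights: sentence-transformers/paraphrase-MiniLM-L6-v2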


I was talking about the training loss. I am not using the gloveFer featurizer, it is just left commented out in the config. The customlanguageFR component is the same as the LanguageModelFeaturizer, it just lets me use PyTorch model weights. Thanks for the reply.

I just wanted to know why the loss was so high. I mean, if I remove the CountVectorsFeaturizers, the loss decreases to less than 1.

Clear.

I cannot come up with a good reason. Theoretically, you're adding more information to the system, so the loss should indeed decrease. The only thing I can come up with is that DIET is stuck in a local optimum. Did you try a different seed value?
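Changing the seed is a one-line edit on DIET, e.g. (the 42 below is just an arbitrary example value):

- name: DIETClassifier
  epochs: 100
  random_seed: 42
  constrain_similarities: True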


I had actually set it to a constant. OK, I will try that.

Yeah, I forgot to mention this, but the accuracy is 99% for both intent and entity, so maybe the loss doesn't matter in this case?

Is it possible for me to message you somewhere privately, please? @koaning

You can reach me here on the forum.

Just so I understand, why are you concerned with the training loss? A validation error is typically more interpretable.
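If it helps, DIET can hold back part of the training data and report validation metrics during training; something along these lines should do it (a sketch, the numbers are just examples):

- name: DIETClassifier
  epochs: 100
  evaluate_on_number_of_examples: 200
  evaluate_every_number_of_epochs: 5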