Unable to set threshold when using model_confidence: linear_norm instead of softmax

I am using Rasa 2.5.0 on Windows 10 and observe that, when using `model_confidence: linear_norm`, the range of confidences obtained for intent predictions shrinks as the number of intents increases. As a consequence, it becomes impossible to set a useful fallback threshold. With `softmax`, on the other hand, the NLU model seems to work just fine.
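To illustrate what I mean, here is a rough numerical sketch. This is not Rasa's exact implementation; it just assumes `softmax` exponentiates similarity scores while `linear_norm` normalizes non-negative similarities by their sum. Even with one clearly winning intent, the linearly normalized top confidence collapses toward zero as the number of intents grows:

```python
import numpy as np

# Illustrative sketch only, NOT Rasa's internal code: approximate the two
# confidence variants applied to raw similarity scores.

def softmax_conf(sims):
    e = np.exp(sims - sims.max())  # subtract max for numerical stability
    return e / e.sum()

def linear_norm_conf(sims):
    # assumption: clip negatives, then normalize scores to sum to 1
    s = np.clip(sims, 0.0, None)
    return s / s.sum()

for n_intents in (10, 50, 150):
    # one clearly winning intent (similarity 8) among near-ties (1)
    sims = np.full(n_intents, 1.0)
    sims[0] = 8.0
    print(n_intents,
          round(float(softmax_conf(sims)[0]), 3),
          round(float(linear_norm_conf(sims)[0]), 3))
```

With 150 intents the softmax top confidence stays high while the linearly normalized one drops well below any sensible fallback threshold, which matches the histograms below.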

My actual project contains 150 intents, and I obtain the following confidence histograms using linear_norm and softmax.

As I cannot share the data for this model, I created a synthetic analogue which illustrates the issue.

The data to reproduce:

config.yml (823 Bytes)

domain.yml (1.0 KB)

nlu.yml (11.1 KB)

Confusion matrix obtained with linear_norm

Confusion matrix obtained with softmax

Config:

language: en

# Rasa NLU
pipeline:
- name: SpacyNLP
  model: "en_core_web_lg"
  case_sensitive: false
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  analyzer: "word"
- name: CountVectorsFeaturizer
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  loss_type: cross_entropy
  model_confidence: linear_norm # softmax
  constrain_similarities: true
  epochs: 200
  intent_classification: true
  entity_recognition: false
  batch_strategy: balanced
- name: EntitySynonymMapper
# - name: FallbackClassifier
  # threshold: 0.90
  # ambiguity_threshold: 0.1

# Rasa Core
policies:
- name: MemoizationPolicy
- name: TEDPolicy
  max_history: 5
  epochs: 200
- name: RulePolicy
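For completeness, this is how I would enable the fallback once the confidence distribution makes a threshold usable, using the values commented out in the pipeline above. `utter_please_rephrase` is just a placeholder response name:

```yaml
# in config.yml, replacing the commented-out lines
- name: FallbackClassifier
  threshold: 0.90
  ambiguity_threshold: 0.1

# in rules.yml
rules:
- rule: Ask the user to rephrase on low NLU confidence
  steps:
  - intent: nlu_fallback
  - action: utter_please_rephrase
```

With linear_norm, though, no single `threshold` value separates correct predictions from fallback candidates, which is the core of my problem.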

Thanks for any tips on this problem.


Hi @gdl1, thanks for sharing your results. `linear_norm` and `softmax` are two different variants you can use for model confidences. While we found `linear_norm` to be effective on some assistants, for example rasa-demo, it's good to know that `softmax` outperforms `linear_norm` on some others.

We’ll be shipping a new loss function very soon which will use cosine similarities as model confidences underneath. If you are already curious, you can give the working branch of the linked PR a try. :slight_smile:


In your real dataset, how many examples do you have on average per intent? In the synthetic dataset that number is low; is that reflective of the number of examples in your real dataset as well? I can imagine linear_norm lagging behind softmax in low-data conditions.

Hi @dakshvar22, thanks for your reply. I'm definitely curious to see what you are preparing; I had tried the cosine-similarity model that was temporarily available a few revisions back. As for my ‘real’ dataset, here is a snapshot of the sample distribution:

What do you consider a low number of examples per intent?

Thanks

Hi @dakshvar22, I wanted to follow up on this thread and see if you could offer some comments on the question of the number of examples per intent. Thanks!