ResponseSelector low confidence on Rasa 2.6.*

Hello everyone!

We had been working with Rasa 2.3.4 and 2.4.*, but recently we decided to upgrade to the latest version. We retrained a bot that was working perfectly on those versions using 2.6.2, but the response selector now gives us very low confidence for every question. The only change we made was a suggestion that showed up during training:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/utils/train_utils.py:455: UserWarning: constrain_similarities is set to False. It is recommended to set it to True when using cross-entropy loss. It will be set to True by default, Rasa Open Source 3.0.0 onwards. category=UserWarning,

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/utils/train_utils.py:428: UserWarning: model_confidence is set to softmax. It is recommended to try using model_confidence=linear_norm to make it easier to tune fallback thresholds. category=UserWarning,

But now all my FAQs get very low confidence (0.14–0.16), whereas with 2.4.* we got values above 0.75.
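
For intuition, here's a toy comparison I put together (a minimal sketch only; the softmax and linear_norm functions below are my own simplified stand-ins, not Rasa's actual implementation) of why a linear normalization of similarity scores lands on a much flatter scale than softmax when there are this many candidate responses:

import numpy as np

def softmax(sims):
    e = np.exp(sims - sims.max())
    return e / e.sum()

def linear_norm(sims):
    # clip negative similarities to zero, then rescale so they sum to 1
    pos = np.clip(sims, 0.0, None)
    total = pos.sum()
    return pos / total if total > 0 else pos

rng = np.random.default_rng(0)
sims = rng.normal(0.0, 1.0, size=115)  # 115 candidate responses, as in our data
sims[0] += 8.0                         # one response is the clear winner

print("softmax top confidence:    ", softmax(sims).max())      # typically well above 0.9
print("linear_norm top confidence:", linear_norm(sims).max())  # typically around 0.1-0.2

The ranking is identical in both cases; only the scale of the top confidence changes, which presumably means the old fallback thresholds no longer fit. But even accounting for that, the values seem too low.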

This is my config file:

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    model_confidence: linear_norm
    constrain_similarities: True
  - name: EntitySynonymMapper
  - name: voiq_response_selector.VoiqResponseSelector
    retrieval_intent: faq
    epochs: 150
    nlu_threshold: 0.8
    scale_loss: False
    model_confidence: linear_norm
    constrain_similarities: True
  - name: FallbackClassifier
    threshold: 0.7
policies:
  - name: MemoizationPolicy
    max_history: 5
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    model_confidence: linear_norm
    constrain_similarities: True
  - name: RulePolicy
    core_fallback_threshold: 0.5
    core_fallback_action_name: action_default_fallback
    enable_fallback_prediction: True

This is the usual result for the ResponseSelector:

Epochs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [01:33<00:00, 1.60it/s, t_loss=4.9, r_acc=0.998]

Any idea what's going on, or how I can improve the results? For the moment, I'll keep using 2.4.*

Thanks

Hi @DanielOlarte, how many training examples do you have for your response selector? Also, I see that the training loss is still high, which indicates that the model is not very confident at training time either. It would be better to increase the number of epochs to decrease t_loss further.

Hi @dakshvar22, as of now we have this:

Number of response examples: 3471 (115 distinct responses)

And we have an average of 27 training examples per response.

We increased to 300 epochs and got this result:

Epochs: 100%|##########| 300/300 [20:10<00:00, 4.03s/it, t_loss=5.2, r_acc=0.997]

Let me know if we need to do something else.

Do you have a lot of overlapping examples across responses? The training loss staying this high usually points to a lot of overlap between examples.
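
If it helps, here is a rough script to check for that (a sketch only, untested; it assumes your training data lives in data/nlu.yml in the Rasa 2.x YAML format, with examples given as an examples: | block, and that your retrieval intents are prefixed with faq/):

from collections import defaultdict

import yaml  # PyYAML

with open("data/nlu.yml") as f:
    nlu_data = yaml.safe_load(f)

# map each normalized example text to the set of retrieval intents it appears under
example_to_intents = defaultdict(set)
for item in nlu_data.get("nlu", []):
    intent = item.get("intent", "")
    if not intent.startswith("faq/"):
        continue
    for line in item.get("examples", "").splitlines():
        line = line.strip()
        if line.startswith("- "):
            example_to_intents[line[2:].lower()].add(intent)

for example, intents in sorted(example_to_intents.items()):
    if len(intents) > 1:
        print(f"{example!r} appears under: {', '.join(sorted(intents))}")

Anything the script prints is an example the model sees with two different labels, which would keep the loss high no matter how many epochs you train.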

Hi!

I'm facing the same problem. Did you manage to solve it somehow?