Fallback doesn't work with two retrieval intents (FAQ and chitchat)

Hello,

I’m working with Rasa 2.3.4.

I have two retrieval intents (faq and chitchat). When I provide a random input, Rasa NLU classifies it as one of those retrieval intents, when logically it should fall back to nlu_fallback.

This problem also occurred in previous versions.

I found that, with the new model_confidence option in 2.3.4, "this should ease up tuning fallback thresholds as confidences for wrong predictions are better distributed across the range [0, 1]".

But in my case, that didn’t work.

config.yml (528 Bytes)

domain.yml (705 Bytes)

chitchat.yml (1.2 KB)

faq.yml (598 Bytes)

rules.yml (282 Bytes)

I also tried varying the model_confidence parameter (softmax, linear_norm, and even cosine in <=2.3.3).
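For reference, the relevant pipeline entries look roughly like this (a sketch with illustrative values, not an exact copy of the attached config.yml; the threshold value is just an example):

```yaml
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
    model_confidence: linear_norm   # also tried softmax, and cosine on <=2.3.3
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: faq
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
  - name: FallbackClassifier
    threshold: 0.7                  # illustrative value; no threshold works well here
```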

When I run the rasa shell nlu command, I find that the confidence for random inputs is far too high, as in the following example:

[screenshot: rasa shell nlu output for the input "ab", classified with confidence 1.0]

The problem is that this ‘ab’ token doesn’t exist in the training data, and on top of that the min/max char n-gram is 4.
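The char n-gram setting I’m referring to is the character-level CountVectorsFeaturizer in the pipeline, roughly (again a sketch, not the exact attached file):

```yaml
  - name: CountVectorsFeaturizer
    analyzer: char_wb   # character n-grams within word boundaries
    min_ngram: 4
    max_ngram: 4
```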

I tried testing this on other projects with more training data, but I always get the same results.

I think that this is a problem, especially when we create Q/A assistants.

I hope you could give me some insights on that. Thanks.

It seems weird to me that you have confidence 1.0. That shouldn’t be possible with ML components. Did you use rasa shell or rasa interactive for this?

Hey @Tobias_Wochinger ,

Thanks for your reply.

It’s weird for me too. As I mentioned before, I used the rasa shell nlu command.

Hi @Yasmine, I tried out your assistant locally, and there are two factors contributing to this:

  1. The assistant has only 2 intents.
  2. The confidence measure normalizes confidences across intents.

We plan to ship a new option for model_confidence as a solution to (2), which would output absolute similarities as confidences.

However, do you plan to have only 2 intents as part of your assistant?

Hello @dakshvar22,

Thank you for your reply.

Yes, some Q/A assistants need only 2 retrieval intents to work properly, like chitchat and faq in this case.
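For reference, the rules follow the standard retrieval-intent pattern, roughly like this (a sketch, not my exact rules.yml; utter_default is just a placeholder fallback response name):

```yaml
rules:
  - rule: Respond to FAQ
    steps:
      - intent: faq
      - action: utter_faq

  - rule: Respond to chitchat
    steps:
      - intent: chitchat
      - action: utter_chitchat

  - rule: Handle low NLU confidence
    steps:
      - intent: nlu_fallback
      - action: utter_default
```

The issue is that random inputs rarely reach the nlu_fallback rule, because they get classified as faq or chitchat with high confidence.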

Separately, I tried some assistants with classical intents instead of two retrieval intents. But I noticed that the linear_norm option gives very low confidence (the opposite of softmax). By low I mean a confidence of 0.016… for an input that already exists in the training data.

Could you give me an example in which model_confidence=linear_norm is helpful?

Thanks in advance!

That is actually better, because with softmax the model is overly confident about almost everything (right or wrong). If an input that already exists in the training data is being classified with such low confidence, it means it is being highly confused with some other intent, and the training data should be investigated for such a clash. model_confidence=softmax masks such problems in many cases.

Hey @dakshvar22 ,

Thank you for your reply.

In fact, I tried that with many examples from different intents, and the confidence is still low. The problem here is that I couldn’t set a good fallback threshold to reject wrong inputs. Also, I’ve noticed that this behavior happens when the project is large; with small projects I get the same problem as with softmax (high confidence for random inputs).

I converted the previous files to use normal intents:

chitchat.yml (1.1 KB)

faq.yml (582 Bytes)

rules.yml (889 Bytes)

config.yml (397 Bytes)

domain.yml (730 Bytes)

So here I was expecting better-behaved confidences, because I have more than 2 intents (9 intents). But with the same input I tested before, I got:

[screenshot: rasa shell nlu output for the same "ab" input with the 9-intent assistant]

The confidence decreases, but it is still high.

So even with 9 intents the confidence is high (even though the input doesn’t exist in the training data and the min/max char n-gram is 4).

Thanks!

Do you mean that for large projects it’s not a problem? By large project I mean a large number of intents, with each intent having a good amount of data.

Hey @dakshvar22 ,

In fact, I found 2 things when testing the new model_confidence value linear_norm:

With a large project, I got very low confidences even though the example already exists in the training data (like the 0.016… value I mentioned before).

With a small project, I got high confidences even though the example doesn’t exist in the training data (like the previous example with the “ab” input).

Thanks!

Hi Yasmine, thanks for clarifying that.

For a small project, this is somewhat expected: with a small amount of data, the model isn’t able to learn properly what’s legitimate input and what’s gibberish. It needs more data to figure that out. I would park that problem for now, because in production you wouldn’t have such a small amount of data anyway.

For a large project, as I mentioned, if an example is being classified with low confidence with linear_norm, it means multiple intents are competing for the correct class. That is most likely happening because of wrong annotations, overlapping intent classes, or similar examples across different intents.

I would like to go deeper into the latter problems with your assistant. As a first step, are you familiar with how to install Rasa from source and work with experimental branches of Rasa Open Source? This is the recommended way to install from source.

The objective is to try out small changes in the source code and see what works best in your case. I can’t guarantee that we’ll reach a solution, but I’m sure we’ll learn something about your assistant and what’s really happening in the model. Let me know if you are up for some experimentation :slight_smile:

Hey @dakshvar22 ,

Thank you for your quick reply.

I have similar examples across different intents in my training data. Maybe the problem is due to that.

I will try to clean my dataset and install Rasa from source as you suggested.

And yes, if you have any other suggestions, I’m willing to try them.

Thank you so much.

Awesome!

I have similar examples across different intents in my training data. Maybe the problem is due to that.

That definitely sounds like a valid problem. I’d suggest checking how you can reduce that overlap. It may require some restructuring of intents.

if you have any other suggestions, I’m willing to try them.

I have created a new branch named investigate_low_confidence. Once you have installed Rasa from source, you can pull this branch and switch to it. It would be great if you could re-train your large assistant using this branch with constrain_similarities: True and model_confidence: linear_norm. Then compare the test F1 scores when trained on this branch vs. when trained with version 2.4.0 of Rasa Open Source. Also compare the predicted confidences for examples that are present in the training data and for gibberish examples. Let me know your observations and then we can think of some next steps.
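Concretely, the two flags go on DIETClassifier (and on the ResponseSelectors, if you still use retrieval intents), roughly like this (epochs and the rest of the pipeline as in your existing config):

```yaml
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: True
    model_confidence: linear_norm
  # the same two parameters can also be set on any ResponseSelector components
```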

Did you ever find a solution to this? I am facing the same problem trying to trigger a fallback.