I have two retrieval intents (faq and chitchat). When I provide a random input, Rasa NLU classifies it into one of those retrieval intents, when logically it should fall back to the nlu_fallback intent.
This problem also occurred in previous versions.
I found that the new 2.3.4 release changed model_confidence: "This should ease up tuning fallback thresholds as confidences for wrong predictions are better distributed across the range [0, 1]".
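For context, here is a minimal sketch of the kind of pipeline under discussion; the component choices and numbers are assumptions for illustration, not the actual project configuration:

```yaml
# config.yml (illustrative sketch, not the poster's actual pipeline)
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100                     # assumed value
    model_confidence: linear_norm   # the option quoted from the 2.3.4 changelog
    constrain_similarities: true
  - name: ResponseSelector          # serves the faq and chitchat retrieval intents
    epochs: 100                     # assumed value
  - name: FallbackClassifier
    threshold: 0.3                  # assumed value; inputs scored below this become nlu_fallback
```

With a setup like this, any input whose top intent confidence falls below the threshold is rewritten to nlu_fallback, which is the behaviour the random inputs above were expected to trigger.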
It seems weird to me that you have a confidence of 1.0. That shouldn't be possible with ML components. Did you use rasa shell or rasa interactive for this?
Yes, some Q/A assistants need only two retrieval intents to work properly, like chitchat and faq in this case.
Separately, I tried some assistants with classical intents instead of the two retrieval intents. But I noticed that the linear_norm option gives a very low confidence (the opposite of softmax). By low I mean a confidence of 0.016… for an input that already exists in the training data.
Could you give me an example in which model_confidence=linear_norm is helpful?
That is actually better, because with softmax the model is overly confident about almost everything (right or wrong). If an input that already exists in the training data is being classified with such low confidence, it means it is being highly confused with some other intent, and the training data should be investigated for such a clash. model_confidence=softmax masks such problems in many cases.
In fact, I tried that with many examples from different intents and the confidence is still low. The problem is that I couldn't settle on a good fallback threshold to reject wrong inputs.
Also, I've noticed that this behavior only happens when the project is large; with small projects I get the same problem as with softmax (high confidence for random inputs).
I converted the previous files to use normal intents:
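As a point of reference, the threshold being tuned here lives in the FallbackClassifier, and with linear_norm the workable range tends to sit lower than the values people used with softmax. The numbers below are assumptions to tune against a test set, not recommendations:

```yaml
# config.yml (relevant component only; values are assumed starting points)
pipeline:
  - name: FallbackClassifier
    threshold: 0.25           # lower than typical softmax-era thresholds
    ambiguity_threshold: 0.1  # also fall back when the top two intents score too closely
```

A rule mapping nlu_fallback to a response (for example a hypothetical utter_please_rephrase) is what actually turns the low confidence into a fallback reply.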
For a small project, this is somewhat expected: with a small amount of data, the model isn't able to learn properly what's a legitimate input and what's gibberish. It needs more data to figure that out. I would park that problem for now, because in production you wouldn't have such a small amount of data anyway.
For a large project, as I mentioned, if an example is classified with low confidence under linear_norm, it means that multiple intents are competing for the correct class, and that is very likely happening because of wrong annotations, overlapping intent classes, or similar examples across different intents.
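To make "similar examples across different intents" concrete, here is a hypothetical fragment of NLU training data with the kind of clash that drags confidences down; intent names and examples are invented for illustration:

```yaml
# nlu.yml (hypothetical overlapping intents, not the actual training data)
version: "2.0"
nlu:
  - intent: ask_opening_hours
    examples: |
      - when are you open?
      - what are your opening hours?
      - are you open today?
  - intent: ask_availability
    examples: |
      - are you open right now?
      - can I reach you today?
      - when are you available?
```

Here the two intents compete for nearly identical phrasings, so under linear_norm neither of them can win with high confidence.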
I would like to dig deeper into these problems with your assistant. As a first step, are you familiar with how to install Rasa from source and work with experimental branches of Rasa Open Source? The Rasa Open Source documentation describes the recommended way to install from source.
The objective is to try out small changes in the source code and see what works best in your case. I can't guarantee that we'll reach a solution, but I'm sure we'll learn something about your assistant and what's really happening in the model. Let me know if you are up for some experimentation.
I have similar examples across different intents in my training data. Maybe the problem is due to that.
That definitely sounds like a valid problem. I’d suggest checking how you can reduce that overlap. It may require some restructuring of intents.
If you have any other suggestions, I'm willing to try them.
I have created a new branch named investigate_low_confidence. Once you have installed Rasa from source, you can pull this branch and switch to it. It would be great if you could re-train your large assistant using this branch of Rasa with constrain_similarities: True and model_confidence: linear_norm. Then compare the test F1 scores when trained on this branch versus when trained with the 2.4.0 release of Rasa Open Source. Also compare the predicted confidences for examples that are present in the training data and for gibberish examples.
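A minimal sketch of those settings for the comparison run; the surrounding components and epoch counts are assumptions about the pipeline, only the two options come from the instructions above:

```yaml
# config.yml (only the options to change; the rest of the pipeline stays as-is)
pipeline:
  # ...
  - name: DIETClassifier
    epochs: 100                     # assumed value
    constrain_similarities: True
    model_confidence: linear_norm
  - name: ResponseSelector
    epochs: 100                     # assumed value
    constrain_similarities: True
    model_confidence: linear_norm
  # ...
```

Training the same data once on the investigate_low_confidence branch and once on the 2.4.0 release with this configuration, then evaluating (for example with rasa test nlu), gives the F1 and confidence comparison described above.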
Let me know your observations and then we can think of some next steps.