Hi! I’m facing some intent confusion between these two intents: deny and dont_understand.
I’m building the chatbot in Spanish, so it may sound a bit weird to you, but I’ve added some translations. The NLU training data looks like this:
- intent: dont_understand
- no lo he entendido (i didn't understand it)
- no lo entiendo (i don't understand it)
- no me aclaro (i can't make sense of it)
- no entiendo nada (i don't understand anything)
- no entiendo que has dicho (i don't understand what you said)
- no, no lo entiendo (no, i don't understand it)
- no entiendo muchas cosas (i don't understand many things)
- no he entendido nada de lo que has dicho (i didn't understand anything you said)
- nada de lo que has dicho tiene sentido (nothing you said makes sense)
- no tiene sentido (it doesn't make sense)
- ... +20
- intent: deny
- no lo creo (i don't think so)
- lo dudo mucho (i really doubt it)
- ni pensarlo (no way)
- no por favor (no, please)
- no para nada (no, not at all)
- no gracias (no thanks)
- No, gracias (No, thanks)
- ... +20
As you can see, both intents are grammatically very similar, mainly due to the use of the word “no” to deny in Spanish. So for the user input “no” I’m getting a confidence of 0.4 for deny and 0.2 for dont_understand.
It identifies the correct intent, but with such low confidence (0.4) that it doesn’t pass my FallbackClassifier threshold of 0.7.
What strategy should I follow to improve confidence? I don’t want to lower the FallbackClassifier threshold. Maybe there is something in my current config.yml (1.2 KB)
Just to add some context: I’m working on a FAQ chatbot, but I want to add some basic contextual stories to detect when the user wants to learn more, doesn’t understand something, etc.
I see you are using the linear_norm method to compute model confidence. I know this is the recommended approach, but I would suggest you also experiment with softmax.
From my experience, the confidence values generated by linear_norm are very low, so if you use it, you probably need to lower the 0.7 confidence threshold. Note that 0.7 was chosen when softmax was the only available option, and it has not been revised since linear_norm was introduced. It would still be a good confidence threshold if the generated confidence values were well calibrated and used the whole [0, 1] range, but, as I said, in my experience this is not the case.
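To illustrate where that setting lives, here is a sketch of the relevant part of a config.yml; the surrounding pipeline components and values like `epochs` are assumptions on my part, not taken from your actual file:

```yaml
pipeline:
  # ... tokenizers and featurizers ...
  - name: DIETClassifier
    epochs: 100
    # Switch back from linear_norm to softmax to get confidences
    # that spread across the whole [0, 1] range.
    model_confidence: softmax
  - name: FallbackClassifier
    # With softmax confidences, the default 0.7 threshold
    # should be usable again.
    threshold: 0.7
```

After changing `model_confidence` you need to retrain before the new confidence values take effect.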
You should check the confidence values generated for all your intents by either linear_norm or softmax and set the threshold based on them.
Thank you so much @humcasma! I’ve just tried softmax and now the confidence level is around 0.98. Initially I was using the softmax method, but I changed it to linear_norm following the Rasa CLI recommendation. Any idea why softmax produces this behaviour?
BTW, is my approach correct? I have some other FAQ retrieval intents which are also quite similar. Should I move these “potentially problematic” retrieval intents to normal intents and make use of entities? This is my first time building a chatbot, so I’m not quite sure how well the model will perform once I have a big training dataset with tons of examples that can sometimes be very similar (e.g. “what’s the meaning of NLU”, “what’s the meaning of pipeline” and “what’s the meaning of utterance”).
I would say that sentences like “what’s the meaning of NLU?”, “what’s the meaning of pipeline?” and “what’s the meaning of utterance?” do not express different intents, but the same what_is_the_meaning_of intent. Thus, I would group all of them into a single intent and use entities.
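As a sketch of what that restructuring could look like in NLU training data (the entity name `term` is illustrative, not something from your data):

```yaml
nlu:
  - intent: what_is_the_meaning_of
    examples: |
      # The bracket syntax annotates the entity value and its type.
      - what's the meaning of [NLU](term)?
      - what's the meaning of [pipeline](term)?
      - what's the meaning of [utterance](term)?
      - what does [utterance](term) mean?
```

Your response logic can then branch on the extracted `term` entity instead of relying on the classifier to separate near-identical sentences.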
In other cases, where you really do have different intents, I would start by defining separate intents and only move to a single intent plus entities if you get many misclassifications.