As number of intents increases, confidence level decreases

harshitazilen · August 23, 2018, 7:16am

I am working on a chatbot project that mainly involves Rasa NLU.

It is a chatbot for FAQ support. Because some of the FAQs are quite similar in wording, NLU gives lower confidence as the number of intents (types of questions) grows.

For example, I initially tested with 2 intents and the confidence was around 70 to 90%. Now that the project has 80 intents, the confidence is around 8 to 10%.

In this kind of scenario, with around 80 intents, some of them quite similar in nature, what should the confidence threshold for the NLU be?
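To illustrate why confidence can fall as intents are added, here is a minimal sketch. It assumes a classifier whose confidences are normalized to sum to 1 (softmax-style); this is an illustration only, not how any particular Rasa pipeline computes its scores. The score values and the `top_confidence` helper are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_confidence(n_intents, top_score=2.0, other_score=0.0):
    """Confidence of the best intent when it beats every other intent
    by the same fixed margin (hypothetical scores)."""
    scores = [top_score] + [other_score] * (n_intents - 1)
    return max(softmax(scores))

# The margin between the best intent and the rest never changes,
# but the probability mass is shared among more competitors.
for n in (2, 10, 80):
    print(n, round(top_confidence(n), 3))
```

Under this assumption, a low absolute confidence with 80 intents does not by itself mean the top prediction is wrong, so a threshold tuned on 2 intents will not carry over.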

NikhilBansal21 · August 23, 2018, 8:34am

What are you using in your pipeline: spaCy or TensorFlow?

harshitazilen · August 23, 2018, 8:37am

I am using the spaCy pipeline.

NikhilBansal21 · August 23, 2018, 8:42am

Use TensorFlow for higher intent confidence; spaCy's confidence varies.

harshitazilen · August 23, 2018, 8:42am

OK. I recently trained using TensorFlow. I will let you know the results.

amn41 · August 23, 2018, 7:25pm

@NikhilBansal21 @harshitazilen do please read the ‘note about confidence scores’ here http://rasa.com/docs/nlu/fallback/

harshitazilen · August 24, 2018, 5:39am

Yes, I read it. I used the same training data with the spacy and tensorflow_embedding pipelines. Below is the report.

spaCy pipeline:

Total Intents: 80
Average confidence while detecting a training statement: 0.08 - 0.1

With spaCy I am struggling to decide on a threshold because the confidence is so low.

tensorflow_embedding pipeline:

Total Intents: 80
Average confidence while detecting a training statement: 0.8 - 0.9

With tensorflow_embedding, if this is overfitting, I will need to reduce the training data and epochs.

Kindly guide me.

souvikg10 · August 24, 2018, 8:56am

Do you have a test set? If not, use 3-fold cross-validation.

To check for overfitting and underfitting, compare the F1 score on the test set against the F1 score on the training set.

If the test score is low and the train score is high, you are overfitting; if both are low, you are underfitting.
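As a rough sketch of that train-versus-test comparison, the snippet below computes a macro-averaged F1 score in plain Python. The intent names and predictions are made up for illustration, and `macro_f1` is a hypothetical helper, not part of Rasa.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-intent F1 scores."""
    pairs = list(zip(true_labels, predicted_labels))
    labels = set(true_labels) | set(predicted_labels)
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical predictions on the training data and on held-out data.
train_true = ["refund", "refund", "shipping", "shipping"]
train_pred = ["refund", "refund", "shipping", "shipping"]
test_true = ["refund", "refund", "shipping", "shipping"]
test_pred = ["refund", "shipping", "shipping", "shipping"]

train_f1 = macro_f1(train_true, train_pred)
test_f1 = macro_f1(test_true, test_pred)
print(f"train F1 = {train_f1:.2f}, test F1 = {test_f1:.2f}, "
      f"gap = {train_f1 - test_f1:.2f}")
```

A large gap between the two numbers is the signal to look for; the absolute confidence values reported by the classifier do not enter into it.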

I think the F1 score is a good baseline indicator of how good your model is.

Also, is your dataset balanced? We have over 140 intents and most of them are not well balanced, so we are now trying to merge some classes and figure out programmatically how to deal with the imbalance. Since you are doing FAQ, one useful reminder from the tests I have seen with real users: people don't interact with FAQ pages the same way they do with a chatbot. So if you have reused the same classes for your chatbot, it won't really make sense in the end.
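One quick way to check balance is to count training examples per intent. The sketch below assumes the training data has already been loaded as (text, intent) pairs; the example sentences, intent names, and the `intent_balance` helper are all hypothetical.

```python
from collections import Counter

def intent_balance(examples):
    """Return per-intent example counts and the largest/smallest ratio."""
    counts = Counter(intent for _, intent in examples)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

# Hypothetical training examples as (text, intent) pairs.
examples = [
    ("where is my order", "shipping"),
    ("track my package", "shipping"),
    ("when will it arrive", "shipping"),
    ("has my parcel shipped", "shipping"),
    ("i want my money back", "refund"),
]

counts, ratio = intent_balance(examples)
for intent, n in counts.most_common():
    print(f"{intent}: {n}")
print(f"largest/smallest ratio: {ratio:.1f}")
```

Intents at the bottom of this list are candidates for more examples or for merging with a similar intent.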
