As the number of intents increases, the confidence level decreases

I am working on a chatbot project that mainly uses Rasa NLU.

The chatbot is for FAQ support. Because some of the FAQs are worded quite similarly, the NLU returns a lower confidence level once the number of intents (types of questions) grows.

For example, when I initially tested with 2 intents, the confidence level was around 70 to 90%. Now that the project has 80 intents, the confidence level is around 8 to 10%.

In this kind of scenario, with around 80 intents, some of which are quite similar in nature, what should the confidence threshold for the NLU be?
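One way to reason about a threshold with many intents: for a classifier whose confidences sum to 1 across all intents, the chance level is 1/N, so with 80 intents it is 1/80 = 0.0125, and a top score of 0.08 to 0.10 is still several times above chance. The sketch below compares the top score to that chance level and to the runner-up instead of using one absolute cutoff. It assumes the parse result carries an `intent_ranking` list of `name`/`confidence` entries; the function name `should_fallback`, the intent names, and the values of `min_ratio` and `min_margin` are hypothetical and would need tuning on held-out data.

```python
def should_fallback(intent_ranking, num_intents, min_ratio=4.0, min_margin=0.03):
    """Return True when the top intent is not clearly ahead of the rest.

    intent_ranking: list of {"name": str, "confidence": float}.
    min_ratio and min_margin are illustrative values, not recommendations.
    """
    if not intent_ranking:
        return True
    ranked = sorted(intent_ranking, key=lambda r: r["confidence"], reverse=True)
    top = ranked[0]["confidence"]
    runner_up = ranked[1]["confidence"] if len(ranked) > 1 else 0.0
    chance = 1.0 / num_intents  # uniform baseline, e.g. 0.0125 for 80 intents
    too_close_to_chance = top < min_ratio * chance
    too_close_to_runner_up = (top - runner_up) < min_margin
    return too_close_to_chance or too_close_to_runner_up


# Hypothetical ranking for one user message (intent names are made up).
ranking = [
    {"name": "faq_reset_password", "confidence": 0.09},
    {"name": "faq_change_password", "confidence": 0.04},
    {"name": "faq_delete_account", "confidence": 0.02},
]
print(should_fallback(ranking, num_intents=80))
```

A relative check like this keeps working as intents are added, whereas a fixed cutoff chosen for 2 intents would reject almost everything at 80.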

What are you using in your pipeline, spaCy or TensorFlow?

I am using the spaCy pipeline.

Use TensorFlow for higher intent confidence; spaCy's confidence varies.

OK. I recently trained using TensorFlow and will let you know the results.

@NikhilBansal21 @harshitazilen do please read the ‘note about confidence scores’ here http://rasa.com/docs/nlu/fallback/

Yes, I read it. I used the same training data with the spaCy and tensorflow_embedding pipelines. Below is the report:

spaCy pipeline:

  • Total intents: 80
  • Average confidence when classifying a training statement: 0.08 to 0.1

With spaCy I am struggling to decide on a threshold because the confidence is so low.

tensorflow_embedding pipeline:

  • Total intents: 80
  • Average confidence when classifying a training statement: 0.8 to 0.9

With tensorflow_embedding, if this is overfitting, I will need to reduce the training data and the number of epochs.

Kindly guide me.

Do you have a test set? If not, use 3-fold cross-validation.
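To make the 3-fold idea concrete, here is a minimal pure-Python sketch of a stratified split, so that every fold contains examples of every intent. It only illustrates the mechanics and is not how Rasa NLU's own evaluation tooling works; the helper names `stratified_folds` and `train_test_splits` and the sample utterances are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(examples, k=3, seed=0):
    """Split (text, intent) pairs into k folds, spreading each intent across folds."""
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append((text, intent))
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for intent in sorted(by_intent):
        items = by_intent[intent]
        rng.shuffle(items)
        # Deal the shuffled examples of this intent round-robin into the folds.
        for i, item in enumerate(items):
            folds[i % k].append(item)
    return folds


def train_test_splits(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


# Made-up training data: two intents with three examples each.
examples = [
    ("how do I reset my password", "reset_password"),
    ("I forgot my password", "reset_password"),
    ("password reset please", "reset_password"),
    ("how do I close my account", "delete_account"),
    ("delete my account", "delete_account"),
    ("remove my profile", "delete_account"),
]
folds = stratified_folds(examples, k=3)
for train, test in train_test_splits(folds):
    print(len(train), "train /", len(test), "test")
```

Each intent needs at least as many examples as there are folds, otherwise some folds will have no test example for it.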

To check for overfitting and underfitting, look at the F1 score on the test set versus the training set.

If the test score is low and the train score is high, the model is overfitting; if both scores are low, it is underfitting.
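The train-versus-test comparison can be sketched as follows. This is a minimal pure-Python illustration, assuming macro-averaged F1 over intents and taking underfitting to mean that both scores are low; the function names and the `low` and `gap` cutoffs in `diagnose` are hypothetical, not established values.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-intent F1 scores."""
    labels = set(true_labels) | set(predicted_labels)
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def diagnose(train_f1, test_f1, low=0.6, gap=0.15):
    """Rough reading of the two scores; low and gap are illustrative cutoffs."""
    if train_f1 < low and test_f1 < low:
        return "underfitting"
    if train_f1 - test_f1 > gap:
        return "overfitting"
    return "ok"


# Toy predictions for two made-up intents.
train_score = macro_f1(["a", "a", "b", "b"], ["a", "a", "b", "b"])
test_score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(round(train_score, 3), round(test_score, 3), diagnose(train_score, test_score))
```

Macro averaging gives every intent equal weight, which matters when some intents have far fewer examples than others.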

I think the F1 score is a good baseline indicator of how good your model is.

Also, is your dataset balanced? We have over 140 intents, and most of them are not well balanced, so we are now trying to merge some classes and work out programmatically how to deal with the imbalance. Since you are doing FAQ, one good reminder from what I have seen in tests with real users: people don't interact with FAQ pages the same way they do with a chatbot. So if you have reused the FAQ categories as the classes for your chatbot, they won't really make sense in the end.
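A quick way to check balance is to count training examples per intent. The sketch below is a minimal illustration on made-up data; the helper names `intent_balance` and `underrepresented` and the `min_fraction` cutoff are hypothetical.

```python
from collections import Counter


def intent_balance(examples):
    """Return per-intent example counts and the largest-to-smallest ratio."""
    counts = Counter(intent for _, intent in examples)
    if not counts:
        raise ValueError("no training examples given")
    return counts, max(counts.values()) / min(counts.values())


def underrepresented(counts, min_fraction=0.5):
    """Intents with fewer examples than min_fraction times the median count."""
    median = sorted(counts.values())[len(counts) // 2]
    return sorted(name for name, n in counts.items() if n < min_fraction * median)


# Made-up data: three intents with 10, 8 and 2 examples.
examples = (
    [("example text", "opening_hours")] * 10
    + [("example text", "reset_password")] * 8
    + [("example text", "delete_account")] * 2
)
counts, ratio = intent_balance(examples)
print(dict(counts), ratio, underrepresented(counts))
```

Intents flagged this way are candidates for collecting more examples or for merging with a similar intent.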
