Difference in intent prediction confidence values across rasa1.x and rasa2.x

shivam.17 · June 7, 2021, 10:45am

I upgraded the Rasa NLU library from 1.10.14 to 2.6.3 and retrained the model on the same training data with the same configuration; however, the predictions I get are different (and so are the confidence levels).

Since we have a lot of downstream applications built on these intent predictions, we need the performance of the intent models to stay similar to the Rasa 1.x version.

I looked for documentation on this but could not find any, so I have the following questions:

How has the intent model changed between Rasa 1 and Rasa 2?
How can we get similar confidence levels from the Rasa 1 and Rasa 2 intent models?
Why are these differences in confidence occurring?

rctatman · June 7, 2021, 8:58pm

Hi Shivam! Confidence isn’t really a measure of model performance: a model can have high confidence but low accuracy (especially if there’s any overfitting). I would recommend tracking other performance metrics instead. We offer a CLI command that automates a lot of the comparisons:

rasa test nlu --cross-validation
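To make the point about confidence versus accuracy concrete, here is a minimal sketch in plain Python. It does not use Rasa at all: the `Prediction` record and both toy "models" are hypothetical, made-up data, chosen so that the more confident model is also the less accurate one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    # Hypothetical record: gold intent, predicted intent, and the model's confidence
    true_intent: str
    predicted_intent: str
    confidence: float


def accuracy(preds: List[Prediction]) -> float:
    """Fraction of examples whose predicted intent matches the gold label."""
    return sum(p.true_intent == p.predicted_intent for p in preds) / len(preds)


def mean_confidence(preds: List[Prediction]) -> float:
    """Average confidence the model assigned to its own top prediction."""
    return sum(p.confidence for p in preds) / len(preds)


# Made-up data: model_a is very confident but wrong half the time,
# model_b is less confident but right more often.
model_a = [
    Prediction("greet", "greet", 0.99),
    Prediction("goodbye", "greet", 0.97),
    Prediction("affirm", "deny", 0.95),
    Prediction("deny", "deny", 0.98),
]
model_b = [
    Prediction("greet", "greet", 0.70),
    Prediction("goodbye", "goodbye", 0.65),
    Prediction("affirm", "affirm", 0.60),
    Prediction("deny", "greet", 0.55),
]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(f"{name}: accuracy={accuracy(preds):.2f}, "
          f"mean confidence={mean_confidence(preds):.3f}")
```

Ranking these two models by mean confidence and by accuracy gives opposite answers, which is why accuracy-style metrics (such as the ones `rasa test nlu` reports) are the safer basis for comparing a 1.x and a 2.x model.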

Without more information it’s hard to say exactly why there are differences: it could be due to adding rules, additional or different training data, or changes in your pipeline. Are there any deprecation warnings showing up?

Also, in 2.0 we introduced a suggested pipeline: if your config.yml is all commented out then the default is being used.

shivam.17 · June 8, 2021, 6:08am

The config has not changed between the runs. I use the ConveRTFeaturizer and ConveRTTokenizer along with some other featurizers, but the config is the same for the Rasa 1.x run and the Rasa 2.x run.

I am not using any policies, only the pipeline, and I am running intent classification only (not even training on entity examples). I use the confidence as a threshold for assigning the intent: if it is lower than a certain threshold, the intent is not assigned to that example. That is why the difference in confidence levels is critical here.
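The thresholding described above, plus one possible workaround for the shifted confidences, can be sketched as follows. This is an assumption-laden illustration, not Rasa code: `assign_intent` and `matched_threshold` are hypothetical helpers, the `(intent, confidence)` ranking shape is assumed, and the recalibration technique is simple quantile matching (pick a new threshold so the 2.x model accepts the same number of messages that the 1.x model accepted on the same set).

```python
from typing import List, Optional, Tuple


def assign_intent(ranking: List[Tuple[str, float]], threshold: float) -> Optional[str]:
    """Return the top intent if its confidence clears the threshold, else None.

    `ranking` is a hypothetical list of (intent, confidence) pairs for one message.
    """
    intent, confidence = max(ranking, key=lambda pair: pair[1])
    return intent if confidence >= threshold else None


def matched_threshold(old_conf: List[float], old_threshold: float,
                      new_conf: List[float]) -> float:
    """Pick a threshold for the new model that accepts the same number of messages.

    Quantile matching: `old_conf` and `new_conf` are the top-intent confidences of
    the old and new model on the SAME messages. Count how many messages the old
    model accepted, then return the new-model confidence at that rank. Ties at the
    cut-off can make the new model accept slightly more messages.
    """
    if len(old_conf) != len(new_conf):
        raise ValueError("both lists must cover the same messages")
    n_accept = sum(c >= old_threshold for c in old_conf)
    if n_accept == 0:
        return float("inf")  # old model accepted nothing, so accept nothing here too
    ranked = sorted(new_conf, reverse=True)
    return ranked[n_accept - 1]


# Made-up top-intent confidences for the same five messages.
old = [0.91, 0.85, 0.62, 0.40, 0.30]  # e.g. from the 1.x model
new = [0.99, 0.97, 0.95, 0.55, 0.20]  # e.g. from the 2.x model
t_new = matched_threshold(old, 0.6, new)
print("recalibrated threshold:", t_new)
print(assign_intent([("greet", 0.96), ("goodbye", 0.04)], t_new))
```

Matching the acceptance rate keeps the downstream volume of assigned intents stable, but it does not guarantee that the same messages are accepted, so it is worth checking which individual messages cross the threshold in each model.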

The training data is the same for both runs; I just converted the nlu.md file to the format required by Rasa 2.x using the rasa data convert nlu -f yaml ... command.

Notably, we use ConveRTTokenizer and ConveRTFeaturizer one after the other (for legacy reasons, although ConveRTTokenizer now seems to be deprecated). Do you think this might have an effect?
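One way to check whether a single pipeline change (such as the tokenizer) explains the shift is to run two models on the same messages and summarise how their top predictions differ. The sketch below assumes you have already collected, per message, the top intent name and confidence from each model (for instance from the `intent.name` and `intent.confidence` fields of the parse output); the `compare_runs` helper, the dictionary shape, and the sample messages are all hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical shape: message text -> (top intent, confidence), one dict per model.
Predictions = Dict[str, Tuple[str, float]]


def compare_runs(old: Predictions, new: Predictions) -> Dict[str, float]:
    """Summarise how two models' top predictions differ on their shared messages."""
    shared = sorted(old.keys() & new.keys())
    if not shared:
        raise ValueError("no messages in common")
    agree = [old[m][0] == new[m][0] for m in shared]   # same top intent?
    shift = [new[m][1] - old[m][1] for m in shared]    # signed confidence change
    return {
        "n_messages": len(shared),
        "top_intent_agreement": sum(agree) / len(shared),
        "mean_confidence_shift": mean(shift),
        "max_abs_confidence_shift": max(abs(s) for s in shift),
    }


# Made-up predictions for three messages from two model versions.
old_run = {"hi there": ("greet", 0.90), "bye": ("goodbye", 0.80), "sure": ("affirm", 0.60)}
new_run = {"hi there": ("greet", 0.99), "bye": ("goodbye", 0.95), "sure": ("deny", 0.50)}
print(compare_runs(old_run, new_run))
```

Comparing a 1.x model against a 2.x model, and then two 2.x pipelines that differ only in the tokenizer, would show whether the tokenizer accounts for most of the agreement drop and confidence shift.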

shivam.17 · June 9, 2021, 6:33am

@rctatman any comments?
