I upgraded the rasa NLU library from 1.10.14 to 2.6.3 and retrained a model on the same training data with the same configuration; however, the predictions I get (along with their confidence levels) are different.
Since we have a lot of downstream applications built on these intent predictions, we need the performance of the intent models to be similar to the rasa 1.x version.
I looked around for documentation on this but could not find any.
So the following questions arise:
How has the intent model been changed across rasa 1 and rasa 2?
How can we get similar confidence levels from the rasa 1 and rasa 2 intent models?
Hi Shivam! Confidence isn’t really a measure of model performance; a model can have higher confidence but low accuracy (especially if there’s any overfitting). I would recommend tracking other performance metrics instead. We offer a CLI function that automates a lot of the comparisons:
rasa test nlu --cross-validation
Without more information it’s hard to say exactly why there are differences: it could be due to adding rules, additional/different training data, or changes in your pipeline. Are any deprecation warnings showing up?
Also, in 2.0 we introduced a suggested pipeline: if your config.yml is all commented out, then the default is used.
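For illustration, this is roughly what a fully commented-out config looks like; the component names shown are just examples, and with everything under `pipeline:` commented out like this, Rasa 2 falls back to its suggested default pipeline:

```yaml
# config.yml -- with no active entries under `pipeline:`,
# Rasa 2 trains using its suggested default pipeline instead.
language: en
pipeline:
#  - name: WhitespaceTokenizer
#  - name: CountVectorsFeaturizer
#  - name: DIETClassifier
#    epochs: 100
```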
The config has not been changed across the runs. I use the ConveRTFeaturizer and ConveRTTokenizer along with some other featurizers, but the config is the same for the rasa 1.x run and the rasa 2.x run.
I am using no policies, just a pipeline, and running only intent classification (not even training on entity classification examples). I use the confidence as a threshold for assigning an intent: if it’s lower than a certain threshold, no intent is assigned to that example (which is why the confidence level difference is critical here).
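The thresholding described above can be sketched roughly like this. The dict shape mirrors what Rasa’s NLU parse output looks like, but `assign_intent` is a hypothetical helper of our own, not part of the Rasa API:

```python
def assign_intent(parse_result: dict, threshold: float = 0.7):
    """Return the predicted intent name, or None if the model's
    confidence falls below the given threshold."""
    intent = parse_result.get("intent", {})
    if intent.get("confidence", 0.0) >= threshold:
        return intent.get("name")
    return None

# Example parse result in the general shape Rasa returns.
result = {"intent": {"name": "greet", "confidence": 0.63}}
print(assign_intent(result, threshold=0.7))  # -> None (below threshold)
print(assign_intent(result, threshold=0.5))  # -> greet
```

A shift in confidence calibration between Rasa versions would change which examples clear the threshold, even if the top-ranked intent itself is unchanged.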
The training data is the same across both runs; I just converted the nlu.md file to the format required by rasa 2.x using the rasa data convert nlu -f yaml ... command.
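For reference, the conversion changes only the format, not the content. A Markdown block like this (example intent and utterances are made up):

```md
## intent:greet
- hey
- hello there
```

becomes the equivalent YAML in Rasa 2.x:

```yaml
version: "2.0"
nlu:
- intent: greet
  examples: |
    - hey
    - hello there
```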
Interestingly, we have used both ConveRTTokenizer and ConveRTFeaturizer one after the other (for legacy reasons, but ConveRTTokenizer now seems to be deprecated).
Do you think this might have an effect?