I need to evaluate and compare the performance of my policies. I'm trying to follow this guide, but I find it quite confusing.
"Especially in the beginning of a project, you do not have a lot of real conversations to use to train your bot, so you don't just want to throw some away to use as a test set."
Here it says that I shouldn't "throw away" my data to use it as a test set.
"Once you are happy with it, you can then train your final configuration on your full data set."
But one line below, it says that I can then train the model on the "full data set". So did I have to split my data after all? And if so, how should I split it?
"For each policy configuration provided, Rasa Core will be trained multiple times with 0, 5, 25, 50, 70 and 95% of your training stories excluded from the training data."
What’s the point of doing that? Why would I want to train my model on just a portion of the training set?
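My current guess (which may well be wrong, hence the question) is that this builds a learning curve: by excluding a growing fraction of stories and evaluating each resulting model on the same held-out stories, you can see whether a policy still improves with more data. Here is the splitting logic as I understand it, sketched in plain Python — this is my own illustration, not Rasa code, and the `stories` list and percentages are just placeholders:

```python
import random

def exclusion_splits(stories, percentages=(0, 5, 25, 50, 70, 95), seed=42):
    """For each percentage, return the training subset that remains
    after excluding that fraction of stories (my reading of the guide)."""
    rng = random.Random(seed)
    shuffled = stories[:]
    rng.shuffle(shuffled)
    splits = {}
    for pct in percentages:
        n_excluded = int(len(shuffled) * pct / 100)
        # Train on everything except the excluded fraction
        splits[pct] = shuffled[: len(shuffled) - n_excluded]
    return splits

# Toy example with 20 stories: subset sizes shrink as the
# excluded percentage grows (20, 19, 15, 10, 6, 1).
stories = [f"story_{i}" for i in range(20)]
for pct, subset in exclusion_splits(stories).items():
    print(pct, len(subset))
```

Is that roughly what happens under the hood, and is the point just to compare how quickly each policy's accuracy degrades with less data?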
Also, I didn't really understand how the stories are evaluated. If I have one story that is 30 utterances long and another that is 2 utterances long, and my model makes one incorrect prediction in each, are the two stories scored differently?
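To make the question concrete, here's a toy calculation (my own sketch, not necessarily how Rasa scores things): one wrong action out of 30 gives a very different per-action accuracy than one wrong action out of 2, but if a story only counts as "correct" when every action in it is predicted right, both stories fail equally.

```python
def action_accuracy(n_actions, n_wrong):
    """Fraction of correctly predicted actions within one story."""
    return (n_actions - n_wrong) / n_actions

long_story = action_accuracy(30, 1)   # about 0.967
short_story = action_accuracy(2, 1)   # 0.5

# Story-level scoring: a story passes only if no action was wrong,
# so both stories fail equally here.
story_passed = [n_wrong == 0 for n_wrong in (1, 1)]
print(long_story, short_story, story_passed)
```

Which of these two views does the evaluation report use, or does it report both?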
Thank you, Tiziano