Comparing Policies - guide not clear

tiziano · February 17, 2020, 1:05pm

Hi,

I need to evaluate and compare the performance of my policies. I’m trying to follow this guide, but I find it quite confusing.

Especially in the beginning of a project, you do not have a lot of real conversations to use to train your bot, so you don’t just want to throw some away to use as a test set.

Here it says that I shouldn’t “throw away” any of my data by using it as a test set.

Once you are happy with it, you can then train your final configuration on your full data set.

But one line below, it says that I can then train the model on the “full data set”. So do I have to split my data after all? And if so, how should I split it?

For each policy configuration provided, Rasa Core will be trained multiple times with 0, 5, 25, 50, 70 and 95% of your training stories excluded from the training data.

What’s the point of doing that? Why would I want to train my model on just a portion of the training set?

Also, I didn’t really understand how the stories are evaluated. If I have one story that is 30 utterances long and another that is 2 utterances long, and my model makes one incorrect prediction in each, are the two stories evaluated differently?

Thank you, Tiziano
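
For the 30-utterance vs. 2-utterance question above, here is a minimal sketch of the arithmetic. It assumes two ways of scoring: per action (every prediction counts equally) and per story (a story counts as correct only if every prediction in it is correct). Whether Rasa's reports use exactly these definitions is an assumption here, and `evaluate` is a hypothetical helper, not a Rasa function.

```python
def evaluate(stories):
    """Compute action-level and story-level accuracy.

    `stories` is a list of (num_predictions, num_errors) pairs.
    Hypothetical helper for illustration only, not part of Rasa.
    """
    total_predictions = sum(n for n, _ in stories)
    total_errors = sum(e for _, e in stories)
    # Action level: each prediction counts equally, regardless of story.
    action_accuracy = (total_predictions - total_errors) / total_predictions
    # Story level (assumed rule): a single wrong prediction fails the story.
    story_accuracy = sum(1 for _, e in stories if e == 0) / len(stories)
    return action_accuracy, story_accuracy


# One 30-step story and one 2-step story, each with one wrong prediction.
stories = [(30, 1), (2, 1)]
action_acc, story_acc = evaluate(stories)
print(f"action-level accuracy: {action_acc:.4f}")
print(f"story-level accuracy:  {story_acc:.4f}")
```

Under these assumptions, both stories fail equally at the story level, while at the action level the long story contributes far more correct predictions than the short one.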

dakshvar22 · March 2, 2020, 3:47pm

@tiziano Splitting your complete data into a training set and a test set is a well-known practice in machine learning. If you train a machine learning model on your entire dataset, it may overfit, that is, memorize the data points as they are, and then fail to handle novel data points that it encounters in the real world. For Rasa Core, these new data points would be the real-world conversations it encounters once deployed.

tiziano · March 2, 2020, 3:49pm

I’m well aware of that. I was just pointing out the lack of consistency in the guide.

dakshvar22 · March 2, 2020, 3:51pm

Thanks for bringing it up. Would you be up for contributing to the documentation to make it more consistent? Please feel free to open a PR with your proposed changes. Thanks.

tiziano · March 2, 2020, 3:52pm

If you are referring to my last question, that is a different matter. I wasn’t talking about splitting the data into a training set and a test set, but about taking the training set (already split) and using just a portion of it to train the model (with exclusion percentages of 0, 5, 25…). Why would you want to do that?
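
One possible reading of the exclusion percentages is as a learning curve: train on progressively smaller subsets and see how each policy copes with less data. The sketch below only shows the subsampling step, under that assumption. `subsample_stories` and the placeholder story names are hypothetical, and this is not Rasa's actual implementation.

```python
import random

# Exclusion percentages as quoted from the guide.
EXCLUSION_PERCENTAGES = [0, 5, 25, 50, 70, 95]


def subsample_stories(stories, exclude_pct, seed=0):
    """Return a random subset with `exclude_pct` percent of stories dropped.

    Hypothetical helper for illustration only, not part of Rasa.
    """
    keep = len(stories) * (100 - exclude_pct) // 100
    rng = random.Random(seed)
    return rng.sample(stories, keep)


# Hypothetical data: 100 placeholder story names.
training_stories = [f"story_{i}" for i in range(100)]

kept_counts = {}
for pct in EXCLUSION_PERCENTAGES:
    subset = subsample_stories(training_stories, pct)
    kept_counts[pct] = len(subset)
    # A real run would train a policy on `subset` here and score it on a
    # fixed evaluation set, giving one point of the learning curve.
    print(f"exclude {pct:>2}% -> train on {len(subset)} stories")
```

Plotting the score against the number of stories kept would then show which policy configuration degrades more gracefully when data is scarce.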

tiziano · March 2, 2020, 3:56pm

I don’t know exactly how Rasa works internally, so I don’t think I have enough knowledge to propose changes to the docs. Also, I’m not sure how to open a PR.
