I wanted to see how everyone is handling their train/test stories. I am currently working with an e2e test set of 500 stories and 207 stories in my train set. Should the train stories and e2e stories follow the 80%/20% rule? Is it best practice to follow this for all train/test sets in Rasa?
The `rasa test nlu` command allows you to properly run a gridsearch; `rasa test core` does not. In that sense I’d say it makes sense to make a separate set of stories to test on, but I’d make sure that both the test set and the train set cover the same ground. If there’s an imbalance between the two (say, the test set contains all the easy stories and the train set contains all the hard ones), then it will be hard to assign a lot of value to a summary statistic.
The 80/20 rule can be fine, assuming both sets are large enough to be representative of the use case you’re trying to solve and that both sets are balanced. Do you have specific concerns in your use case?
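To make the 80/20 idea concrete, here is a minimal sketch of a shuffled train/test split over a list of stories. This is a generic illustration, not Rasa’s own splitting logic (Rasa has a `rasa data split` command for NLU data); the function name and story representation are hypothetical.

```python
import random

def split_stories(stories, train_fraction=0.8, seed=42):
    """Shuffle a list of stories and split it into train/test sets.

    With train_fraction=0.8 this yields an 80/20 split. A fixed seed keeps
    the split reproducible between runs.
    """
    rng = random.Random(seed)
    shuffled = stories[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-ins for parsed story names.
stories = [f"story_{i}" for i in range(100)]
train, test = split_stories(stories)
print(len(train), len(test))  # 80 20
```

Note that a purely random split does not guarantee the balance discussed above: you should still check that easy and hard stories end up on both sides of the cut.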
In general, when it comes to judging models, this Venn diagram is the best advice that I can give on the topic:

(Venn diagram: the stories you optimise towards vs. the stories your users generate)
The stories you optimise towards may be different from the stories that your users generate. Your main concern, therefore, is to make sure that the stories that occur in real life are also the stories that you optimise towards. Your chatbot may be really good at FAQ because FAQ flows are properly represented in your stories, but if your users ask for chitchat instead, then you’re at risk of overfitting.
Koaning, thank you for your response. My concern is that I am using the 80/20 rule for the NLU but was not doing the same for the core.
Pardon my ignorance, I am fairly new to using Rasa, but it sounds like you are saying that we should create stories for any and all interactions our bot may come across? Without trained stories, the bot may not function properly?
> Pardon my ignorance
Absolutely no worries. I’m here to understand what our users find confusing and you’re asking important questions.
> it sounds like you are saying is that we should create stories for any and all interactions our bot may come across
Yes. The stories that you create represent flows of dialogue that the digital assistant needs to be able to handle. Also note that the entities that are predicted by the NLU part of the library are used as input for the “dialogue policy” models that predict the next best action to take. If you’re interested in more details about this, you may appreciate this video on TED (it’s but one policy model, but it might help you understand some underlying details).
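As a loose illustration of that flow of information, here is a toy sketch in which NLU output (an intent plus a set of entities) is mapped to a next action. This is emphatically not Rasa’s actual API: real policies such as TED condition on the whole conversation history, and the rule table, function name, and action names below are all made up for the example.

```python
def toy_policy(intent, entities, rules):
    """Pick the next action from NLU output.

    Looks up (intent, entities) first, then falls back to an
    entity-agnostic rule for the intent, then to a default fallback.
    A drastically simplified stand-in for a dialogue policy.
    """
    return rules.get(
        (intent, frozenset(entities)),
        rules.get((intent, None), "action_default_fallback"),
    )

# Hypothetical rule table mapping NLU output to the next action.
rules = {
    ("greet", None): "utter_greet",
    ("book_flight", frozenset({"destination"})): "action_search_flights",
    ("book_flight", None): "utter_ask_destination",
}

print(toy_policy("book_flight", {"destination"}, rules))  # action_search_flights
print(toy_policy("book_flight", set(), rules))            # utter_ask_destination
```

The point of the sketch is only this: the entities the NLU extracts change which action the policy picks, which is why story quality on the NLU side and the core side are intertwined.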
> Without trained stories the bot may not function properly?
This is also true.
> Yes. The stories that you create represent flows of dialogue that the digital assistant needs to be able to handle
Normally I don’t mix train data and test data, but in the case of story paths, is it wise to mix?
I think it is impossible to have stories in train/test that are 100% different. There’s bound to be overlap. Most chatbots need to be able to greet and say goodbye. It’d be weird to have no stories with `goodbye` in the train set. It’d also be weird to have no stories with `hello` in the test set.
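One cheap sanity check for the “cover the same ground” advice is to compare which intents appear in the train stories versus the test stories. The sketch below is a simplified illustration, assuming each story has already been reduced to a list of user intent names (a hypothetical representation, not a parsed Rasa story file):

```python
def intent_coverage(train_stories, test_stories):
    """Report which intents are shared between, or exclusive to,
    the train and test story sets. Each story is a list of intent names."""
    train_intents = {intent for story in train_stories for intent in story}
    test_intents = {intent for story in test_stories for intent in story}
    return {
        "shared": train_intents & test_intents,
        "train_only": train_intents - test_intents,
        "test_only": test_intents - train_intents,
    }

train = [["greet", "ask_faq", "goodbye"], ["greet", "chitchat"]]
test = [["greet", "ask_faq", "goodbye"]]
report = intent_coverage(train, test)
print(report["train_only"])  # {'chitchat'}: trained on, but never tested
```

A large `train_only` or `test_only` set is a hint that your summary statistics won’t mean much, per the imbalance caveat above.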
That said, I’d keep the test set as a unit test of sorts. These are examples that should be representative of your use-case and you’ll use this set as a proxy to determine if you like the results that you see.