Should one use all stories as training data?

NOlbert · August 13, 2020, 11:26am

I know it is not good practice to use all available data for training an NLU model, because this might cause overfitting.

However, what about stories? The documentation says nothing about a test/train split for stories. Is this a case where all available data should be used because an exact fit to the training data is considered beneficial?

The story tests seem to be only for validation, not for testing generalization.

b-quachtran · August 14, 2020, 8:48pm

Hi @NOlbert. In most cases, including more training stories will improve model fit, and test stories should be sourced from real conversation data when possible. You can refer to this blog post for some guidelines on building out your story training data.
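
If you do want a held-out set while you have no real conversations to test against yet, a rough sketch of a reproducible split could look like the one below. This is plain Python and not part of Rasa: `split_stories`, `test_fraction`, and `seed` are hypothetical names, and the stories are assumed to be already loaded as a list of items (for example, one string per story block).

```python
import random


def split_stories(stories, test_fraction=0.2, seed=42):
    """Shuffle stories reproducibly and hold out a fraction for testing.

    Hypothetical helper for illustration only; not a Rasa API.
    Returns a (train, test) tuple of lists.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(stories)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split stable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder story identifiers standing in for real story blocks.
stories = [f"story_{i}" for i in range(10)]
train, test = split_stories(stories)
```

Keeping the seed fixed means the same stories stay in the held-out set between training runs, so results remain comparable. Test stories taken from real conversations are still preferable when you have them.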
