Should one use all stories as training data?

I know it is not good practice to use all available data for training an NLU model, because this might cause overfitting.

However, what about stories? The documentation says nothing about a test/train split for stories. Is this a case where all available data should be used because an exact fit to the training data is considered beneficial?

The story tests seem to be only for validation, not for testing generalization.

Hi @NOlbert. In most cases, including more training stories will improve model fit, and test stories should be sourced from real conversation data when possible. You can refer to this blog post for some guidelines on building out your story training data.
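
As a rough illustration of that advice, here is a minimal sketch that keeps stories taken from real conversations as the test set and trains on the rest. It assumes each story is a plain dictionary with a hypothetical `source` tag; neither that tag nor the `split_stories` helper is part of Rasa or its story format, so treat this as one way to organize the data rather than an official workflow.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a story is a dict with a name and a "source" tag.
# The "source" field is an assumption for this sketch, not part of Rasa's format.
Story = Dict[str, str]


def split_stories(stories: List[Story]) -> Tuple[List[Story], List[Story]]:
    """Return (train, test): real-conversation stories go to test, the rest to train."""
    train = [s for s in stories if s["source"] != "real_conversation"]
    test = [s for s in stories if s["source"] == "real_conversation"]
    return train, test


# Made-up example data, for illustration only.
stories = [
    {"name": "greet_and_order", "source": "hand_written"},
    {"name": "cancel_order", "source": "hand_written"},
    {"name": "user_changes_mind", "source": "real_conversation"},
]

train_stories, test_stories = split_stories(stories)
```

With a split like this, the test stories measure how well the model handles conversations it was not written around, while every hand-written story still contributes to training.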
