Evaluate based on Test/Train Split

Has anyone considered an NLU evaluation option that holds out a fraction of the training data as a test set, so you don’t have to create a separate test dataset? You could then produce a confusion matrix and errors.json, just like the normal evaluate.


If you run the evaluation with cross-validation, it does a stratified train-test split and can run the training n times to give you an average over all runs.

Yes, but it doesn’t provide the errors.json or confusion matrix, and both of those seem very useful for producing a better chatbot.
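
To make concrete what those two outputs give you, here is a rough sketch using only the standard library. The `results` list, its field names, and the layout of the error records are made up for illustration; they are not the tool's actual errors.json schema.

```python
import json
from collections import Counter

# Hypothetical held-out examples with gold intents and model predictions.
results = [
    {"text": "hello", "intent": "greet", "predicted": "greet"},
    {"text": "see you", "intent": "goodbye", "predicted": "goodbye"},
    {"text": "yo", "intent": "greet", "predicted": "affirm"},
    {"text": "sure", "intent": "affirm", "predicted": "affirm"},
    {"text": "later", "intent": "goodbye", "predicted": "greet"},
]

# Confusion counts keyed by (gold intent, predicted intent).
confusion = Counter((r["intent"], r["predicted"]) for r in results)

# Misclassified examples only (illustrative layout, not the real errors.json format).
errors = [r for r in results if r["intent"] != r["predicted"]]

# Print the confusion matrix row by row: rows are gold intents, columns are predictions.
labels = sorted({r["intent"] for r in results} | {r["predicted"] for r in results})
for gold in labels:
    print(gold, [confusion[(gold, pred)] for pred in labels])

# Dump the misclassified examples as JSON.
print(json.dumps(errors, indent=2))
```

The matrix shows which intents get mixed up with each other, and the error list points at the exact utterances to fix or relabel, which an averaged cross-validation score alone does not show.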

You could use the same code to generate a stratified test set from your training data. sklearn's train_test_split() might help: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
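
A minimal sketch of that idea, assuming your training examples can be loaded as plain (text, intent) pairs. The toy data, the 25% test fraction, and the fixed seed are placeholders, not anything taken from the actual evaluation code.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical toy NLU examples: (text, intent) pairs.
examples = [
    ("hello", "greet"), ("hi there", "greet"),
    ("hey", "greet"), ("good morning", "greet"),
    ("bye", "goodbye"), ("see you", "goodbye"),
    ("goodbye", "goodbye"), ("talk later", "goodbye"),
    ("yes", "affirm"), ("sure", "affirm"),
    ("of course", "affirm"), ("sounds good", "affirm"),
]
texts = [text for text, _ in examples]
intents = [intent for _, intent in examples]

# stratify=intents keeps each intent's share the same in both splits;
# random_state makes the split reproducible.
train_texts, test_texts, train_intents, test_intents = train_test_split(
    texts, intents, test_size=0.25, stratify=intents, random_state=42
)

print("train:", Counter(train_intents))
print("test: ", Counter(test_intents))
```

Note that stratifying needs at least two examples per intent, so very small intents would have to be merged or left out of the held-out set.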

Yes, I saw that code and called it in the PR I submitted.