What is the standard practice for Rasa testing?

Hi All,

By convention, how many stories should be tested with the ‘rasa test’ command: all of the data or just a small set? I have about 40 intents. Should I write end-to-end test stories that are expected to fail? Can I test the fallback policy?

I can make all my test stories pass without changing NLU or Core: careful wording of the user messages in the end-to-end format is enough. Surely there are more realistic ways to test? My chatbot is far from perfect but passes the tests with high accuracy, so perhaps I am not following good testing practice.

Edit: I have also noticed that if we use an end-to-end format like ‘show me [chinese](cuisine) restaurants’, we are already helping the chatbot extract the correct entity. Sometimes giving the correct entity is enough to get a high NLU confidence, since most of my entities are not reused across different stories.

You can also run ‘rasa test nlu’ with the ‘--cross-validation’ flag. That way each data point is used as a test case exactly once. Have you ever tried this?
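To illustrate the "each data point is a test case once" idea, here is a minimal sketch of plain k-fold splitting in Python. It is not Rasa's code: `k_fold_splits` is a hypothetical helper, and Rasa's own implementation may differ (for example by stratifying folds by intent, which this sketch does not do).

```python
import random
from typing import List, Tuple


def k_fold_splits(n_examples: int, n_folds: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs for plain k-fold cross-validation."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds disjoint folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        # Every example that is not in the held-out fold is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_splits(n_examples=10, n_folds=2)
test_counts = [0] * 10
for train, test in splits:
    assert not set(train) & set(test)  # a model never sees its own test fold
    for idx in test:
        test_counts[idx] += 1
print(test_counts)  # each example is counted as a test case exactly once
```

Because the folds partition the data, every example is scored by a model that did not train on it, which is what makes the result more realistic than scoring on the training data.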

You might appreciate this benchmarking guide that I wrote for rasa nlu examples.

Thanks, I have run the benchmark tests for my NLU. With the latter test (cross-validation) I get lower accuracy than with plain ‘rasa test nlu’, which reports 100%. What does the cross-validation result suggest?
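One likely reading, assuming the plain ‘rasa test nlu’ run scored the model on the same data it was trained on (the thread does not say which test file was used): a model can score perfectly on examples it has already seen while doing worse on new phrasings. The sketch below makes that concrete with a deliberately overfitting toy classifier. `MemorizingClassifier`, `accuracy`, and the example data are all hypothetical and stand in for a real NLU pipeline.

```python
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (user message, intent)


class MemorizingClassifier:
    """Toy stand-in for an overfitting NLU model: it memorizes training texts."""

    def fit(self, data: List[Example]) -> "MemorizingClassifier":
        self.lookup: Dict[str, str] = {text: intent for text, intent in data}
        # Unseen texts fall back to the most frequent training intent.
        self.default = Counter(intent for _, intent in data).most_common(1)[0][0]
        return self

    def predict(self, text: str) -> str:
        return self.lookup.get(text, self.default)


def accuracy(model: MemorizingClassifier, data: List[Example]) -> float:
    correct = sum(model.predict(text) == intent for text, intent in data)
    return correct / len(data)


# Hypothetical toy data, not taken from the thread.
train = [
    ("hi", "greet"),
    ("hello there", "greet"),
    ("show me chinese restaurants", "search_restaurant"),
    ("find an italian place", "search_restaurant"),
    ("bye", "goodbye"),
]
held_out = [
    ("hey", "greet"),
    ("any thai food nearby", "search_restaurant"),
    ("see you later", "goodbye"),
]

model = MemorizingClassifier().fit(train)
print(accuracy(model, train))     # scored on the data it was trained on
print(accuracy(model, held_out))  # scored on phrasings it has never seen
```

A gap like this between training-data accuracy and held-out accuracy is the general pattern cross-validation is designed to expose; the size of the gap for a real Rasa model depends on the data and pipeline.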

Could you share the commands that you ran? There are settings that might cause such differences.

I ran the following command: rasa test nlu --config basic-config.yml --cross-validation --runs 1 --folds 2 --out gridresults/basic-config
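One setting worth noting in that command is ‘--folds 2’. In k-fold cross-validation each model is trained on (k - 1)/k of the data, so with 2 folds each model sees only half of the training examples, which will tend to lower the scores compared with a model trained on everything. The arithmetic is sketched below; `train_fraction` and `examples_per_intent_in_training` are hypothetical helpers, the figure of 20 examples per intent is an assumption, and the per-intent estimate assumes folds are roughly balanced across intents.

```python
def train_fraction(n_folds: int) -> float:
    """Fraction of the data each k-fold cross-validation model is trained on."""
    if n_folds < 2:
        raise ValueError("k-fold cross-validation needs at least 2 folds")
    return (n_folds - 1) / n_folds


def examples_per_intent_in_training(examples_per_intent: int, n_folds: int) -> float:
    """Approximate training examples per intent, assuming folds are balanced by intent."""
    return examples_per_intent * train_fraction(n_folds)


# Hypothetical: 20 training examples per intent.
for k in (2, 5, 10):
    print(k, train_fraction(k), examples_per_intent_in_training(20, k))
```

With more folds, each model trains on a larger share of the data (at the cost of more training runs), so the cross-validation score should sit closer to what a model trained on all the data would achieve on unseen messages.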