I ran these two commands:
rasa test --stories tests/
rasa test core --stories tests/
The yml files contain 16 test stories. With rasa test, 8 of them fail; with rasa test core, none of them fail.
I know that rasa test also runs the NLU tests, but I assumed that the evaluation of the stories specified in my test_*.yml files would be identical for both commands.
Any idea why I got different results?
This is what I see when I run rasa test:
Processed story blocks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 467.92it/s, # trackers=1]
2021-11-07 14:58:46 INFO rasa.core.test - Evaluating 16 stories
Progress:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:21<00:00, 1.35s/it]
2021-11-07 14:59:08 INFO rasa.core.test - Finished collecting predictions.
2021-11-07 14:59:08 INFO rasa.core.test - Evaluation Results on END-TO-END level:
2021-11-07 14:59:08 INFO rasa.core.test - Correct: 8 / 16
2021-11-07 14:59:08 INFO rasa.core.test - Accuracy: 0.500
2021-11-07 14:59:08 INFO rasa.core.test - Stories report saved to results/story_report.json.
2021-11-07 14:59:08 INFO rasa.nlu.test - Evaluation for entity extractor: TEDPolicy
2021-11-07 14:59:08 INFO rasa.nlu.test - Classification report saved to results/TEDPolicy_report.json.
2021-11-07 14:59:08 INFO rasa.nlu.test - Incorrect entity predictions saved to results/TEDPolicy_errors.json.
2021-11-07 14:59:08 INFO rasa.utils.plotting - Confusion matrix, without normalization:
[[ 0 0 0 0 26 0 0]
[ 0 0 0 0 27 0 0]
[ 0 0 0 0 8 0 0]
[ 0 0 0 0 10 0 0]
[ 0 0 0 0 70 0 0]
[ 0 0 0 0 6 0 0]
[ 0 0 0 0 14 0 0]]
And this is what I see when I run rasa test core:
Processed story blocks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 303.18it/s, # trackers=1]
2021-11-07 15:02:39 INFO rasa.core.test - Evaluating 16 stories
Progress:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:21<00:00, 1.35s/it]
2021-11-07 15:03:01 INFO rasa.core.test - Finished collecting predictions.
2021-11-07 15:03:01 INFO rasa.core.test - Evaluation Results on CONVERSATION level:
2021-11-07 15:03:01 INFO rasa.core.test - Correct: 16 / 16
2021-11-07 15:03:01 INFO rasa.core.test - Accuracy: 1.000
2021-11-07 15:03:01 INFO rasa.core.test - Stories report saved to results/story_report.json.
2021-11-07 15:03:01 INFO rasa.nlu.test - Evaluation for entity extractor: TEDPolicy
2021-11-07 15:03:01 INFO rasa.nlu.test - Classification report saved to results/TEDPolicy_report.json.
2021-11-07 15:03:01 INFO rasa.nlu.test - Incorrect entity predictions saved to results/TEDPolicy_errors.json.
2021-11-07 15:03:01 INFO rasa.utils.plotting - Confusion matrix, without normalization:
[[ 0 0 0 0 26 0 0]
[ 0 0 0 0 27 0 0]
[ 0 0 0 0 8 0 0]
[ 0 0 0 0 10 0 0]
[ 0 0 0 0 70 0 0]
[ 0 0 0 0 6 0 0]
[ 0 0 0 0 14 0 0]]
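Apart from the timestamps and the counts, the two logs differ in the evaluation level line: "END-TO-END level" for rasa test versus "CONVERSATION level" for rasa test core. Both runs also write to results/story_report.json, so the second run overwrites the first report. To compare runs side by side, here is a minimal sketch of a hypothetical helper (parse_story_eval is not part of Rasa). It assumes the log lines keep the exact format shown above and pulls out the level and the correct/total counts.

```python
import re
from typing import NamedTuple


class StoryEvalSummary(NamedTuple):
    level: str  # e.g. "END-TO-END" or "CONVERSATION", as printed in the log
    correct: int
    total: int
    accuracy: float


def parse_story_eval(log_text: str) -> StoryEvalSummary:
    """Hypothetical helper: extract the story-evaluation summary from a rasa test log.

    Assumes the log format shown in the question; not a Rasa API.
    """
    level = re.search(r"Evaluation Results on (\S+) level", log_text)
    counts = re.search(r"Correct: (\d+) / (\d+)", log_text)
    accuracy = re.search(r"Accuracy: ([0-9.]+)", log_text)
    if not (level and counts and accuracy):
        raise ValueError("log text does not contain a story evaluation summary")
    return StoryEvalSummary(
        level=level.group(1),
        correct=int(counts.group(1)),
        total=int(counts.group(2)),
        accuracy=float(accuracy.group(1)),
    )


# Summary lines copied from the two runs in the question.
full_run = """
2021-11-07 14:59:08 INFO rasa.core.test - Evaluation Results on END-TO-END level:
2021-11-07 14:59:08 INFO rasa.core.test - Correct: 8 / 16
2021-11-07 14:59:08 INFO rasa.core.test - Accuracy: 0.500
"""
core_run = """
2021-11-07 15:03:01 INFO rasa.core.test - Evaluation Results on CONVERSATION level:
2021-11-07 15:03:01 INFO rasa.core.test - Correct: 16 / 16
2021-11-07 15:03:01 INFO rasa.core.test - Accuracy: 1.000
"""

for name, text in [("rasa test", full_run), ("rasa test core", core_run)]:
    s = parse_story_eval(text)
    print(f"{name}: {s.level} level, {s.correct}/{s.total} correct ({s.accuracy:.3f})")
```

Run against the lines above, it prints one summary per command. This makes it easy to see that the two commands report their story results at different evaluation levels rather than just with different scores.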