Rasa test vs. rasa test core

I run these two commands:

rasa test --stories tests/

rasa test core --stories tests/

The YAML files contain 16 test stories. If I run rasa test, 8 of them fail. If I run rasa test core, none of the stories fail.

I know that rasa test runs NLU tests as well, but I would assume that testing the stories I have specified in the test_*.yml files would be identical for both commands.
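
For context, my test stories follow the usual test_*.yml format, roughly like this (the intent and action names below are just placeholders, not my actual data):

stories:
- story: greet and ask for help (placeholder)
  steps:
  - user: |
      hello there
    intent: greet
  - action: utter_greet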

Any idea why I got different results?

This is what I see when I run rasa test:

Processed story blocks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 467.92it/s, # trackers=1]
2021-11-07 14:58:46 INFO     rasa.core.test  - Evaluating 16 stories
Progress:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:21<00:00,  1.35s/it]
2021-11-07 14:59:08 INFO     rasa.core.test  - Finished collecting predictions.
2021-11-07 14:59:08 INFO     rasa.core.test  - Evaluation Results on END-TO-END level:
2021-11-07 14:59:08 INFO     rasa.core.test  -  Correct:          8 / 16
2021-11-07 14:59:08 INFO     rasa.core.test  -  Accuracy:         0.500
2021-11-07 14:59:08 INFO     rasa.core.test  - Stories report saved to results/story_report.json.
2021-11-07 14:59:08 INFO     rasa.nlu.test  - Evaluation for entity extractor: TEDPolicy 
2021-11-07 14:59:08 INFO     rasa.nlu.test  - Classification report saved to results/TEDPolicy_report.json.
2021-11-07 14:59:08 INFO     rasa.nlu.test  - Incorrect entity predictions saved to results/TEDPolicy_errors.json.
2021-11-07 14:59:08 INFO     rasa.utils.plotting  - Confusion matrix, without normalization: 
[[ 0  0  0  0 26  0  0]
 [ 0  0  0  0 27  0  0]
 [ 0  0  0  0  8  0  0]
 [ 0  0  0  0 10  0  0]
 [ 0  0  0  0 70  0  0]
 [ 0  0  0  0  6  0  0]
 [ 0  0  0  0 14  0  0]]

And this is what I see when I run rasa test core:

Processed story blocks: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 303.18it/s, # trackers=1]
2021-11-07 15:02:39 INFO     rasa.core.test  - Evaluating 16 stories
Progress:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:21<00:00,  1.35s/it]
2021-11-07 15:03:01 INFO     rasa.core.test  - Finished collecting predictions.
2021-11-07 15:03:01 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:
2021-11-07 15:03:01 INFO     rasa.core.test  -  Correct:          16 / 16
2021-11-07 15:03:01 INFO     rasa.core.test  -  Accuracy:         1.000
2021-11-07 15:03:01 INFO     rasa.core.test  - Stories report saved to results/story_report.json.
2021-11-07 15:03:01 INFO     rasa.nlu.test  - Evaluation for entity extractor: TEDPolicy 
2021-11-07 15:03:01 INFO     rasa.nlu.test  - Classification report saved to results/TEDPolicy_report.json.
2021-11-07 15:03:01 INFO     rasa.nlu.test  - Incorrect entity predictions saved to results/TEDPolicy_errors.json.
2021-11-07 15:03:01 INFO     rasa.utils.plotting  - Confusion matrix, without normalization: 
[[ 0  0  0  0 26  0  0]
 [ 0  0  0  0 27  0  0]
 [ 0  0  0  0  8  0  0]
 [ 0  0  0  0 10  0  0]
 [ 0  0  0  0 70  0  0]
 [ 0  0  0  0  6  0  0]
 [ 0  0  0  0 14  0  0]]

@endreb

I know you know these points :slight_smile:

To evaluate a model on your test data, run:

rasa test

This will test your latest trained model on any end-to-end stories you have defined in files with the test_ prefix.

If you want to evaluate the dialogue and NLU models separately, you can use the commands below:

rasa test core

Note: rasa test core tests Rasa Core models using your test stories.

Your confusion matrix is identical for both commands, so I guess there is nothing to worry about; it just reflects the test stories you mentioned.

Summary Points:

  1. Test stories are written in the form of exemplary conversations to check whether the bot behaves as expected. You can write them in a test stories file in the project folder (see the sketch after this list). Once you have a good set of test cases, you can run rasa test.

  2. Alternatively, you can run rasa test core --stories test_stories.yml --out results. This command will generate a report about failed stories and a confusion matrix, regardless of whether the stories failed or not.
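
For illustration, here is a rough sketch of a test story in the format described in the Testing Your Assistant doc; the intent, action, and entity names are made up, so adapt them to your own domain:

stories:
- story: happy path with an entity (hypothetical names)
  steps:
  - user: |
      show me [chinese](cuisine) restaurants
    intent: request_restaurant
  - action: utter_ask_location

As far as I understand the docs, the user: text is what gets run through NLU during an end-to-end evaluation, while the intent: annotation is what a conversation-level evaluation uses directly.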

Hi @nik202 ,

thanks for the reply. The problem I have is that 16/16 stories are correct when I run rasa test core, but only 8/16 stories are correct when I run rasa test. I have a folder named tests, and inside that folder I have 5 YAML files containing 16 stories. What I don’t get is that the confusion matrix is the same, so how come the success rates of the stories are not?

Do you know what “Evaluation Results on CONVERSATION level” means vs. “Evaluation Results on END-TO-END level”? I would guess that when I run rasa test, the stories are somehow evaluated differently than when I run rasa test core, but I don’t know what the problem is.

@endreb I tried to explain everything in the post above in as much detail and as briefly as I could, my friend. If you need more details, please see the Testing Your Assistant doc, or look at the result files you generated while running the above commands. Good luck!
