Tests not finishing

jonathanpwheat · January 13, 2021, 2:20pm

TL;DR - rasa test stops mid test, console dump below

Details -

I have a new (not upgraded) small talk bot with 90 intents, 90 stories and my nlu.yml file is 9977 lines long. The model trains with no issues, other than it takes about 57 minutes and rasa shell works and the bot responds properly when playing around.

I’ve created 6 simple tests to get started, all with a single user phrase, intent and action like this

  - story: Greeting 1
    steps:
    - user: |
        Hi
      intent: greet
    - action: utter_greet

When I run my command, I get a few odd errors. The results directory is generated with the expected files, but the test just stops when performing Entity evaluation:

rasa test --stories tests/small_talk_tests.yml --fail-on-prediction-errors
2021-01-13 08:46:54 INFO     rasa.model  - Loading model models/20210107-172420.tar.gz...
2021-01-13 08:47:00 INFO     rasa.core.test  - Evaluating 0 stories
Progress:
0it [00:00, ?it/s]
2021-01-13 08:47:00 INFO     rasa.core.test  - Finished collecting predictions.
2021-01-13 08:47:00 WARNING  rasa.test  - No labels to evaluate. Skip evaluation.
2021-01-13 08:47:00 INFO     rasa.core.test  - Evaluation Results on END-TO-END level:
2021-01-13 08:47:00 INFO     rasa.core.test  -  Correct:          0 / 0
2021-01-13 08:47:00 INFO     rasa.core.test  -  F1-Score:         0.000
2021-01-13 08:47:00 INFO     rasa.core.test  -  Precision:        0.000
2021-01-13 08:47:00 INFO     rasa.core.test  -  Accuracy:         0.000
2021-01-13 08:47:00 INFO     rasa.core.test  -  In-data fraction: 0
2021-01-13 08:47:00 WARNING  rasa.test  - No labels to evaluate. Skip evaluation.
2021-01-13 08:47:00 INFO     rasa.core.test  - Stories report saved to results/story_report.json.
2021-01-13 08:47:01 INFO     rasa.core.test  - Evaluation Results on ACTION level:
2021-01-13 08:47:01 INFO     rasa.core.test  -  Correct:          0 / 0
2021-01-13 08:47:01 INFO     rasa.core.test  -  F1-Score:         0.000
2021-01-13 08:47:01 INFO     rasa.core.test  -  Precision:        0.000
2021-01-13 08:47:01 INFO     rasa.core.test  -  Accuracy:         0.000
2021-01-13 08:47:01 INFO     rasa.core.test  -  In-data fraction: 0
/home/jwheat/anaconda2/envs/rasa2/lib/python3.8/site-packages/rasa/utils/plotting.py:75: UserWarning: Attempting to set identical left == right == -0.5 results in singular transformations; automatically expanding.
  plt.imshow(
/home/jwheat/anaconda2/envs/rasa2/lib/python3.8/site-packages/rasa/utils/plotting.py:75: UserWarning: Attempting to set identical bottom == top == -0.5 results in singular transformations; automatically expanding.
  plt.imshow(
2021-01-13 08:47:01 INFO     rasa.utils.plotting  - Confusion matrix, without normalization: 
[]
2021-01-13 08:47:02 INFO     rasa.model  - Loading model models/20210107-172420.tar.gz...
2021-01-13 08:47:05 INFO     rasa.nlu.test  - Running model for predictions:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9701/9701 [01:02<00:00, 154.45it/s]2021-01-13 08:48:08 INFO     rasa.nlu.test  - Intent evaluation results:
2021-01-13 08:48:08 INFO     rasa.nlu.test  - Intent Evaluation: Only considering those 9701 examples that have a defined intent out of 9701 examples.       
/home/jwheat/anaconda2/envs/rasa2/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1221: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
/home/jwheat/anaconda2/envs/rasa2/lib/python3.8/site-packages/sklearn/metrics/_classification.py:1221: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
2021-01-13 08:48:08 INFO     rasa.nlu.test  - Classification report saved to results/intent_report.json.
2021-01-13 08:48:08 INFO     rasa.nlu.test  - Incorrect intent predictions saved to results/intent_errors.json.
2021-01-13 08:48:08 INFO     rasa.utils.plotting  - Confusion matrix, without normalization: 
[[223   1   0 ...   0   0   0]
 [  0  44   0 ...   0   0   0]
 [  0   0 116 ...   0   0   0]
 ...
 [  0   0   0 ... 150   0   0]
 [  0   0   0 ...   0  13   0]
 [  0   0   0 ...   0   0 294]]
2021-01-13 08:48:55 INFO     rasa.nlu.test  - Entity evaluation results:

It literally stops right there and gives me my cursor back. It never actually runs my story tests. I know this because I set a bogus intent that should make the story fail

  - story: Greeting 1
    steps:
    - user: |
        Hi
      intent: xxxgreet
    - action: utter_greet

I get the same output with the incorrect xxxgreet intent.

I see an couple warnings about No labels to evaluate, a python UserWarning and UndefinedMetricWarning I’ve never seen any of those errors before. This is the first time I’ve tried testing a 2.x bot outside the normal out-of-the-box mood bot, so maybe I’m missing something.

Here’s my version information

Rasa Version     : 2.1.3
Rasa SDK Version : 2.1.2
Rasa X Version   : None
Python Version   : 3.8.0
Operating System : Linux-4.4.0-19041-Microsoft-x86_64-with-glibc2.10
Python Path      : /home/jwheat/anaconda2/envs/rasa2/bin/python

Any ideas?

fkoerner · January 14, 2021, 3:17pm

That does look pretty weird. One thing that jumps out at me is this line: Evaluating 0 stories, though that wouldn’t be related to entity extraction.

Have you tried running rasa test core?

jonathanpwheat · January 14, 2021, 6:00pm

No I hadn’t tried that, Let me retrain and run that. Give me about 57 minutes and I can tell ya what happens

jonathanpwheat · January 14, 2021, 6:55pm

alright, since I’ve never run rasa test core I’m not sure if this completed or not

Here’s the output

rasa test core
2021-01-14 13:51:03.555429: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-01-14 13:51:03.556933: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.2021-01-14 13:51:06.052208: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-01-14 13:51:06.053596: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-01-14 13:51:06.067316: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (LAPTOP-UMLLMSC2): /proc/driver/nvidia/version does not exist
2021-01-14 13:51:06 INFO     rasa.model  - Loading model models/20210114-134750.tar.gz...
2021-01-14 13:51:07.020210: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library 
(oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-14 13:51:07.044486: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2208000000 Hz
2021-01-14 13:51:07.055550: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fffeed2c090 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-14 13:51:07.057060: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Processed story blocks: 100%|████████████████████████████████████████████████████████████████████████████████| 90/90 [00:00<00:00, 2555.41it/s, # trackers=1]2021-01-14 13:51:13 INFO     rasa.core.test  - Evaluating 90 stories
Progress:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 90/90 [00:00<00:00, 106.82it/s]2021-01-14 13:51:14 INFO     rasa.core.test  - Finished collecting predictions.
2021-01-14 13:51:14 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:
2021-01-14 13:51:14 INFO     rasa.core.test  -  Correct:          90 / 90
2021-01-14 13:51:14 INFO     rasa.core.test  -  F1-Score:         1.000
2021-01-14 13:51:14 INFO     rasa.core.test  -  Precision:        1.000
2021-01-14 13:51:14 INFO     rasa.core.test  -  Accuracy:         1.000
2021-01-14 13:51:14 INFO     rasa.core.test  -  In-data fraction: 1
2021-01-14 13:51:14 INFO     rasa.core.test  - Stories report saved to results/story_report.json.
2021-01-14 13:51:15 INFO     rasa.core.test  - Evaluation Results on ACTION level:
2021-01-14 13:51:15 INFO     rasa.core.test  -  Correct:          180 / 180
2021-01-14 13:51:15 INFO     rasa.core.test  -  F1-Score:         1.000
2021-01-14 13:51:15 INFO     rasa.core.test  -  Precision:        1.000
2021-01-14 13:51:15 INFO     rasa.core.test  -  Accuracy:         1.000
2021-01-14 13:51:15 INFO     rasa.core.test  -  In-data fraction: 1
2021-01-14 13:51:15 INFO     rasa.utils.plotting  - Confusion matrix, without normalization: 
[[90  0  0 ...  0  0  0]
 [ 0  1  0 ...  0  0  0]
 [ 0  0  1 ...  0  0  0]
 ...
 [ 0  0  0 ...  1  0  0]
 [ 0  0  0 ...  0  1  0]
 [ 0  0  0 ...  0  0  1]]
Failed stories written to 'results/failed_test_stories.yml'

It stops right there and I get my cursor back.

My results/failed_test_stories.yml file is empty, is that good?

fkoerner · January 14, 2021, 8:30pm

You shouldn’t have to retrain, but I guess now you’ve got the new model

It looks like you’ve just tested on your training stories, which is why you got 90/90 and results/failed_test_stories.yml is empty. Sorry for the confusion, I should have specified that you need to run rasa test core and point to your test stories like so:

rasa test core --stories tests/small_talk_tests.yml

Do you have a GPU?

jonathanpwheat · January 14, 2021, 9:26pm

Thank you for looking over this for me, I really appreciate it.

Actually I did have to retrain. Our models directory is gitignored and I had been working in another branch and it was overwritten, no biggie.

I’ve been fighting that GPU/Cuda warning for a while. I’m unfortunately on Windows 10 in a WSL/Ubuntu terminal and can’t seem to get the drivers happy. That’s what that mess is about.

I had read rasa test would automatically pick up the test file(s) if they were named test_* so I had renamed my file test_small_talk.yml and figured the plain old rasa test core would have nabbed it but as you pointed out, it didn’t.

This time around I specified the file and it looks better, maybe? Does it still stop short? I’m not sure if it is supposed to say X number of stories passed or stop like it does.

rasa test core --stories tests/test_small_talk.yml
2021-01-14 16:13:55.950223: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-01-14 16:13:55.951730: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.2021-01-14 16:14:03.793422: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-01-14 16:14:03.794344: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-01-14 16:14:03.795136: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (LAPTOP-UMLLMSC2): /proc/driver/nvidia/version does not exist
2021-01-14 16:14:03 INFO     rasa.model  - Loading model models/20210114-134750.tar.gz...
2021-01-14 16:14:05.080095: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library 
(oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-14 16:14:05.092811: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2208000000 Hz
2021-01-14 16:14:05.096182: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fffce836600 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-01-14 16:14:05.097404: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-01-14 16:14:11 INFO     rasa.shared.utils.validation  - The 'version' key is missing in the training data file /tmp/tmp2tv7qp2o/f1a58d47d7c142698ed310bba16bf3a8_test_small_talk.yml. Rasa Open Source will read the file as a version '2.0' file. See https://rasa.com/docs/rasa/training-data-format.
/home/jwheat/anaconda2/envs/rasa2/lib/python3.8/site-packages/rasa/shared/utils/io.py:93: UserWarning: Issue found in '/tmp/tmp2tv7qp2o/f1a58d47d7c142698ed310bba16bf3a8_test_small_talk.yml':
Found intent 'xxxgreet' in stories which is not part of the domain.
  More info at https://rasa.com/docs/rasa/stories
Processed story blocks: 100%|██████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 2470.63it/s, # trackers=1]2021-01-14 16:14:11 INFO     rasa.core.test  - Evaluating 6 stories
Progress:
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 90.73it/s]2021-01-14 16:14:11 INFO     rasa.core.test  - Finished collecting predictions.
2021-01-14 16:14:11 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:
2021-01-14 16:14:11 INFO     rasa.core.test  -  Correct:          5 / 6
2021-01-14 16:14:11 INFO     rasa.core.test  -  F1-Score:         0.909
2021-01-14 16:14:11 INFO     rasa.core.test  -  Precision:        1.000
2021-01-14 16:14:11 INFO     rasa.core.test  -  Accuracy:         0.833
2021-01-14 16:14:11 INFO     rasa.core.test  -  In-data fraction: 0.917
2021-01-14 16:14:11 INFO     rasa.core.test  - Stories report saved to results/story_report.json.
2021-01-14 16:14:12 INFO     rasa.core.test  - Evaluation Results on ACTION level:
2021-01-14 16:14:12 INFO     rasa.core.test  -  Correct:          11 / 12
2021-01-14 16:14:12 INFO     rasa.core.test  -  F1-Score:         0.950
2021-01-14 16:14:12 INFO     rasa.core.test  -  Precision:        1.000
2021-01-14 16:14:12 INFO     rasa.core.test  -  Accuracy:         0.917
2021-01-14 16:14:12 INFO     rasa.core.test  -  In-data fraction: 0.917
2021-01-14 16:14:12 INFO     rasa.utils.plotting  - Confusion matrix, without normalization: 
[[0 0 0 0 0]
 [0 6 0 0 0]
 [1 0 2 0 0]
 [0 0 0 2 0]
 [0 0 0 0 1]]
Failed stories written to 'results/failed_test_stories.yml'

I actually get my failed story of xxxgreet to show up now, so that’s progress

My failed_test_stories.yml

version: "2.0"
stories:
- story: Greeting 1 (/tmp/tmp2tv7qp2o/f1a58d47d7c142698ed310bba16bf3a8_test_small_talk.yml)
  steps:
  - intent: xxxgreet
  - action: utter_greet  # predicted: action_default_fallback

fkoerner · January 15, 2021, 4:32pm

You’re welcome!

I see about your setup. CUDA can be difficult to please Without belaboring what may be obvious to you, these warnings mean training is defaulting to using the CPU, which is one reason your training may be taking longer than you’d like.

I think that output looks much better. You’ve failed only 1/6 conversations, which is why you have only one story in the results/failed_test_stories.yml. This following line from your logs was to be expected, since you added a dummy intent:

Found intent 'xxxgreet' in stories which is not part of the domain.

If you haven’t checked it out already, this page of the docs might help you interpret your output.

Okay, so it looks like core is doing fine. It could be that there was some issue picking up the stories in the first place (when you ran rasa test without core). That would explain why it looks like nothing was happening for entity extraction – there was nothing to extract. I’m not sure why this would’ve happened, though, because you did point to them. The stories don’t seem to be malformed either, because rasa test core tested them once you pointed to them.

I’ll do some digging to see what’s going on. You could try running the command below again, could be something was up the first time, or something is wrong with the --fail-on-prediction-errors flag (there shouldn’t be, but it’s possible there’s a bug )

rasa test --stories tests/test_small_talk.yml

Otherwise, you can do a sanity check and test the nlu portion. The bot should ace this since it’s seen all the nlu.yml data before

rasa test --nlu nlu.yml

jonathanpwheat · January 15, 2021, 8:05pm

I received the same abrupt stop.

On a whim, I spun up a new rasa init bot and run rasa test because that bot doesn’t have any entities either and I wanted to see what happens,

I got the same thing, it stops at the Entity evaluation results:

So maybe it’s a bug in the test code if there are no entities and it stops, when it should say something like No entities found and/or Testing complete

To play some more, I added an entity and extended the generic mood bot by adding the entity to the nlu data, trained and tested it again. And it made it past the Entity Evaluation results section

[other test results snipped]
2021-01-15 14:39:44 INFO     rasa.nlu.test  - Entity evaluation results:
2021-01-15 14:39:44 INFO     rasa.nlu.test  - Evaluation for entity extractor: DIETClassifier 
2021-01-15 14:39:44 INFO     rasa.nlu.test  - Classification report saved to results/DIETClassifier_report.json.
2021-01-15 14:39:44 INFO     rasa.nlu.test  - Incorrect entity predictions saved to results/DIETClassifier_errors.json.
2021-01-15 14:39:44 INFO     rasa.utils.plotting  - Confusion matrix, without normalization: 
[[  5   1]
 [  1 157]]

So I guess there’s no real problem with my bot other than I was expecting something after the Entity evaluation results:

Sorry to waste your time

fkoerner · January 20, 2021, 1:14pm

So maybe it’s a bug in the test code if there are no entities and it stops, when it should say something like No entities found and/or Testing complete

This is very true! I raised a similar issue thanks to your question which states that testing should stop if no stories are found, rather than produce confusing logs and empty files. I’ll update it to include this feedback as well.

You didn’t waste my time! I think this will result in an improvement of our output during testing. It’s always good for us to learn from users where things are confusing or misleading, that way we can try and make them better

Topic		Replies	Views
Invalid Rasa End-To-End Test Evaluation Rasa Open Source	7	855	December 26, 2022
Error when testing Rasa NLU model Rasa Open Source	6	1926	June 21, 2021
Rasa test Error Rasa Open Source	1	377	July 24, 2021
Rasa test vs. rasa test core Rasa Open Source testing	3	1243	November 7, 2021
Forcing intent selection and entities recognition during e2e evaluation Rasa Open Source	5	924	March 3, 2020

Tests not finishing

Related topics