Rasa evaluation does not produce output files

Rasa Version: rasa==1.1.5 rasa-sdk==1.1.0

Hi, I’m trying to evaluate my model’s NLU components and found this guide.

Unfortunately, almost all additional input flags are ignored.

If I run

rasa test nlu -u evaluate/examples.md -m models/20190805-094203.tar.gz --report evaluate/ --errors ./evaluate/ --histogram ./evaluate/ --confmat ./evaluate/

It produces the following command line output

`2019-08-05 13:19:14 INFO rasa.nlu.components - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-de_core_news_sm'.
2019-08-05 13:19:14 INFO rasa.nlu.training_data.loading - Training data format of '/tmp/tmpvz1gfilf/852bb4994431473bbbca3355c4ddd5ad_examples.md' is 'md'.
2019-08-05 13:19:14 INFO rasa.nlu.training_data.training_data - Training data stats:
    - intent examples: 100 (1 distinct intents)
    - Found intents: 'answer'
    - entity examples: 100 (4 distinct entities)
    - found entities: 'house_number', 'street', 'residence', 'zipcode'
2019-08-05 13:19:14 INFO rasa.nlu.test - Running model for predictions:
100%|████████████████████| 100/100 [00:01<00:00, 82.34it/s]
2019-08-05 13:19:15 INFO rasa.nlu.test - Entity evaluation results:
2019-08-05 13:19:15 INFO rasa.nlu.test - Evaluation for entity extractor: CRFEntityExtractor
/home/local/MGM/hschroeder/.virtualenvs/A12Bot/lib/python3.6/site-packages/sklearn/metrics/classification.py:1145: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples.
  'recall', 'true', average, warn_for)
/home/local/MGM/hschroeder/.virtualenvs/A12Bot/lib/python3.6/site-packages/sklearn/metrics/classification.py:1145: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true samples.
  'recall', 'true', average, warn_for)
2019-08-05 13:19:15 INFO rasa.nlu.test - Classification report for 'CRFEntityExtractor' saved to 'evaluate/CRFEntityExtractor_report.json'.
2019-08-05 13:19:15 INFO rasa.nlu.test - Evaluation for entity extractor: CRFEntityServer
/home/local/MGM/hschroeder/.virtualenvs/A12Bot/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/home/local/MGM/hschroeder/.virtualenvs/A12Bot/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
/home/local/MGM/hschroeder/.virtualenvs/A12Bot/lib/python3.6/site-packages/sklearn/metrics/classification.py:1143: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)
2019-08-05 13:19:15 INFO rasa.nlu.test - Classification report for 'CRFEntityServer' saved to 'evaluate/CRFEntityServer_report.json'.`

but only the report files are actually generated. There are no error messages indicating that something went wrong while trying to generate the other files. Am I missing something?
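For what it’s worth, the report that does get written can be inspected directly. Below is a minimal sketch, assuming the JSON follows the layout of scikit-learn’s `classification_report(..., output_dict=True)`, i.e. each label maps to a dict with `precision`, `recall`, `f1-score` and `support`. I have not verified that layout against this exact Rasa version, and the helper name `summarize_report` is made up for illustration.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Return (label, precision, recall, f1, support) rows, weakest F1 first.

    Assumption: the file maps each label to a dict with the keys
    "precision", "recall", "f1-score" and "support". Entries that are
    not dicts (for example a plain "accuracy" float) are skipped.
    """
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    rows = []
    for label, metrics in report.items():
        if not isinstance(metrics, dict):
            continue
        rows.append((
            label,
            metrics.get("precision", 0.0),
            metrics.get("recall", 0.0),
            metrics.get("f1-score", 0.0),
            metrics.get("support", 0),
        ))
    # Sort by F1 so that the problematic entities show up at the top.
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    # Path taken from the log output above.
    report_path = Path("evaluate/CRFEntityExtractor_report.json")
    if report_path.exists():
        for label, p, r, f1, support in summarize_report(report_path):
            print(f"{label:<20} P={p:.2f} R={r:.2f} F1={f1:.2f} n={support}")
```

This only summarizes per-entity scores; it does not tell you which entities get confused with each other.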

OK, I got a little closer to the solution at least. It seems the confusion matrix and everything else is only meant for intent classification, not for entity extraction.

Is there any work going on in this direction?

Hi there!

The flags --errors, --histogram and --confmat are all meant to be set to file names, not folders. In fact, you don’t need to specify a name if you pass those flags by themselves. Give it a try :slight_smile:

Hello,

I think I have the same problem. Only the file CRFEntityExtractor_report.json is being created.

@MetcalfeTom When passing the flags by themselves, I get one of the following:
rasa test nlu: error: argument --errors: expected one argument
rasa test nlu: error: argument --histogram: expected one argument
rasa test nlu: error: argument --confmat: expected one argument
depending on which one is first in line.

Further information: at the moment I have one intent and 4 entities.

I think @IgNoRaNt23 is right and these aren’t available for entity extraction but only for intent classification?

Is there a way to get this working for entity extraction or is there something else like a confusion matrix to see the entities which are being confused with one another?
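In the meantime, one workaround is to tabulate the confusions by hand. This is a rough sketch in plain Python, assuming you already have one gold label and one predicted label per token (with "O" meaning no entity). How to get those labels out of Rasa is not shown here, and the function names and sample labels are invented for illustration.

```python
from collections import Counter


def entity_confusion(gold, predicted):
    """Count (gold_label, predicted_label) pairs over aligned tokens."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return Counter(zip(gold, predicted))


def confused_pairs(counts):
    """Return only the off-diagonal pairs, most frequent first."""
    wrong = [(pair, n) for pair, n in counts.items() if pair[0] != pair[1]]
    return sorted(wrong, key=lambda item: -item[1])


if __name__ == "__main__":
    # Hypothetical token-level labels for a single address utterance.
    gold = ["street", "house_number", "zipcode", "residence"]
    predicted = ["street", "zipcode", "zipcode", "O"]

    counts = entity_confusion(gold, predicted)
    for (gold_label, predicted_label), n in confused_pairs(counts):
        print(f"{gold_label} predicted as {predicted_label}: {n}")
```

Counting per token keeps the sketch simple; for multi-token entities you would probably want to compare whole spans instead.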

Not as far as I know. My team is currently working on a solution for the error.json for entities, which I expect any moment now. We have already talked about opening a pull request against Rasa in the future, but that might take a while, if it happens at all.

Hey @linhe, @IgNoRaNt23, success/failure reporting for NER was recently merged into master and will be released with Rasa 1.3. Feel free to check it out :slight_smile: Report successful and incorrect predictions of NER by tabergma · Pull Request #4335 · RasaHQ/rasa · GitHub