Rasa Core End-To-End Evaluation

Hi guys,

I am trying to perform end-to-end evaluation using rasa_core.evaluate. This is the command I am running:

python -m rasa_core.evaluate default --core models/relocation/dialogue --nlu models/relocation/nlu --stories test/e2e/e2e_relocation_stories.md --endpoints endpoints.yml --e2e

I’ve made sure e2e_relocation_stories.md contains stories in an end-to-end format. For example:

## end-to-end story 3
* hello: hi
  - utter_greet
* non_resident_relocation: moving to Malaysia.
  - action_determine_if_restricted
  - slot{"is_h3" : false}
  - utter_confirm_GPE
  - utter_is_account_personal_or_non_personal
* non_personal_account: non-personal
  - action_set_email_type
  - slot{"email_type" : "non_personal"}
  - utter_escalate_to_compliance
  - action_send_email_to_compliance
  - utter_goodbye

I’ve uploaded the exception I got in a log file instead.

Hey @alexf388, I appreciate your not wanting to muck up the screen with the stack trace, but unfortunately I can’t see your log file. Could you try to upload it again or just post the trace? Then I can help you out.

Hi @erohmensing, I am using the end-to-end testing framework to test the bot. It does evaluate the stories on the test data provided, but afterwards it falls back to the NLU training data and evaluates that as well. In the results, the intent and entity evaluation only contains output from the training data. I am not sure whether this is the intended behaviour, or whether it should be.

@abhi_bh_nlp Sounds like you’re running rasa test, which runs separate core and NLU evaluations. If you only want to run the e2e ones, just do

rasa test core --e2e


@erohmensing No, I am running the e2e testing. I have 11 samples in my test data, so the core evaluation is correct and only the test data is considered. However, the intent evaluation shows a support of 1954, which looks more like my training data. Below is my command:
rasa test --stories tests/validation_data/e2e_stories.md --e2e --out results/e2e

@erohmensing Also, I realized that when testing the NLU model with this framework, the NLU threshold is currently never taken into account. I got a bunch of predictions with a confidence in the 10-20% range, which is well below my current threshold of 40%.

So although these predictions are correct, I believe we should be able to see these examples in the intent report as well. What are your thoughts on this?
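
To make the idea concrete, here is a minimal Python sketch of the kind of check being asked for: listing predictions that are correct but fall below the threshold. The record layout (`expected`, `predicted`, `confidence`), the sample values, and the helper name `below_threshold_but_correct` are all assumptions for illustration, not Rasa's actual report format or API.

```python
from typing import Dict, List


def below_threshold_but_correct(predictions: List[Dict], threshold: float) -> List[Dict]:
    """Return predictions whose intent is right but whose confidence is under the threshold."""
    return [
        p for p in predictions
        if p["predicted"] == p["expected"] and p["confidence"] < threshold
    ]


# Hypothetical evaluation records (made up for this sketch, not real Rasa output).
predictions = [
    {"text": "hi", "expected": "hello",
     "predicted": "hello", "confidence": 0.93},
    {"text": "moving to Malaysia.", "expected": "non_resident_relocation",
     "predicted": "non_resident_relocation", "confidence": 0.15},
    {"text": "non-personal", "expected": "non_personal_account",
     "predicted": "hello", "confidence": 0.12},
]

# 0.4 mirrors the 40% threshold mentioned above.
flagged = below_threshold_but_correct(predictions, threshold=0.4)
for p in flagged:
    print(p["text"], p["confidence"])
```

Only the second record would be flagged here: it is the one that is both correct and under the threshold, which is the case that currently goes unnoticed.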

Re: your first comment, as I mentioned, rasa test runs 2 separate tests, an NLU test and a core test. The e2e test is a type of core test. If you only want to run that one, you should run

rasa test core --stories tests/validation_data/e2e_stories.md --e2e --out results/e2e

The intent report does not take the e2e stories into account. It is the same report you would get by running rasa test nlu after running the stories command above.
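
As a rough illustration of why the support figure follows the NLU data rather than the 11 stories: support in a classification report is the number of true examples per label in whatever data set was evaluated. The sketch below uses made-up labels and a hypothetical helper, `support_per_intent`; it is not how Rasa computes its report internally.

```python
from collections import Counter
from typing import Iterable


def support_per_intent(true_intents: Iterable[str]) -> Counter:
    """Count how many true examples each intent has in the evaluated data."""
    return Counter(true_intents)


# Hypothetical labels, made up for this sketch.
nlu_data_intents = ["hello"] * 5 + ["non_resident_relocation"] * 3
e2e_story_intents = ["hello", "non_resident_relocation", "non_personal_account"]

# Total support equals the size of the data set that was evaluated.
nlu_total = sum(support_per_intent(nlu_data_intents).values())
e2e_total = sum(support_per_intent(e2e_story_intents).values())
print(nlu_total, e2e_total)
```

If the total support in the intent report is close to the size of the NLU data, that suggests the report came from the NLU evaluation and not from the e2e stories.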