How can I reduce failed stories when doing e2e evaluation of the model? Kindly help me improve my model, i.e. when I use the command `python -m rasa_core.evaluate --core models/dialogue --stories test_stories.md -o results`, it gives me some wrongly predicted actions. I want to reduce the number of incorrectly predicted actions. Also, help me with how I can evaluate NLU, i.e. intent recognition, with something like a test_nlu.md file to evaluate the NLU model.
Hey @rakesh,
In general if you want to reduce the number of incorrectly predicted actions, you can add the failed story to your training data with the correct action and your bot’s policies will learn to get it right the next time.
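For example, a minimal sketch of that loop (assuming your training stories live in `stories.md` and your domain in `domain.yml`; flag names can vary between `rasa_core` versions): take a story the evaluation marked as failed, append it to your training data with the correct action, and retrain:

```
# retrain the dialogue model after adding the corrected story to stories.md
python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue
```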
`rasa_nlu` also has some nice tools for evaluating your models. The evaluation script (`python -m rasa_nlu.evaluate`) will plot a confusion matrix, F1 scores and a pretty graph showing the confidence for correctly and incorrectly predicted labels on your test set.
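For example, a sketch assuming a trained NLU model persisted under `models/nlu/default/current` (adjust the paths for your project; the exact flags depend on your `rasa_nlu` version):

```
# evaluate intent classification against a labelled test set;
# plots the confusion matrix and the confidence histogram
python -m rasa_nlu.evaluate \
  --data test_nlu.md \
  --model models/nlu/default/current
```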
Hey @MetcalfeTom
Thanks for the reply. Currently I am using a `stories.md` file with about 150 stories, covering nearly 150 intents, like:
## story 1
* greet
  - utter_greet_help

## story 2
* mood_unhappy
  - utter_Help_to_coach

## story 3
* mood_deny
  - utter_Help_coach

...and so on, up to 150 intents.
So I am getting some wrongly predicted ones in the confusion matrix itself. When I evaluate the model with an `e2e_stories.md` file containing one big story with every intent, like:
## stories all
* greet
  - utter_greet_help
* some_random_intent
  - its_utterance_action

...and so on through all 150 intents.
I think it should predict the correct action, but I am getting some wrongly predicted actions for some intents. Theoretically it should get every prediction correct, but I don't know why it doesn't.
This is because your evaluation story is different from the data the bot was trained on; in your training data there is no context around each intent, but in the one big evaluation story the tracker will be storing all the previously received messages and executed actions, so the memorized two-turn stories no longer match.
I suggest you change your `policy_config` to include the `AugmentedMemoizationPolicy` instead of the `MemoizationPolicy`; this will wipe the tracker upon receiving each intent and predict the corresponding utterance from your training data.
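As a sketch, a `policy_config.yml` along these lines (the `KerasPolicy` entry and the `max_history` value are just illustrative, not something your setup requires):

```yml
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 2
  - name: KerasPolicy
```

You can then point training at it, e.g. `python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue -c policy_config.yml` (flag names may differ slightly between versions).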
@MetcalfeTom Thank you for the reply. I will definitely try that one.