Help with e2e evaluation

How can I reduce the number of failed stories in e2e evaluation of my model? When I run python -m rasa_core.evaluate --core models/dialogue --stories test_stories.md -o results, some actions are predicted incorrectly, and I want to reduce the number of incorrectly predicted actions. Also, how can I evaluate NLU, i.e. intent recognition, with a test file such as test_nlu.md?

Hey @rakesh,

In general, if you want to reduce the number of incorrectly predicted actions, you can add the failed story to your training data with the correct action, and your bot’s policies will learn to get it right the next time.

rasa_nlu also has some nice tools for evaluating your models. This script will plot a confusion matrix, f1 scores and a pretty graph showing the confidence for correctly and incorrectly predicted labels on your test set.
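To make those metrics concrete, here is a minimal pure-Python sketch of what a confusion matrix and per-intent F1 score measure. It is not the rasa_nlu script itself; the helper names and the sample labels are made up for illustration.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) pairs; off-diagonal pairs are misclassifications."""
    return Counter(zip(true_labels, predicted_labels))

def f1_per_label(true_labels, predicted_labels):
    """Return {label: F1}, where F1 = 2*TP / (2*TP + FP + FN)."""
    counts = confusion_counts(true_labels, predicted_labels)
    labels = set(true_labels) | set(predicted_labels)
    scores = {}
    for label in labels:
        tp = counts[(label, label)]
        fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
        fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical test set: the true intent vs. the intent the NLU model predicted.
true_intents = ["greet", "greet", "mood_unhappy", "mood_deny", "mood_deny"]
pred_intents = ["greet", "mood_deny", "mood_unhappy", "mood_deny", "mood_deny"]

matrix = confusion_counts(true_intents, pred_intents)
scores = f1_per_label(true_intents, pred_intents)

for (true, pred), n in sorted(matrix.items()):
    print(f"true={true:<13} predicted={pred:<13} count={n}")
for label, score in sorted(scores.items()):
    print(f"{label:<13} F1={score:.2f}")
```

Intents with a low F1 or many off-diagonal counts are the ones that most likely need more or cleaner training examples.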

Hey @MetcalfeTom

Thanks for the reply. Currently I am using a stories.md file with 150 stories covering nearly 150 intents, like:

story 1

  • greet
    • utter_greet_help

story 2

  • mood_unhappy
    • utter_Help_to_coach

story 3

  • mood_deny
    • utter_Help_coach

and so on, up to 150 intents

So I am getting some wrong predictions in the confusion matrix itself. I evaluate the model with an e2e_stories.md file that has one big story containing every intent, like:

stories all

  • greet
    • utter_greet_help
  • some_random_intent
    • its_utterance_action

and so on, up to 150 intents

I think it should predict the correct action every time, but I am getting wrongly predicted actions for some intents. Theoretically every prediction should be correct, but I don't know why that is not the case.

This is because your evaluation story differs from the data the bot was trained on; in your training data there is no context for each intent, but in the evaluation story the tracker will be storing the previously received messages and executed actions.
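A toy sketch of the effect, assuming a policy that simply memorizes the exact history seen in training. This is not Rasa's actual implementation (which works on featurized tracker states); the function names here are hypothetical.

```python
def memorize(stories):
    """Map each history prefix that ends in an intent to the action that followed."""
    table = {}
    for story in stories:
        # Stories alternate: even index = intent, odd index = action.
        for i in range(0, len(story), 2):
            table[tuple(story[: i + 1])] = story[i + 1]
    return table

def predict(table, history):
    """Return the memorized action for this exact history, or None if unseen."""
    return table.get(tuple(history))

# Training data: one short story per intent, so every intent is seen with no context.
training_stories = [
    ["greet", "utter_greet_help"],
    ["mood_unhappy", "utter_Help_to_coach"],
]
table = memorize(training_stories)

# A fresh conversation matches a training story exactly.
alone = predict(table, ["mood_unhappy"])

# Inside one long evaluation story, the earlier turns are part of the history.
in_context = predict(table, ["greet", "utter_greet_help", "mood_unhappy"])

print(alone)       # utter_Help_to_coach
print(in_context)  # None
```

The second lookup misses because that longer history never appeared in training, even though the last intent did.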

I suggest you change your policy_config to include AugmentedMemoizationPolicy instead of the MemoizationPolicy - this will wipe the tracker upon receiving the intent and predict the corresponding utterance in your training data.
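Roughly, the idea is that the lookup is allowed to forget older steps of the conversation until the remaining history matches something from training. The sketch below is only a simplified picture of that idea under my own assumptions, with a hypothetical helper name, and is not the policy's real code.

```python
def predict_with_forgetting(table, history):
    """Try the full history first, then drop the oldest events until a match is found."""
    for start in range(len(history)):
        action = table.get(tuple(history[start:]))
        if action is not None:
            return action
    return None

# Memorized training stories: each intent was only ever seen without context.
table = {
    ("greet",): "utter_greet_help",
    ("mood_unhappy",): "utter_Help_to_coach",
}

# History in the middle of one long evaluation story.
history = ["greet", "utter_greet_help", "mood_unhappy"]

exact = table.get(tuple(history))                   # plain exact-match lookup
forgetful = predict_with_forgetting(table, history)

print(exact)      # None
print(forgetful)  # utter_Help_to_coach
```

With the earlier turns dropped, the remaining history matches the single-intent training story again.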

@MetcalfeTom Thank you for the reply. I will definitely try that one.