How can I reduce failed stories when doing e2e evaluation of the model? Kindly help me improve my model, i.e. when I use the command `python -m rasa_core.evaluate --core models/dialogue --stories test_stories.md -o results`, it gives me some wrongly predicted actions. I want to reduce the number of incorrectly predicted actions. Also, help me with how I can evaluate NLU, i.e. intent recognition, with something like a test_nlu.md file to evaluate the NLU model.
Hey @rakesh,
In general if you want to reduce the number of incorrectly predicted actions, you can add the failed story to your training data with the correct action and your bot’s policies will learn to get it right the next time.
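For example, a minimal sketch of that loop (assuming your training stories live in `stories.md` and your domain in `domain.yml`; flag names can vary between `rasa_core` versions): take a story the evaluation marked as failed, append it to your training data with the correct action, and retrain:

```
# retrain the dialogue model after adding the corrected story to stories.md
python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue
```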
`rasa_nlu` also has some nice tools for evaluating your models. The evaluation script (`python -m rasa_nlu.evaluate`) will plot a confusion matrix, F1 scores and a pretty graph showing the confidence for correctly and incorrectly predicted labels on your test set.
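For example, a sketch assuming a trained NLU model persisted under `models/nlu/default/current` (adjust the paths for your project; the exact flags depend on your `rasa_nlu` version):

```
# evaluate intent classification against a labelled test set;
# plots the confusion matrix and the confidence histogram
python -m rasa_nlu.evaluate \
  --data test_nlu.md \
  --model models/nlu/default/current
```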
Hey @MetcalfeTom
Thanks for the reply. Currently I am using a `stories.md` file with about 150 stories, covering nearly 150 intents, like:
## story 1
* greet
  - utter_greet_help

## story 2
* mood_unhappy
  - utter_Help_to_coach

## story 3
* mood_deny
  - utter_Help_coach

...and so on, up to 150 intents.
So I am getting some wrongly predicted ones in the confusion matrix itself. When I evaluate the model with an `e2e_stories.md` file containing one big story with every intent, like:
## stories all
* greet
  - utter_greet_help
* some_random_intent
  - its_utterance_action

...and so on through all 150 intents.
I think it should predict the correct action, but I am getting some wrongly predicted actions for some intents. Theoretically it should get every prediction correct, but I don't know why it doesn't.
This is because your evaluation story is different from the data the bot was trained on; in your training data there is no context around each intent, but in the one big evaluation story the tracker will be storing all the previously received messages and executed actions, so the memorized two-turn stories no longer match.
I suggest you change your `policy_config` to include the `AugmentedMemoizationPolicy` instead of the `MemoizationPolicy`; this will wipe the tracker upon receiving each intent and predict the corresponding utterance from your training data.
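As a sketch, a `policy_config.yml` along these lines (the `KerasPolicy` entry and the `max_history` value are just illustrative, not something your setup requires):

```yml
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 2
  - name: KerasPolicy
```

You can then point training at it, e.g. `python -m rasa_core.train -d domain.yml -s stories.md -o models/dialogue -c policy_config.yml` (flag names may differ slightly between versions).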
@MetcalfeTom Thank you for the reply. I will definitely try that one.