Rasa core evaluation metrics

Hello Rasa community, I hope everyone is doing fine. I have a question about the Rasa Core evaluation results. When I train the dialogue model with the KerasPolicy it gives a certain number of failed stories, and with the SklearnPolicy it also gives a certain number of failed stories, but the performance is very poor in both cases. I know this is mostly related to the policy algorithm which I am using, but I would like suggestions on where else I can look to improve the quality of the Core model, both in predicting the right actions and in reducing the number of failed stories. I am using around 200 intents and 300 entities. Please suggest, thanks in advance.

Regards, Abhi

Hi there @abhi2652254! You mention using the KerasPolicy and the SklearnPolicy. Do you use other policies too? The best way to improve your performance is to write good stories and then use the MemoizationPolicy so that your model memorizes these stories. The machine learning policies are meant to step in when the model hasn’t seen a certain path before.

Can you attach what your policy configuration looks like?

Also:

“I know this is mostly related to the policy algorithm which I am using”

Do you mean configuration here? If not, what do you mean?

Hi @erohmensing, sorry to jump into the thread, but what do you mean by writing good stories? Let’s say my chatbot only allows authorised users to use it. Does this mean every story requires a categorical slot called valid_user, like in the stories below?

## Valid users booking tickets
* greet
- action_valid_user
- slot{"valid_user": "Yes"}
- bookingForm
- action_restart

## valid users viewing tickets
* greet
- action_valid_user
- slot{"valid_user": "Yes"}
- ViewingForm
- action_restart

Does this count as a good story or a bad story, since the stories are very similar at the start and only differ at the end?

Also, is using the MemoizationPolicy the way to make the bot more accurate? For the stories above, what value of max_history is needed? And does it mean I don’t have to use the machine learning policies and can rely on the MemoizationPolicy alone?

Thanks!

Hi @enzechua! No problem. I think it’s a great idea for there to be a featurized slot for valid user that starts your stories. For example, in our demobot, we have a similar thing where each story starts with a slot that says our user has seen the privacy policy.

However, there’s still something wrong with these stories – there is nothing in the story that tells the bot whether to go to the booking form or the viewing form – the contexts of the conversation are exactly the same. What drives the difference in the conversations – is it for example the page the user is opening the chat in? If so, you want to create a featurized slot for that to tell the bot why one form is being used over the other. Or maybe there is more user input in your story where the user says which action (view or book) they want to perform. If you explain your use case a bit more I’d be happy to help you clean these up. :slight_smile:

We always recommend keeping the machine learning policy in the ensemble so that it can handle cases you’ve never seen before. However, yes, the ideal behavior is for the bot to follow the memorized steps as much as possible!
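To make the ensemble idea above concrete, here is a minimal toy sketch in Python. It is not Rasa's implementation: the function names, the memorized path, and the made-up confidences are all assumptions for illustration. It only shows the general idea that a memorized path wins with full confidence, a machine learning policy covers unseen paths, and a fallback action is returned when no policy clears the `core_threshold`.

```python
from typing import Callable, Dict, List, Tuple

History = Tuple[str, ...]
Prediction = Dict[str, float]  # action name -> confidence


def memoization_policy(history: History) -> Prediction:
    # Hypothetical memorized path: an exact match yields confidence 1.0,
    # anything else yields no prediction at all.
    memorized = {("greet",): "action_verify_user"}
    action = memorized.get(history)
    return {action: 1.0} if action else {}


def ml_policy(history: History) -> Prediction:
    # Stand-in for a trained model: always returns made-up soft guesses.
    return {"action_verify_user": 0.55, "utter_goodbye": 0.25}


def ensemble_predict(
    history: History,
    policies: List[Callable[[History], Prediction]],
    core_threshold: float = 0.3,
    fallback_action: str = "action_default_fallback",
) -> Tuple[str, float]:
    # Take the single most confident prediction across all policies.
    # (Ties go to the earlier policy here; real tie-breaking may differ.)
    best_action, best_conf = fallback_action, 0.0
    for policy in policies:
        for action, conf in policy(history).items():
            if conf > best_conf:
                best_action, best_conf = action, conf
    # Below the threshold, fall back instead of acting on a weak guess.
    if best_conf < core_threshold:
        return fallback_action, best_conf
    return best_action, best_conf


# Seen path: memoization answers with full confidence.
print(ensemble_predict(("greet",), [memoization_policy, ml_policy]))
# Unseen path, ML policy present: the ML guess is used.
print(ensemble_predict(("affirm",), [memoization_policy, ml_policy]))
# Unseen path, memoization only: nothing clears the threshold.
print(ensemble_predict(("affirm",), [memoization_policy]))
```

In this toy, an ensemble with only the memoization policy has nothing to offer for an unseen history, so the fallback action is returned.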

Hahaha, now I know that a categorical slot counts as a featurized slot. Oh, my bad! I showed a bad story example. Below is the actual story!

## Valid users booking tickets
* greet
- action_verify_user
- slot{"valid_user": "Yes"}
- introduction_form
- slot{"command": "booking"}
- BookingForm
- action_restart

## Valid users viewing tickets
* greet
- action_verify_user
- slot{"valid_user": "Yes"}
- introduction_form
- slot{"command": "view"}
- ViewingForm
- action_restart

## Valid users doing something else etc.

So based on this example, if there are 4 things that a valid user can do, does it mean that I always have to begin each story with greet -> action_verify_user -> set a categorical slot?

Ah, yes, I tried running without the KerasPolicy and none of my stories worked. I have a few questions though.

  1. For the MemoizationPolicy with, for example, max_history 1: I assume that it will run through the story first and memorize at least 1 action. Does this mean that if the action occurs again, it will predict it with 100% confidence? I can’t really visualize it, hope you can explain it to me!

  2. How does Rasa Core predict which story to start with and follow through? This is something I am trying to comprehend. Is that where the KerasPolicy comes in?

  3. I am reading a lot about all the policies but still find it very hard to grasp! Would it be possible for me to ask you directly haha !

Regards!

  1. The memoization policy looks at the last <max_history> events to decide what to do next. So in this case, I believe a max_history of 1 would be sufficient to map command:booking to the booking form, and the same for viewing. However, if you had another slot or action between the slot event and the form action, you would have to increase the max_history.

  2. It just looks at what’s happened, so it’s not following a particular story at any one time; it looks at the history of the conversation to see what to do next. So e.g. in this case, if you are only at action_verify_user, it’s not following a particular story yet because the stories haven’t branched. But once you get to - slot{"command": "booking"}, it knows that it shouldn’t follow the “Valid users viewing tickets” story because the context doesn’t match anymore.

  3. Yep, keep asking any questions you have! We are looking to increase the amount of information in the documentation about how the policies themselves and the ensemble work, so I don’t mind.
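Points 1 and 2 above can be sketched as a plain lookup table. This is a deliberately simplified toy, not Rasa's featurization: it treats a story as a flat list of event strings (the `intent:` and `slot:` prefixes and the helper names are assumptions for illustration) and memorizes which action follows each window of the last `max_history` events. If two stories give the same window different next actions, the window is ambiguous and nothing is memorized for it.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, ...]


def is_action(event: str) -> bool:
    # In this toy, anything that is not an intent or slot event is an action.
    return not event.startswith(("intent:", "slot:"))


def memorize(stories: List[List[str]], max_history: int) -> Dict[Key, Optional[str]]:
    memo: Dict[Key, Optional[str]] = {}
    for story in stories:
        for i, event in enumerate(story):
            if i == 0 or not is_action(event):
                continue
            key = tuple(story[max(0, i - max_history):i])
            if key in memo and memo[key] != event:
                memo[key] = None  # same context, different action: ambiguous
            else:
                memo[key] = event
    return memo


def predict(memo: Dict[Key, Optional[str]], history: List[str], max_history: int) -> Optional[str]:
    # Look up the last max_history events; None means "nothing memorized".
    return memo.get(tuple(history[-max_history:]))


start = ["intent:greet", "action_verify_user", "slot:valid_user=Yes", "introduction_form"]
branching = [
    start + ["slot:command=booking", "BookingForm", "action_restart"],
    start + ["slot:command=view", "ViewingForm", "action_restart"],
]
memo = memorize(branching, max_history=1)
print(predict(memo, start + ["slot:command=booking"], 1))  # BookingForm
print(predict(memo, start + ["slot:command=view"], 1))     # ViewingForm

# Without the command slot, both stories share the same context,
# so the next action cannot be memorized.
identical_context = [
    ["intent:greet", "action_valid_user", "slot:valid_user=Yes", "bookingForm"],
    ["intent:greet", "action_valid_user", "slot:valid_user=Yes", "ViewingForm"],
]
memo = memorize(identical_context, max_history=1)
print(predict(memo, identical_context[0][:3], 1))  # None
```

The second example mirrors the earlier pair of stories that differed only in the final form: with nothing in the history to tell them apart, a lookup of this kind has no single answer.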

Have you also tried using the evaluation script for core? rasa test core – this will output failed stories where your bot is predicting something other than what is in the stories you give it. Ideally you would give it test stories, but it’s important to make sure that it gets no failed stories on your training data because if it’s failing on your training data, there’s definitely something wrong!
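As a rough illustration of what "failed stories" means here, the sketch below counts a story as failed if any predicted action differs from the action written in the story, and also reports action-level accuracy. It does not call Rasa; the `evaluate` helper, the story data, and the lookup-table predictor are hypothetical stand-ins.

```python
from typing import Callable, List, Tuple

History = Tuple[str, ...]
Step = Tuple[History, str]      # (conversation context, expected action)
Story = Tuple[str, List[Step]]  # (story name, steps)


def evaluate(stories: List[Story], predict: Callable[[History], str]) -> Tuple[List[str], float]:
    failed: List[str] = []
    correct = total = 0
    for name, steps in stories:
        story_ok = True
        for history, expected in steps:
            total += 1
            if predict(history) == expected:
                correct += 1
            else:
                story_ok = False  # one wrong action fails the whole story
        if not story_ok:
            failed.append(name)
    accuracy = correct / total if total else 0.0
    return failed, accuracy


stories = [
    ("book", [(("greet",), "action_verify_user"), (("command=booking",), "BookingForm")]),
    ("view", [(("greet",), "action_verify_user"), (("command=view",), "ViewingForm")]),
]

# Hypothetical model that confuses the two forms.
table = {
    ("greet",): "action_verify_user",
    ("command=booking",): "BookingForm",
    ("command=view",): "BookingForm",
}
failed, accuracy = evaluate(stories, lambda history: table[history])
print(failed, accuracy)  # ['view'] 0.75
```

Note that a single wrong action is enough to fail a story, so the share of failed stories can look much worse than the action-level accuracy.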

  1. Nice, so that is how you test <max_history>. Currently no memoized next action is found for me, so probably the max_history isn’t set well.

  2. OHHHH, I see, so it sees everything at once. But is this the function of Core or of the policies?

  3. Oh, thank you very much haha. Yes, the policies seem to be the hardest part to understand. The docs are good for simple stories, but once the stories branch out and get more complicated, things don’t go as expected, and that’s how I started reading up on all the other stuff. haha.

  4. So it seems like the MemoizationPolicy doesn’t kick in since no next action is found, which means that Core relies totally on the KerasPolicy. However, it is sometimes inaccurate and just jumps to a different path. I am very new to Keras and was reading about how batch size and epochs affect the accuracy. One thing that puzzled me: with augmentation 20 and 100 epochs, the accuracy was 98%, but when I ran the server the bot did not follow the intended path. With augmentation 20 and 500 epochs, the accuracy was close to 100%, and when I ran the server it gave me the right path. Why would this happen?

Yes! I tried it, and it says all my stories passed. The weird thing was that I tried with epoch 10, validation_split at 30%, accuracy was around 30% and I evaluated my core model. It shows a 1.00 for everything. When I tried with epoch 100, validation_split 30%, accuracy was near 100%, and when I evaluated the core model it also showed 1.00 for everything. Am I supposed to evaluate it this way? Thanks!

  1. Yeah, the max history just depends on how your stories are written, really. It’s good to play around with it. Keep in mind that higher values will make your training take longer though.

  2. Kind of both? Idk at that point (action prediction based on context) i would kind of say the acting part of core is the policies haha

  3. They’re easiest to try to get into one at a time, since each one predicts something different based on certain rules or models :smiley:

Are you sure your memoization policy hasn’t been kicking in correctly? “accuracy was around 30% and I evaluated my core model. It shows a 1.00 for everything.” sounds exactly like your machine learning policy wasn’t kicking in: even though it wasn’t trained long enough to have a high accuracy, correct actions were being predicted (so probably, IMO, through memoization). But as long as you are getting 1.00, that’s great! How many stories do you have?

  1. Alright will do thanks !

  2. Hahaha, good to know that, I guess more research has to be done from my end!

  3. I guess it’s only the KerasPolicy and the MemoizationPolicy that are giving me the hardest time here. If I can find the sweet spot that suits my case, I guess that would be great already!

After changing max_history to 1, I can see that it says there is a next action at this point. Curious though, after calling action_restart, the conversation would be gone, right? Does it mean the MemoizationPolicy would be restarted as well in that sense? Hmm, I ran rasa test core -m models/ after I trained the model, but from what you say, it seems that the ML policy did not have any effect on it. I have only 16 stories. Does augmentation count?

Thanks for following up on this conversation, I am learning a lot more every day! My school project ends this week, but it would be nice to learn more about Rasa!

Indeed, action_restart emits a Restarted() event that basically says “only pay attention to what comes after this”, and this applies to all of your policies. So even if your max history is 3, after restarting, it’ll only make the decision based on the initial input again.
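The effect of a restart on what the policies see can be pictured with a small toy in Python. This is not how Rasa's tracker is implemented; the event strings and both helper names are assumptions for illustration. The idea is simply that everything up to and including the most recent restart marker is dropped before the last `max_history` events are taken.

```python
from typing import List


def events_after_last_restart(events: List[str]) -> List[str]:
    # Scan backwards for the most recent restart marker and keep what follows.
    for i in range(len(events) - 1, -1, -1):
        if events[i] == "restart":
            return events[i + 1:]
    return list(events)


def policy_window(events: List[str], max_history: int) -> List[str]:
    # The window is cut from the post-restart events only.
    return events_after_last_restart(events)[-max_history:]


events = ["intent:greet", "action_verify_user", "BookingForm", "restart", "intent:greet"]
print(policy_window(events, max_history=3))      # ['intent:greet']
print(policy_window(events[:3], max_history=3))  # no restart: all three events
```

With a max history of 3, the first call still returns only the single event after the restart, matching the description above.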

The way you can test actually is by running the core evaluation script and just taking the machine learning policy temporarily out of your ensemble, and then see if everything still passes! :slight_smile:

Hey, sorry for the late reply again, went for a run! So are you saying that resetting all slots would be better because it doesn’t restart the whole conversation?

The command for it would be rasa test core -m models/ --out results, right? Yes, I took out the KerasPolicy and, as you said, everything still passes with an f1-score of 1.00. So does this mean that the KerasPolicy did not kick in during evaluation? But how come without Keras I am not able to run my stories? The bot just goes straight to the fallback action!

Hm, so without keras, your model passes evaluation but doesn’t work when you run it? To be clear, did you train the model without keras before evaluating it?

Yes, correct. I took out Keras and trained the model. Then I evaluated that trained model (without Keras). It passes the evaluation with an f1-score of 1.00 on everything, but when I run it, it goes straight to the fallback action! I wonder what I did wrong. Hmmm.

Hmm. would you mind sharing your stories and your config so i can try it out locally?

Do you want me to post it here or message you somewhere else ! haha

Here if you don’t mind!

## fallback story
* out_of_scope
    - action_default_fallback

## if existing client that is attached to multiple companies chosen to create new ticket
* greet
    - action_verify_user
    - slot{"valid_user": "Yes"}      
    - introduction_form
    - form{"name": "introduction_form"}
    - slot{"command": "create_with_multiple_companies"}
    - form{"name": null}
    - company_menu_form
    - form{"name": "company_menu_form"}
    - form{"name": null}    
    - problem_form
    - form{"name": "problem_form"}
    - form{"name": null}
    - action_reset_and_restart
    - action_goes_to_menu
    
## if existing client that is attached to 1 company chosen to create new ticket
* greet
    - action_verify_user
    - slot{"valid_user": "Yes"}      
    - introduction_form
    - form{"name": "introduction_form"}
    - slot{"command": "create_with_one_company"}   
    - form{"name": null}
    - problem_form
    - form{"name": "problem_form"}
    - form{"name": null}
    - action_reset_and_restart
    - action_goes_to_menu


    
## if existing client chosen to view existing tickets (have existing tickets)
* greet   
   - action_verify_user 
   - slot{"valid_user": "Yes"}      
   - introduction_form
   - form{"name": "introduction_form"}   
   - slot{"command": "view"}
   - slot{"view_ticket_path": "have_tickets"}
   - form{"name": null}
   - view_ticket_form
   - form{"name": "view_ticket_form"}
   - form{"name": null}
   - action_reset_and_restart
   - action_goes_to_menu
    
## if existing client chosen to view existing tickets (no existing tickets)
* greet   
   - action_verify_user 
   - slot{"valid_user": "Yes"}      
   - introduction_form
   - form{"name": "introduction_form"}
   - slot{"command": "view"}
   - slot{"view_ticket_path": "no_tickets"}
   - form{"name": null}   
   - display_no_existing_ticket
   - action_reset_and_restart
   - action_goes_to_menu
     
## if existing client chosen to enter new authorisation code and is successful
* greet   
   - action_verify_user 
   - slot{"valid_user": "Yes"}      
   - introduction_form
   - form{"name": "introduction_form"}
   - form{"name": null}
   - slot{"command": "entercode"}
   - authorisation_form
   - form{"name": "authorisation_form"}  
   - form{"name": null}   
   - slot{"enter_new_code": "success"} 
   - action_validate_user         
   - action_reset_and_restart
   - action_goes_to_menu

## if existing client chosen to enter new authorisation code but code already tied to him/her
* greet   
   - action_verify_user 
   - slot{"valid_user": "Yes"}      
   - introduction_form
   - form{"name": "introduction_form"}
   - slot{"command": "entercode"}
   - form{"name": null}
   - authorisation_form
   - form{"name": "authorisation_form"}
   - form{"name": null}
   - slot{"enter_new_code": "code_exist"} 
   - action_reset_and_restart
   - action_goes_to_menu
    
## if existing client chosen to enter new authorisation code but entered the wrong code
* greet   
   - action_verify_user 
   - slot{"valid_user": "Yes"}      
   - introduction_form
   - form{"name": "introduction_form"}
   - form{"name": null}
   - slot{"command": "entercode"}
   - authorisation_form
   - form{"name": "authorisation_form"}
   - form{"name": null}
   - slot{"enter_new_code": "failed"}   
   - action_reset_and_restart
   - action_goes_to_menu
   
   
## newly registered Client creating new ticket (only attached to 1 company at first)
* greet   
   - action_verify_user
   - slot{"valid_user": "No"}      
   - authorisation_form
   - form{"name": "authorisation_form"}
   - form{"name": null}
   - action_validate_user
   - slot{"valid_user": "Yes"} 
   - introduction_form
   - form{"name": "introduction_form"}
   - slot{"command": "create_with_one_company"}  
   - form{"name": null}
   - problem_form
   - form{"name": "problem_form"}
   - form{"name": null}
   - action_reset_and_restart
   - action_goes_to_menu
   
## newly registered Client viewing existing ticket (have existing tickets)
* greet   
   - action_verify_user
   - slot{"valid_user": "No"}      
   - authorisation_form
   - form{"name": "authorisation_form"}
   - form{"name": null}
   - action_validate_user
   - slot{"valid_user": "Yes"} 
   - introduction_form
   - form{"name": "introduction_form"}
   - slot{"command": "view"}
   - slot{"view_ticket_path": "have_tickets"}
   - form{"name": null}
   - view_ticket_form
   - form{"name": "view_ticket_form"}
   - form{"name": null}
   - action_reset_and_restart
   - action_goes_to_menu
  
## newly registered Client viewing existing ticket (no existing tickets)
* greet   
   - action_verify_user
   - slot{"valid_user": "No"}      
   - authorisation_form
   - form{"name": "authorisation_form"}
   - form{"name": null}
   - action_validate_user
   - slot{"valid_user": "Yes"} 
   - introduction_form
   - form{"name": "introduction_form"}
   - slot{"command": "view"}
   - slot{"view_ticket_path": "no_tickets"}
   - form{"name": null}
   - display_no_existing_ticket
   - action_reset_and_restart
   - action_goes_to_menu
   
## New Client with invalid authorisation code
* greet   
   - action_verify_user
   - slot{"valid_user": "No"}      
   - authorisation_form
   - form{"name": "authorisation_form"}
   - form{"name": null}
   - action_validate_user
   - slot{"valid_user" : "No"}
   - action_blacklist_scheduler  

## Blacklisted users
* greet   
   - action_verify_user
   - slot{"valid_user": "blacklisted"}
 
 ## remove
 - action_blacklist_removal
 - slot{"valid_user": null}
 
 ## scheduler for inactivity 
 - action_inactivity_scheduler
 - action_deactivate_form
 - form{"name" : null }
 * deny
 - utter_goodbye
 - action_reset_and_restart
 
  ## scheduler for agree
 - action_inactivity_scheduler
 - action_deactivate_form
 - form{"name" : null }
 * affirm
 - problem_form
 - form{"name": "problem_form"}
 - form{"name": null}

## Config
language: "en"

pipeline: "pretrained_embeddings_spacy"

policies:
  - name: KerasPolicy
    epochs: 500
    batch_size: 32
    max_history: 5
    validation_split: 0.3
  - name: FallbackPolicy
    fallback_action_name: 'action_default_fallback'
    nlu_threshold: 0.3
    core_threshold: 0.3
  - name: MemoizationPolicy
    max_history: 1
  - name: FormPolicy
  - name: MappingPolicy

But if it is the stories only then here you go!