Core prediction not consistent with stories and slot status

I have the following two stories, but I'm finding that Rasa Core isn't very reliable at picking the correct one. Which story should be followed depends on the presence (or absence) of the reservation slot, which is a text slot.

I'm finding that most of the time I get the correct path, but sometimes Rasa chooses the incorrect one - i.e. it takes me down the "with reservation" path even though the reservation slot is set to None. The difference seems to be which other questions/dialog were processed prior to doing checkin. Now, I understand that prior dialog will influence the future route - but surely it shouldn't be predicting the "with reservation" path when the reservation slot is set to None and there's a clear story that better fits that scenario?

Any thoughts as to how I might address this, or whether there might indeed be some underlying issue in story prediction?

```
## no reservation
* checkin
  - slot{"reservation": null}
  - utter_checkin_without_reservation

## with reservation
* checkin
  - slot{"reservation": "_"}
  - utter_checkin_with_reservation
```
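
For reference, the reservation slot is declared as a plain text slot in the domain, roughly like this:

```yaml
slots:
  reservation:
    type: text
```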

Did you try tuning the hyperparameters? Keep in mind that it is still a neural network - a classifier like any other machine learning algorithm - so the featurization is very important, and so is evaluation. Maybe check the max_history parameter.
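
For example, with the old rasa_core Python API the max_history is passed to the featurizer/policies when building the agent. A minimal sketch (module paths and signatures vary between rasa_core versions, so treat this as illustrative):

```python
from rasa_core.agent import Agent
from rasa_core.featurizers import (BinarySingleStateFeaturizer,
                                   MaxHistoryTrackerFeaturizer)
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.memoization import MemoizationPolicy

# Each tracker gets featurized as the last `max_history` binary state vectors.
featurizer = MaxHistoryTrackerFeaturizer(BinarySingleStateFeaturizer(),
                                         max_history=3)

agent = Agent("domain.yml",
              policies=[MemoizationPolicy(max_history=3),
                        KerasPolicy(featurizer)])

training_data = agent.load_data("data/stories.md")
agent.train(training_data)
agent.persist("models/dialogue")
```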

Also, run an evaluation on your test dataset to see at which point the issue occurs and which actions are most likely to be confused. Another way to detect such issues is through online learning.
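
Something along these lines will replay a set of test stories against the trained model and report which actions are predicted incorrectly (flag names differ a bit between rasa_core versions):

```bash
python -m rasa_core.evaluate \
    -d models/dialogue \
    -s data/test_stories.md
```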

max_history is 3. I tried increasing augmentation to 50, which seems to improve matters, but training then takes an age (around half an hour). Online training is great, but there are so many possible combinations of dialog turns - it's not possible to test every combination of everything that might be said before checkin.

It feels like the slot condition is only one of several features influencing the next dialog turn, when it really ought to have a much stronger influence - i.e. if the slot condition doesn't match, this story should get a much, much lower probability. In reality, it seems you can get a very high probability even though the slot condition doesn't match - that doesn't seem right to me.

They are featurized with equal weight, with values of 0 or 1 depending on whether a feature is absent or present. Based on the max_history parameter, each featurized state is just a bunch of 0s and 1s:

If intent X is present, that feature is 1.

If the previous action was Y, that feature is 1 and it is 0 for all other actions.

If a slot of type text is not set, its feature is 0, otherwise 1.

All of this, flattened, forms the input features of a single state; if your max_history is 3, you get a multidimensional array with one feature vector per state.
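
As a rough illustration (with made-up feature names, not the exact ones rasa_core uses internally), a single state is a flat 0/1 vector and the classifier input is the last max_history of those vectors stacked together:

```python
import numpy as np

# Hypothetical binary features for one dialogue state: 1 if present, 0 if absent.
def featurize_state(intent, prev_action, reservation_set):
    return np.array([
        1 if intent == "checkin" else 0,              # intent_checkin
        1 if intent == "greet" else 0,                # intent_greet
        1 if prev_action == "action_listen" else 0,   # prev_action_listen
        1 if reservation_set else 0,                  # slot_reservation (text slot)
    ])

# With max_history = 3 the input is the last three state vectors
# stacked into a (3, n_features) array.
history = [
    featurize_state("greet", "action_listen", False),
    featurize_state("checkin", "utter_greet", False),
    featurize_state("checkin", "action_listen", False),
]
X = np.stack(history)
print(X.shape)  # (3, 4)
```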

If you want to make sure that slots, along with the intent, are the only features that determine the course of the conversation, you have to update the featurizer yourself and remove entities from the featurization.
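
I haven't verified this against the code, but the idea would be something like subclassing the binary featurizer and dropping entity features before encoding. The `entity_` prefix and the `encode` signature are assumptions about rasa_core internals, so double-check them in your version:

```python
from rasa_core.featurizers import BinarySingleStateFeaturizer

class SlotAndIntentFeaturizer(BinarySingleStateFeaturizer):
    """Assumed sketch: drop entity features so only intents, previous
    actions and slots get featurized."""

    def encode(self, state):
        # `state` is assumed to be a dict of feature name -> value,
        # with entity features prefixed by "entity_".
        filtered = {name: value for name, value in state.items()
                    if not name.startswith("entity_")}
        return super(SlotAndIntentFeaturizer, self).encode(filtered)
```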

With a lot more training data plus augmentation, you can eventually arrive at a good model. This is something I am also working on at my end for a complex conversation, to determine the trade-off between writing logic vs. defining stories.

Hmm… interesting. So adding a slot condition is really only a small hint in a bigger sphere.

I’m not sure this is really how it should be though - it feels that the slot condition should have a much stronger influence on the path taken. To find a dialog going down a path whose conditions are diametrically opposed to the tracker status, when there’s one that matches the tracker status, just does not feel right. Trying to train in this situation is quite tricky - and there’s a latent worry that the dialog might do weird things if the user enters an unexpected set of responses.

I wonder if the algorithms can be tweaked to add greater weight to slot conditions? For me the thing that seems to be causing randomness in the conversation is the prior history of the conversation - if slot conditions held greater weight than dialog history, I’m guessing that might resolve things.

The prior history is determined by max_history. Your dialog generally varies based on slots, although some dialogues may vary depending on what the user intends to say. Slots are present irrespective of history, so you can technically reduce your max_history and then slots may have a greater influence on predicting the next action.

Your target, however, is to minimise the surprises that can appear in a conversation, which means it is a continuous learning process for your bot until you reach the saturation level of your conversations.

So I set --history to 0 and initial testing suggests it's now behaving how I want it to. I had, erroneously, thought that --history 0 would mean the dialog wouldn't keep track of itself within a story. But it seems to work fine - and without dialog history influencing the next turn of the conversation, I'm getting decisions based purely on slot and intent. Nice!
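
For anyone else trying this, the training command looked roughly like the following (other flags omitted; names may vary by rasa_core version):

```bash
python -m rasa_core.train \
    -d domain.yml \
    -s data/stories.md \
    -o models/dialogue \
    --history 0
```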

That does, however, leave me wondering what I’m missing out on by ignoring history! It seems to work though - maybe it’s how I wrote my stories.

Anyway, it’s nice to see an active forum and help here!

If you are using slots to really drive your conversations, past history won't matter too much; however, there could be cases like fallback:

Let's say the user said something you didn't understand, they said it a second time and you still didn't understand, so the third time you might route them to a live agent.
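
In story form that kind of pattern looks something like this (intent and action names made up):

```
## out of scope - three strikes
* out_of_scope
  - utter_please_rephrase
* out_of_scope
  - utter_please_rephrase
* out_of_scope
  - action_handoff_to_human
```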

Well, more testing shows I am still getting dialogues going down the wrong path.

I'm not sure what could be influencing this. History is set to 0. My slots aren't changing. There are no entities being detected. I get a very high probability hit on the correct intent. I have two story paths with opposing conditions - and sometimes I'm taken down the path whose condition doesn't match.

I also go back to my original point. To the dialogue designer, putting a slot condition in a story should have a bigger influence on the path that is taken - and it seems not to have the level of influence I would expect. Something isn't right with the featurisation and prediction here - it shouldn't take this amount of effort to persuade Rasa Core to do what I need.

Can you try this? Also, what type of slot is your reservation slot? Text?

Similar results - it’s still possible to get sent down the wrong path.

That's strange - do you only have two examples for this?

What was the type of your slot?

I feel that for an RNN to work well, it probably needs a lot more data; however, I am not entirely convinced about using an RNN if you have really simple conversations. A decision tree classifier based on flattened stories would work as well - I am trying to see whether that works better or not.
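
A quick way to test that idea, purely illustrative with synthetic 0/1 feature vectors rather than real featurized trackers:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic flattened story states: [intent_checkin, slot_reservation_set]
X = np.array([
    [1, 0],  # checkin, no reservation
    [1, 1],  # checkin, with reservation
    [1, 0],
    [1, 1],
])
# Target next action: 0 = utter_checkin_without_reservation,
#                     1 = utter_checkin_with_reservation
y = np.array([0, 1, 0, 1])

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[1, 0]]))  # -> [0], the "without reservation" action
```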

However, an RNN is super useful if there is a long history in the conversation, and it could potentially perform well over a large corpus of conversations.

Here's an example of the amount of stories used as training data.

@duncsand sorry for the late reply on this – as souvik asked, what type is your slot? And what do the rest of your stories look like?

I've had similar experiences training a Rasa dialog model. As we all know, it's impossible to learn without bias (the No Free Lunch theorem). The trick is selecting the model with the right bias for the job at hand. In Rasa's case, I don't think it is biased in the right way yet, and/or we don't have the right knobs to adjust its bias the way we need for our particular use cases. When someone has a very simple use case, there should be a simpler way to influence the ML to learn the right path using fewer training examples.

One idea is to assemble a set of pre-trained models that sample the use-case space, and then allow each Rasa user to do transfer learning with their own data set after selecting the pre-trained model with the best general tendencies. Another idea is to integrate hard constraints in a smart way. There has been research in the ML community for a decade or more into how to combine hard logical constraints with machine learning so we get the best of both worlds.

Coincidentally, I ran into this paper from Amazon today, where they indicate one way of providing a learning bias in Alexa Skills Kit and Lex:

It imposes inductive biases that allow it to learn robust SLU models from extremely small and sparse datasets and, in doing so, removes significant barriers to entry for software developers and dialogue systems researchers.

I've run into the exact same issue recently, and I also think slots should have a higher influence on the prediction (or at least it should be possible to tweak that influence).

Can't we tweak the model architecture so that it is able to dynamically attend to set slots and assign them larger weights than other features at a given instance?

Also, one observation I have from running an experiment: the model is able to generalize to different categorical slot values in a smaller domain (total intents + entities + slots in your bot), but when the domain gets bigger, the influence of slots on the prediction, as well as generalization specifically for slots, starts decaying.

Can I get more perspectives on this? What do you think of this problem, and how could it be solved?

@duncsand @souvikg10 @akelad

What policies are you using?

AugmentedMemoization, Keras and Form Policies

I'm not sure what you mean, but you can try to use the EmbeddingPolicy (see the Policies docs) instead of the Keras policy. Pay attention to the warning about max_history in the docs.
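
For example, with a policy configuration file along these lines (check the docs for the EmbeddingPolicy's caveats around max_history and the featurizer):

```yaml
policies:
  - name: EmbeddingPolicy
  - name: AugmentedMemoizationPolicy
    max_history: 3
  - name: FormPolicy
```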