I have the following two stories, but am finding that Rasa-Core doesn’t seem very reliable in correctly picking the correct one. The choice of which story to choose is based on the presence (or absence) of the slot reservation, which is a text slot.
I’m finding that most of the time I get the correct path, but sometimes Rasa is choosing the incorrect one - ie taking me down the “with reservation” path, even though reservation slot is set to None. The difference seems to be what other questions/dialog might have been processed prior to doing checkin. Now I understand that prior dialog will influence the future route - but surely it shouldn’t be predicting the “with reservation” path when the reservation slot is set to None and there’s a clear story which better fits that scenario?
Any thoughts as to how I might address this, or if indeed there might be some underlying issue in story prediction?
Did you try tuning the hyperparameters? Keep in mind that it is still a neural network, a classifier just like any other machine learning algorithm, so featurization is very important. So is evaluation.
Maybe check the max_history parameter
Also run an evaluation on your test dataset to see at which point the issue occurs and which actions are most often confused. Another way to detect such issues is through online learning.
max_history is 3. I tried increasing augmentation to 50, which seems to help improve matters, but training then takes an age (around half an hour). Online training is great, but there are so many possible combinations of dialog turns that it’s not possible to test everything that might be said before checkin.
It feels that the slot condition is only one of several features influencing the next dialog turn, when it really ought to have a much stronger influence. ie if the slot condition doesn’t match, this story should get a much, much lower probability. In reality, it seems you can get a very high probability even though the slot condition doesn’t match - that doesn’t seem right to me.
They are featurized with equal weight, as binary values: 0 if absent, 1 if present. Based on the max_history parameter, each featurized state is a bunch of 0s and 1s:
If intent X is present, its feature is 1
If the previous action was Y, its feature is 1; for all other actions it is 0
If a slot of type text is not set, its feature is 0, else 1
All of this, flattened, is the input feature vector for a single state. If your max_history is 3, you will have a multidimensional array with one feature vector per state.
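As a rough illustration of that featurization, here is a plain-Python sketch (not Rasa’s actual code; the feature names for intents, actions, and slots are made up):

```python
# Toy sketch of binary state featurization with max_history stacking.
# Feature names below are hypothetical, not from any real domain file.

FEATURE_NAMES = [
    "intent_checkin", "intent_greet",          # one bit per intent
    "prev_action_listen", "prev_action_ask",   # one bit per previous action
    "slot_reservation",                        # 1 if the text slot is set
]

def featurize_state(state):
    """Return a flat 0/1 vector for a single dialogue state."""
    return [1 if name in state else 0 for name in FEATURE_NAMES]

def featurize_tracker(states, max_history=3):
    """Stack the last `max_history` state vectors into one model input."""
    window = states[-max_history:]
    # pad the front with all-zero states if the conversation is shorter
    padding = [[0] * len(FEATURE_NAMES)] * (max_history - len(window))
    return padding + [featurize_state(s) for s in window]

states = [
    {"intent_greet", "prev_action_listen"},
    {"intent_checkin", "prev_action_listen", "slot_reservation"},
]
X = featurize_tracker(states, max_history=3)
print(X)
# [[0, 0, 0, 0, 0], [0, 1, 1, 0, 0], [1, 0, 1, 0, 1]]
```

Note how the slot bit is just one of many equally weighted bits, which is why history can drown it out.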
If you want to make sure that slots, along with intent, are the only features that determine the course of the conversation, you have to update the featurizer yourself and remove entities from featurization.
With a lot more training data along with augmentation, you can eventually arrive at a good model. This is something I am also working on at my end for a complex conversation, to determine the trade-off between writing logic and defining stories.
Hmm… interesting. So adding a slot condition is really only a small hint in a bigger sphere.
I’m not sure this is really how it should be though - it feels that the slot condition should have a much stronger influence on the path taken. To find a dialog going down a path whose conditions are diametrically opposed to the tracker status, when there’s one that matches the tracker status, just does not feel right. Trying to train in this situation is quite tricky - and there’s a latent worry that the dialog might do weird things if the user enters an unexpected set of responses.
I wonder if the algorithms can be tweaked to add greater weight to slot conditions? For me the thing that seems to be causing randomness in the conversation is the prior history of the conversation - if slot conditions held greater weight than dialog history, I’m guessing that might resolve things.
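One way to picture that tweak, in plain Python rather than as an actual Rasa hook: scale the slot bits in the feature vector before they reach the classifier, so a slot mismatch moves the input much further than a difference in dialogue history. The feature layout and multiplier here are hypothetical:

```python
# Illustration only: up-weight slot features relative to history features.
# The convention that index 4 holds the slot bit is made up.

SLOT_WEIGHT = 5.0  # hypothetical multiplier giving slots more influence

def weight_features(vector, slot_indices, slot_weight=SLOT_WEIGHT):
    """Multiply slot features so distances are dominated by slot mismatches."""
    return [
        v * slot_weight if i in slot_indices else float(v)
        for i, v in enumerate(vector)
    ]

# states A and B differ only in the slot bit (index 4)
a = weight_features([1, 0, 1, 0, 1], slot_indices={4})
b = weight_features([1, 0, 1, 0, 0], slot_indices={4})
squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
print(squared_distance)  # 25.0 instead of 1.0: the slot mismatch dominates
```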
The prior history is determined using max history.
Your dialogue generally varies based on slots; however, some dialogues may vary depending on what the user intends to say.
Slots are present irrespective of history, so you can technically reduce your max_history, and then slots may have a greater influence on predicting the next action.
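For reference, in older rasa_core versions max_history is set on the tracker featurizer when building the agent; a hedged sketch (module paths and signatures may differ between versions, so treat this as approximate):

```python
# Approximate rasa_core (0.x) setup lowering max_history to 1.
# Check your installed version's docs before copying this verbatim.
from rasa_core.agent import Agent
from rasa_core.featurizers import (
    BinarySingleStateFeaturizer,
    MaxHistoryTrackerFeaturizer,
)
from rasa_core.policies.keras_policy import KerasPolicy

featurizer = MaxHistoryTrackerFeaturizer(
    BinarySingleStateFeaturizer(), max_history=1
)
agent = Agent("domain.yml", policies=[KerasPolicy(featurizer)])
agent.train("data/stories.md")
agent.persist("models/dialogue")
```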
Your target, however, is to minimise the surprises that can appear in a conversation, which means it is a continuous learning process for your bot until you reach the saturation level of your conversations.
So I set --history to 0 and initial testing seems that it’s now behaving how I want it to. I had, erroneously, thought that --history 0 would mean the dialog wouldn’t keep track of itself within a story. But it seems to work fine - and without dialog history influencing the next turn of the conversation, I’m getting the decisions based purely on slot and intent. Nice!
That does, however, leave me wondering what I’m missing out on by ignoring history! It seems to work though - maybe it’s how I wrote my stories.
Anyway, it’s nice to see an active forum and help here!
If you are using slots to really drive your conversations, past history won’t matter too much. However, there are cases like fallback: say the user said something you didn’t understand, said it a second time and you still didn’t understand, so the third time you might route them to a live agent.
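That escalation pattern can be sketched in plain Python (the action names are made up for illustration, not Rasa API):

```python
# Toy escalation logic: after 3 consecutive fallbacks, hand off to a human.
# Action names are hypothetical.

def next_action(consecutive_fallbacks, understood):
    """Pick the next action based on how many times we failed in a row."""
    if understood:
        return "utter_answer", 0  # reset the counter on success
    consecutive_fallbacks += 1
    if consecutive_fallbacks >= 3:
        return "action_handoff_to_agent", consecutive_fallbacks
    return "utter_default_fallback", consecutive_fallbacks

count = 0
actions = []
for understood in (False, False, False):
    action, count = next_action(count, understood)
    actions.append(action)
print(actions)
# ['utter_default_fallback', 'utter_default_fallback', 'action_handoff_to_agent']
```

This is exactly the kind of behaviour that needs history: the slot values are identical on every turn, and only the recent past distinguishes the first miss from the third.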
Well, more testing shows I am still getting dialogues going down the wrong path.
I’m not sure what could be influencing this. History is set to 0. My slots aren’t changing. There are no entities being detected. I get a very high probability hit on the correct intent. I have two story paths with opposing conditions, and sometimes I’m taken down a path whose condition doesn’t match.
I also go back to my original point. To the dialogue designer, putting a slot condition in a story should have a bigger influence on the path that is taken, and it seems not to have the level of influence I would expect. Something isn’t right with the featurisation and prediction here; it shouldn’t take this amount of effort to persuade Rasa-Core to do what I need.
I feel that for an RNN to work well, it probably needs a lot more data. However, I am not entirely convinced an RNN is needed if you have really simple conversations; a decision-tree classifier trained on flattened stories would work as well. I am trying to see whether that works better or not.
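A quick sketch of that idea with scikit-learn, treating each flattened state as one training row (the toy features, action names, and data are invented, not real stories):

```python
# Toy decision-tree action classifier over flattened binary state features.
# Feature layout (hypothetical): [intent_checkin, intent_greet, slot_reservation]
from sklearn.tree import DecisionTreeClassifier

X = [
    [1, 0, 1],  # checkin intent, reservation slot set
    [1, 0, 0],  # checkin intent, no reservation
    [0, 1, 0],  # greet intent
    [1, 0, 1],
    [1, 0, 0],
]
y = [
    "utter_checkin_with_reservation",
    "utter_checkin_no_reservation",
    "utter_greet",
    "utter_checkin_with_reservation",
    "utter_checkin_no_reservation",
]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[1, 0, 0]])[0])  # utter_checkin_no_reservation
```

On separable data like this, the tree splits directly on the slot bit, which is the hard, rule-like behaviour the thread is asking for.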
However, an RNN is super useful if there is a long history in the conversation, and it could potentially perform well over a large corpus of conversations.
Here’s an example of the number of stories used as training data:
I’ve had similar experiences training a Rasa dialog model. As we all should know, it’s impossible to learn without bias (No Free Lunch Theorem). The trick is selecting the model with the right bias for the job at hand. In Rasa’s case, I don’t think it is biased in the right way yet, and/or we don’t have the right knobs to adjust its bias the way we need for our particular use-cases. When someone has a very simple use-case, there should be a simpler way to influence the ML to learn the right path using fewer training examples.

One idea is to assemble a set of pre-trained models that sample the use-case-space, and then allow each Rasa-user to do transfer learning with their own data set after they select the pre-trained model with the best general tendencies. Another idea is to integrate hard constraints in a smart way. There has been research going on in the ML community for a decade or more into how to combine hard logical constraints with machine learning so we get the best of both worlds.
Coincidentally, I ran into this paper from Amazon today, where they indicate one way of providing a learning bias in Alexa Skills Kit and Lex:
It imposes inductive biases that allow it to learn robust SLU models from extremely small and sparse datasets and, in doing so, removes significant barriers to entry for software developers and dialogue systems researchers.
I’ve run into the exact same issue recently, and I also think slots should have a higher influence on the prediction (or at least it should be possible to tweak that influence).
Can we not tweak the model architecture so that it can dynamically attend to the slots that are set and assign them larger weights than the other features at a given instance?
Also, one observation I have from running an experiment: the model is able to generalize to different categorical slot values in a smaller domain (total intents + entities + slots in your bot), but when your domain gets bigger, that influence on the prediction, as well as generalization specifically for slots, starts to decay.
Can i get more perspectives on this? What do you think of this problem and how can this possibly be solved?
I’m not sure what you mean, but you can try to use the embedding policy instead of the Keras policy. Pay attention to the warning about max_history in the docs.
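For reference, a hedged sketch of swapping in the embedding policy in older rasa_core (the module path and defaults may differ between versions, so check your installed version's docs):

```python
# Approximate rasa_core (0.x) agent using EmbeddingPolicy instead of
# KerasPolicy. EmbeddingPolicy featurizes whole dialogues, which is why
# the docs warn that max_history does not apply to it the same way.
from rasa_core.agent import Agent
from rasa_core.policies.embedding_policy import EmbeddingPolicy

agent = Agent("domain.yml", policies=[EmbeddingPolicy()])
agent.train("data/stories.md")
agent.persist("models/dialogue")
```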