I really like the idea of the two-stage fallback. However, it becomes harder to apply if you have a lot of intents that often differ only in their level of detail.
We have a lot of customer processes that we want our customers to be able to jump right into, instead of starting with generic stories only, so we have quite a few similar intents. Of course, that leads to situations in which distinguishing between the intents is hard, especially if users hit phrases that combine parts of several similar intent examples.
I found that creating “disambiguation intents” does not improve the situation, as the NLU data just gets more examples that are very much alike.
Instead, I thought about creating an intent naming scheme like maintopic_topic_detail and a pipeline component that grabs the recognized intents (comparable to the NluFallbackClassifier) and checks whether the distance between the top 2 intents is below a threshold.
But instead of creating a generic nlu_fallback intent, I check the list of recognized intents and try to find a common (or majority vote) root intent, say 3/4 of the recognized intents start with the same maintopic_ - in that case I can issue an intent of maintopic_fallback to ask the customer if the topic was correctly recognized and start asking detail questions from there.
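The majority vote described above could be sketched roughly as follows. This is only an illustration, not an actual pipeline component: the function name `topic_fallback`, its parameters and the default thresholds are made up for this example, and it assumes the ranking is a list of dicts with `name` and `confidence` keys and that the main topic is everything before the first underscore.

```python
from collections import Counter


def topic_fallback(intent_ranking, ambiguity_threshold=0.1, top_k=4, majority=0.75):
    """Return '<maintopic>_fallback' if the top intents share a root, else None.

    All names and default values here are hypothetical and need tuning.
    """
    ranked = sorted(intent_ranking, key=lambda i: i["confidence"], reverse=True)
    if len(ranked) < 2:
        return None

    # Only step in when the top two intents are too close to call.
    if ranked[0]["confidence"] - ranked[1]["confidence"] >= ambiguity_threshold:
        return None

    # Root topic = the part of the intent name before the first underscore.
    roots = [i["name"].split("_", 1)[0] for i in ranked[:top_k]]
    root, votes = Counter(roots).most_common(1)[0]

    # Require a clear majority (for example 3 out of 4) before falling back to the topic.
    if votes / len(roots) >= majority:
        return f"{root}_fallback"
    return None


ranking = [
    {"name": "email_address_new", "confidence": 0.31},
    {"name": "email_alias_new", "confidence": 0.29},
    {"name": "email_address_change", "confidence": 0.20},
    {"name": "billing_invoice_copy", "confidence": 0.10},
]
print(topic_fallback(ranking))
```

With this ranking, three of the top four intents share the `email` root, so the sketch proposes `email_fallback`. If the vote fails, the caller would still have to fall back to the generic behaviour.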
This approach, however, has issues when SoftMax is applied to the confidences, because that makes measuring the distance between the utterance and the intents even harder.
Do you, @koaning, see any major issues with that approach? I am going to experiment with triplet-loss once spare time allows, to see if I can get good thresholds for both ambiguity and minimum confidence.
I’ve been contemplating making a custom action for this use-case, but my solution lies a bit more in the realm of the user interface than in the realm of ML.
I have a demo of this idea on GitHub (but it’s been a while since I touched that repo). The idea is to use the fallback event to trigger a custom action that uses the NLU results to generate appropriate buttons for the user. Here’s a screenshot:
By generating these buttons, the user can quickly select the right option if it is there, but one could also add a button for “human-handoff” here. What’s nice here is that, while these buttons are presented, the user can still type and issue another command to the assistant.
How you generate these buttons is totally up to you. You can set your own thresholds on the predicted intent/response scores.
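As a rough sketch of the button generation (not the code from the demo repo), a helper could turn the intent ranking into button payloads. The function name `build_buttons`, the thresholds and the `human_handoff` intent are assumptions for illustration; the `title`/`payload` shape with a leading `/` is the usual way to send an intent as a button payload.

```python
def build_buttons(intent_ranking, min_confidence=0.1, max_buttons=3,
                  exclude=("nlu_fallback",), titles=None):
    """Build button dicts for the most likely intents (hypothetical helper)."""
    titles = titles or {}
    ranked = sorted(intent_ranking, key=lambda i: i["confidence"], reverse=True)

    # Drop the fallback intent itself and anything below the chosen threshold.
    candidates = [
        i for i in ranked
        if i["name"] not in exclude and i["confidence"] >= min_confidence
    ][:max_buttons]

    buttons = [
        {
            # Use a hand-written title if there is one, else a readable intent name.
            "title": titles.get(i["name"], i["name"].replace("_", " ")),
            "payload": f"/{i['name']}",
        }
        for i in candidates
    ]

    # Always offer a way out; "human_handoff" is an assumed intent name.
    buttons.append({"title": "Talk to a human", "payload": "/human_handoff"})
    return buttons


ranking = [
    {"name": "nlu_fallback", "confidence": 0.70},
    {"name": "email_address_new", "confidence": 0.30},
    {"name": "email_alias_new", "confidence": 0.28},
    {"name": "greet", "confidence": 0.02},
]
buttons = build_buttons(ranking)
```

Inside a custom action, the ranking would presumably come from `tracker.latest_message["intent_ranking"]`, and the result could be sent with `dispatcher.utter_message(text=..., buttons=buttons)`.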
I thought about that, too, for a while.
I think offering buttons is good if the confidence is too low altogether, or if the top two intents are close but the remaining intents are far off. In our case, we might find the top 5 intents all being close to each other, as we have intents for, say:
I want a new e-mail address
I want a new e-mail alias
I want to change my e-mail address
I want to set an e-mail client
Those are pseudo-intents here with a lot of NLU examples each.
The issue I see is that utterances like “(garbled) … e-mail … (garbled)” currently lead either to “I didn’t understand you (at all)” or to something too specific, like “Did you mean change e-mail alias?”, even though the only thing we know for sure is that it was something about e-mail.
So by comparing the NLU results and, in this case, hopefully seeing a lot of e-mail intents near the top of the list, I can create an (artificial, no NLU examples) e-mail-offer-help-intent (of course, that intent can also have NLU examples if they do not weaken the detail intents above).
Whether that intent itself triggers a button or the stories continue some other way is a different question IMHO.
One point is that I need a good distance measure (confidence) between the utterance and the learnt intents. I saw that the default with SoftMax usually gives the top intent too much of a boost. If I find time, hopefully later this week, I want to look into the TripletLoss branch.
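To illustrate what I mean by the boost, here is a toy calculation in plain Python. The similarity values and the `scale` factor are invented for the example and are not taken from any real model; the point is only that softmax turns small gaps in raw similarity into large gaps in confidence once the scores are scaled up.

```python
import math


def softmax(scores, scale=1.0):
    """Numerically stable softmax over raw scores, with an optional scale factor."""
    m = max(scores)
    exps = [math.exp(scale * (s - m)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw similarities between one utterance and four intents.
similarities = [0.80, 0.70, 0.65, 0.20]

plain = softmax(similarities)            # gaps stay small
sharp = softmax(similarities, scale=20)  # the top intent dominates

print([round(p, 2) for p in plain])
print([round(p, 2) for p in sharp])
```

Unscaled, the top intent ends up at about 0.30, close to its neighbours. With the scale factor of 20 it jumps to about 0.84, even though the raw gap to the second intent is still only 0.1, so a threshold on the distance between the top 2 confidences no longer reflects how similar the intents really were.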
A good distance measure between utterance and intent is a somewhat unsolved problem. We’re doing active work in this field, but it remains a topic of trial and error. You may enjoy this algorithm whiteboard video that talks a bit about it.