Intent recognition after button utterances

As we are in the process of building a very large conversational bot application, we are facing more and more issues with overlapping intents (the same goes for entities, but I’ll cover that in another thread), and with intents that mean the same thing in one context but something different in another. The official solution is to create a lot of stories, so that the MemoizationPolicy and TED can learn to deal with the intents users might voice.

However, as the TwoStageFallback shows, a loop action can also be somewhat context aware. Sadly, LoopAction is not exposed in the SDK, afaik.

Right now, I have tackled the problem by writing a custom policy that modifies the tracker object, based on metadata hidden in the button properties of the utterances. I am wondering if there could be a better way, because now I am effectively creating a mini state machine for a single turn (then again, the MemoizationPolicy is a kind of state machine, too).

It works by assigning additional intents to the buttons via an extra metadata key, “button_intents”. Some intents are created automatically (like “the first” or “the last” as ordinal inform intents for the button position, which you will see a lot in voice-channel transcripts), and more can be included manually by dialogue designers. If such an intent is recognized (e.g. by DIET) in the user utterance following a button bot utterance, the policy changes the recognized intent to the payload of the button. TED, the Memoization Policy and other policies are then able to make a good prediction for the next action.
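To make that concrete, here is a sketch of what such a button definition could look like in the domain. The button_intents key and the intent names are illustrations of the idea, not an official Rasa field:

```yaml
responses:
  utter_ask_getstarted:
    - text: "How would you like to get started?"
      buttons:
        - title: "Playground"
          payload: "/get_started_playground"
          # custom metadata read by the policy: extra intents that,
          # if recognised in the next user utterance, are mapped
          # onto this button's payload
          button_intents:
            - inform_ordinal_first
        - title: "Install Rasa"
          payload: "/install_rasa"
          button_intents:
            - inform_ordinal_last
```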

A modified “Sara” fork is available here to examine my proposal: raoulvm/rasa-demo at button_policy (github.com)

Do you, @koaning , see any major drawbacks with that approach?

The README on the domain format:

Hi! Would be great if you can give us some more information on what you’d like to achieve.

Is the idea to allow users to select an option with text rather than actually clicking a button?

For example: button gives the option A or B, user says “I want the second one” instead of clicking B?

Or is the idea to disambiguate between possible intents?

For example: user says something that may have intent C or intent D, and they are shown a button that asks “Did you mean intent C? or intent D?”

Hi Felicia,

that is exactly what we want to achieve. It matters most for channels without buttons (such as voice), but we also saw a lot of text responses to button options in real life on the web, even though clicking a button seems so much easier. You can see in the Sara demo that multiple options are often offered, and if you choose the first option by uttering “playground”, you get nowhere in the standard bot.

The “happy path” for the demo is: “how can i start?”

  • bot replies with buttons.

with the button_intents enabled: you can now utter things like: (alternatives!)

  • “the first”
  • “the last”
  • “the middle”
  • “the second”
  • “playground”.

You can also contact Shukri from the CSE team, who knows about the idea; I was hoping for more feedback from other customers here, too.

who is Shukri???


Hi Raoul, yes, Shukri’s already given me some information :) Just want to make sure I know exactly what you’re trying to do so I can make sure I’m giving you appropriate advice!

I think we might be able to simplify your approach. In general, it is not advisable to directly modify the tracker – this can lead to unintended consequences. As well as this, your policy is not really a policy, but rather a transforming and filtering function for the output of intent classifiers and entity extractors (i.e. the NLU pipeline). I think you can achieve your desired behaviour with a custom action, and of course adjustments to training data and configuration.

The vanilla solution requires that the intent is classified correctly. Yours requires one of:

  1. the intent is classified correctly
  2. an entity is extracted, and can be mapped to a button option
  3. an ordinal mention is made, and can be mapped to a button option

We can handle the latter two with custom actions and some tweaking of the training data. For example, modify the rules/story so that the correct action can be predicted based off of a slot value:

- story: choose playground
  steps:
  - intent: enter_data
  - slot_was_set:
    - choice: playground
  - action: utter_playground_intro
  - action: utter_ask_playground_help
  - intent: affirm
  - action: playground_form
  - active_loop: playground_form

You’ll need to add appropriate slots to your domain.

  2. an entity is extracted, and can be mapped to a button option: You can get this “for free” if you enable slot-filling from the entity, or you can consider adding the entity on its own to the intent; then this case is handled like case 1.

  3. the user utters an ordinal mention (“the first one”): I think this can and should be handled with a custom action. I’d create a mention slot. You can extract ordinal mentions using DucklingEntityExtractor with the dimension ordinal. I would write a custom action that is triggered any time an ordinal mention is classified. It should first check whether the last bot action included button options (like you do here; however, I would get the last bot utterance using the method get_last_event_for, which takes arguments to skip and/or exclude events, for example ACTION_LISTEN, to ensure you’re getting the appropriate event).

- rule: process ordinal mention
  steps:
  - intent: ordinal_mention
  - action: action_process_ordinal_mention
  wait_for_user_input: false

slots:
  choice:
    type: categorical
    values:
    - get_started_playground
    - install_rasa
    ...
from typing import List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.events import EventType, SlotSet
from rasa_sdk.executor import CollectingDispatcher
from rasa_sdk.types import DomainDict


class ActionProcessOrdinalMention(Action):
    def name(self) -> Text:
        return "action_process_ordinal_mention"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: DomainDict,
    ) -> List[EventType]:
        # in the SDK, events are plain dicts; "bot" is the event type
        # of BotUttered events
        button_utterance = tracker.get_last_event_for("bot")
        if not button_utterance:
            # can't find a previous bot utterance
            return []
        data = button_utterance.get("data") or {}
        buttons = data.get("buttons")
        if not buttons:
            # the last bot utterance didn't include buttons
            return []
        mention_value = tracker.get_slot("mention")
        # assuming mention is a 1-indexed int; you may need extra processing here
        if mention_value is None or not 1 <= int(mention_value) <= len(buttons):
            return []
        # store the chosen button's payload; you may need to strip a
        # leading "/" to match your categorical slot values
        choice = buttons[int(mention_value) - 1]["payload"]
        return [SlotSet("choice", choice)]
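The index arithmetic in those last lines is easy to get wrong (off-by-one, or an out-of-range mention like “the fifth” with only two buttons). Here is a framework-free sketch of just the mapping step, with hypothetical button dicts, that fails soft instead of raising an IndexError:

```python
from typing import List, Optional


def map_mention_to_payload(buttons: List[dict], mention: int) -> Optional[str]:
    """Map a 1-indexed ordinal mention onto a button payload.

    Returns None when the mention is out of range, so the caller can
    fall back to e.g. a rephrase prompt instead of crashing.
    """
    if not buttons or not 1 <= mention <= len(buttons):
        return None
    return buttons[mention - 1].get("payload")


buttons = [
    {"title": "Playground", "payload": "/get_started_playground"},
    {"title": "Install Rasa", "payload": "/install_rasa"},
]
map_mention_to_payload(buttons, 1)  # "/get_started_playground"
map_mention_to_payload(buttons, 5)  # None: "the fifth" of two buttons
```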

Note you can also influence action prediction from the custom action. You can do this by returning a FollowupAction event.


Thank you @fkoerner ! I know it’s not really a policy (though it can make a prediction if you turn on execute_noop_action in the config, but that is only to ensure the order of processing in case policies execute in parallel). That was a fallback, as I found out that pipeline components cannot access the tracker, so NLU is always context-unaware.

I also thought about custom actions, but the idea was to catch a lot of possible intents in the answers, and the trigger for the custom action is unclear to me: I cannot define “run after the answer to a button question” except by putting that into a policy and the remainder into the custom action, which was my original plan. I then found out that it is (technically) possible to change the tracker.

And that led me here, to ask whether it is dangerous to do so, and what the unwanted consequences could be. It is still possible to change the modification part to a custom policy that makes use of the standard events (such as UserUtteranceReverted etc.) to modify the tracker to the desired state. I have to check if the action itself leaves traces in the tracker that would harm the Memoization and TED policies.

Regards Raoul

Hi Raoul, I understand where you’re coming from! Given your priorities, this policy is a reasonable solution. We just generally advise against modifying the tracker, as well as creating policies unless necessary, as the policy ensemble can become overly complex.

The possible consequence of modifying a tracker is unintended behaviour: deleting or modifying events changes how predictions are made. Of course, I understand this is also part of the point of your policy :), but it can also mean that your conversation designers write a story or training data for a conversation flow, and it will not flow as planned. It can break continuity between previous predictions (made with the old tracker) and newer predictions (made with the modified tracker), so that predictions are made that would not normally have followed the old ones. Modifying the tracker can also result in a tracker state that would not naturally occur, making it more difficult to write appropriate training data for that case. One thing to watch out for is low-confidence or incorrect next-action predictions after modifying the tracker.

If you choose to go ahead with the policy anyways, I would recommend that you make use of the tracker’s functions (like the method get_last_event_for) wherever possible. I would be happy to advise on this.

I have to check if the action itself leaves traces in tracker that would harm Memoization and TED policies.

This is a good idea, and I’d strongly recommend you rigorously test your policy to be sure that it always only modifies the tracker in expected ways. Check also that tracker.applied_events behaves as you expect it to. That’s one benefit of using tracker methods and standard events like UserUtteranceReverted: making sure the tracker is modified correctly becomes Rasa’s problem :)
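Since applied_events is what the policies effectively see, it is worth checking what a revert event hides. Here is a toy model (deliberately not Rasa’s real implementation) of how a rewind removes the preceding user turn from the replayed history:

```python
def applied_events(events):
    """Toy event replay: a "rewind" event drops everything back to,
    and including, the most recent "user" event, which is roughly
    how UserUtteranceReverted works conceptually."""
    applied = []
    for event in events:
        if event == "rewind":
            # pop until the last user utterance (inclusive) is gone
            while applied and applied.pop() != "user":
                pass
        else:
            applied.append(event)
    return applied


log = ["action_listen", "user", "my_custom_action", "rewind", "user"]
applied_events(log)  # ["action_listen", "user"]: the custom action left no trace
```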

You could also consider using the custom policy in a more limited fashion. Adding examples to your intents like the ones below would help the intent to be recognised correctly (as opposed to being classified as enter_data). In the case of rasa-demo, this is exactly the problem: we only expect the intent to be recognised through the button. If we had examples like the ones below, users could get somewhere by typing playground instead of clicking the button.

- intent: get_started_playground
  examples: |
    - I want to get started with the [playground](product)
    - [playground](product)

Finally, I’d consider combining the intents for ordinal mentions (these), and using entities to determine the actual value. They are syntactically very similar, so it may be difficult for an intent classifier to distinguish between them. Again, a pretrained entity extractor like Duckling can be very effective here.

- intent: ordinal_mention
  examples: |
    - the [first one]{"entity": "mention", "value": 1}
    - the [second one]{"entity": "mention", "value": 2}
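For completeness, Duckling itself has to be enabled in the NLU pipeline; a config sketch (the URL is a placeholder for wherever your Duckling server runs):

```yaml
pipeline:
  # ... your existing tokenizers, featurizers, DIETClassifier, etc.
  - name: DucklingEntityExtractor
    url: "http://localhost:8000"
    dimensions: ["ordinal"]
```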

Please let me know if I can help support this effort.


Hi Felicia,
thanks for your elaborate answer!
I am currently planning a combined approach of the policy and the custom action: the policy makes sure that the custom action is called after the user has uttered an answer to a button question. So in the story:

steps:
  - some_bot_button_utterance
  - user_voiced_intent

it is always executed after NLU intent recognition (just like nlu_fallback), but through the modifications (using Rasa default events like UserUtteranceReverted and ActionReverted, just as the TwoStageFallback loop action does) it ensures that the story can continue with user_voiced_intent if a positive match between intent and button was found.
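The trigger condition (“run after the answer to a button question”) boils down to a small predicate over the event history. A framework-free toy sketch of the check I have in mind (simplified event dicts, not Rasa’s real event objects):

```python
def should_process_button_answer(events):
    """Return True when the most recent user message directly follows
    a bot utterance that offered buttons (toy event dicts)."""
    last_bot_buttons = None
    user_answered = False
    for event in events:
        if event["type"] == "bot":
            last_bot_buttons = event.get("buttons")
            user_answered = False  # a new bot turn resets the answer flag
        elif event["type"] == "user":
            user_answered = True
    return user_answered and bool(last_bot_buttons)
```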

That way, the policy will become a real policy (by making a prediction of the next best action) and the tracker changes are “by the book”. Thanks for the hints here, will use more of the provided functions!

Another, more social, issue will be keeping colleagues from creating thousands of intents if it is too easy not to think twice about whether something really is a new intent or just an NLU sample for an existing one. In your example above, the NLU example [playground](product) can lead to issues if you have other situations where customers can choose from multiple products and you’re expecting an inform or enter_data intent with an entity (or rather a slot set, because I learned that entities are only set or unset from a feature-vector perspective).

Best regards Raoul

Just one more -
My original idea was to put the whole intent-mapping idea into the NLU pipeline, but there is no context in the form of access to the tracker, not even read-only. If there were, the whole policy-calls-action detour wouldn’t be needed. Any plans to offer that in the near future, or to have a section in the config between “pipeline” and “policies”?

Hi Raoul,

Sorry for the slow response.

I am currently planning to do a combined approach from the policy and the custom action

This seems like a great solution. I am very interested to see how the implementation goes and whether it works for you.

Another, more social issue will be to keep the colleagues from creating 1000s of intents if it is too easy not to think twice if it really is a new intent or just an NLU sample for an existing intent

You’re right about this, too… intent design is tricky, especially with evolving conversations. inform or enter_data is one of those intents that can quickly grow out of hand (as you point out). We are looking to move away from hard intent assignments in the future (see our experimental efforts with e2e training), but it is a challenge to shift away from the intent paradigm that is so ingrained in dialogue management :)

Any plans to offer that in the near future, or to have a section in the config between “pipeline” and “policies”

This is another thing we (research) speak and think about often! In general, the issue of context is a tricky one: context is useful for disambiguation, but too much information can also complicate training and prediction. We want to make sure that introducing more variables doesn’t unnecessarily blow up the prediction space or require much more training data. We have some more pressing short-term concerns and thus don’t have any concrete plans for this yet (we will need to experiment and explore), but rest assured we are interested in making more use of context for NLU predictions.

Best regards,

- Felicia


Hello Felicia,

thank you for your response. I’ll make a showcase for my colleagues here, but I have one question regarding test stories. AFAIK custom actions are not executed during testing, so an end-to-end test story cannot work if the intent needs to be reverted and re-added by an action. I am wondering, however: how is that supposed to work with TwoStageFallback test stories?

Thank you Raoul

Hi @fkoerner !
That turns out to be harder than I thought. I always end up with an additional custom action in tracker.applied_events(), which confuses the other policies. I would have to put my action: action_process_button_answer after each user intent in the stories. That is ugly and unreadable. Is there any chance a custom action can make itself invisible (like an undo) without hiding the effects it should have, i.e. the newly created user utterance?
I saw that the two_stage_fallback action (a loop action, so not directly comparable to SDK actions) seems to do that: it leaves no extra trace at deactivate(), only the action_listen and the confirmation event it puts into the tracker.

Help is appreciated!
Raoul