I was hoping you could share any details about the model architecture behind the end-to-end feature and, more specifically, how that interacts with the standard intent classification models?
Since the Rasa approach isn’t fully end-to-end ML (you’re still allowing for intent classification), I’m guessing the intent classification runs first, and if that fails (the intent match falls below a confidence threshold), the end-to-end model kicks in, taking the conversation history plus the latest utterance as input. I’m guessing it might be more complicated than that, but I couldn’t find any more explicit details.
Hi Anca, thanks for the response. The documents you posted look like they’re focused on how the feature is configured. I’m more interested in the guts of how the feature works - the specific ML model architecture, and how Rasa determines whether to do straight intent matching or use the context of the entire conversation. Do you have any of those architecture details you can share?
I am realising that I should start making algorithm whiteboard content specifically on the implementation details of e2e. One thing I can confirm is that it’s just an adaptation of TED under the hood. If you’re not familiar with TED, you may enjoy these two algorithm whiteboard videos:
The main thing that happens in the end-to-end situation is that we send more data to TED. It’s no longer just the predicted intents/slots/entities; the featurized text utterance is sent along as well. These are the same sparse/dense features that are generated in your NLU pipeline.
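To make that concrete, here's a toy sketch of the difference between the two featurization modes. All names here are illustrative, not Rasa's actual API: in the classic setup a user turn reaches the policy only as its predicted intent, while in the end-to-end setup the sparse/dense text features from the NLU pipeline ride along instead.

```python
# Hypothetical sketch (names are illustrative, NOT Rasa's actual code) of
# what "sending more data to TED" means for a single user turn.

INTENTS = ["greet", "goodbye", "ask_help"]

def featurize_turn(intent=None, text_features=None):
    """Build the per-turn feature dict a TED-like model would consume.

    Exactly one of `intent` / `text_features` is expected, mirroring the
    fact that training data contains either the text or the intent.
    """
    features = {}
    if intent is not None:
        # one-hot encode the predicted intent label
        features["intent"] = [1.0 if i == intent else 0.0 for i in INTENTS]
    if text_features is not None:
        # e.g. a dense sentence embedding produced by the NLU pipeline
        features["text"] = list(text_features)
    return features

# classic turn: featurized by the intent label only
classic = featurize_turn(intent="greet")
# end-to-end turn: featurized by the raw text's features instead
e2e = featurize_turn(text_features=[0.1, 0.4, 0.2])

print(sorted(classic))  # ['intent']
print(sorted(e2e))      # ['text']
```

In the real pipeline the text features are the sparse (e.g. count-vector) and dense (e.g. transformer embedding) outputs of your configured featurizers, but the shape of the idea is the same: the turn carries one representation or the other.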
From my current understanding this is the main difference, but there may be details that I am omitting here since I’ve not looked at the codebase in detail.
At training time, the data contains either user text or user intent, never both, so TED learns to make predictions from either one. To ensure that the test distribution matches the training distribution, we run TED with a batch of 2 at inference time when a new user message comes in: one batch example where the last user message is featurized by the intent label from the NLU pipeline, and one where it is featurized by its plain text (as in the picture).

The text-based prediction is chosen if and only if its confidence is above some threshold and its maximum similarity score is higher than that of the intent-based prediction. See here. Comparing the two is valid because the similarities come from exactly the same model. We then store which choice was made, so at the next dialogue step the dialogue history is featurized according to these decisions (intent label or text for each turn), even though both would be available at inference time.