DIET Architecture: Transformer CLS Output and Intent Similarity

Hello, I've watched the DIET algorithm whiteboard episodes 1 and 2 on YouTube. I tried to understand the explanation, especially the similarity between the Transformer output of CLS and the intent labels.


The video explains that the output of the Transformer block (including CLS) is a large numeric vector [256], which is then embedded to calculate the similarity with the intent labels. So I have a few questions:

1. Can the Transformer block process a one-hot encoded vector? I ask since there's an input embedding on both the encoder and decoder layers.


2. Could you explain what kind of embedding is applied to the intent labels? Does it embed every training example that has the target intent? For example, the Play Games intent has 10 training sentences.
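To make my current understanding of the similarity step concrete, here is a minimal NumPy sketch. All names, sizes, and weights here are my own illustration (random weights, made-up dimensions), not Rasa's actual code; it just shows both sides being projected into a shared embedding space and compared with a dot product, and why embedding a one-hot label is the same as selecting a row of the label weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only (not Rasa's exact config):
transformer_dim = 256   # size of the Transformer/CLS output vector
embed_dim = 20          # size of the shared similarity space
num_intents = 3         # e.g. play_games, greet, goodbye

# Dense layers mapping both sides into the same embedding space.
W_text = rng.normal(size=(transformer_dim, embed_dim))
W_label = rng.normal(size=(num_intents, embed_dim))

# CLS vector coming out of the Transformer block.
cls_out = rng.normal(size=(transformer_dim,))

# One-hot intent labels; multiplying a one-hot vector by W_label
# just selects the corresponding row, i.e. one vector per intent.
labels_one_hot = np.eye(num_intents)

text_embed = cls_out @ W_text            # shape: (embed_dim,)
label_embed = labels_one_hot @ W_label   # shape: (num_intents, embed_dim)

# Dot-product similarity between the CLS embedding and each intent embedding.
similarities = label_embed @ text_embed  # shape: (num_intents,)
predicted_intent = int(np.argmax(similarities))
print(similarities.shape, predicted_intent)
```

So in this sketch there is one embedding vector per intent label (a row of `W_label`), not one per training sentence; the training sentences would only influence how the weights are learned. Please correct me if that's wrong!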


I'm very excited about Rasa :slight_smile: , it's a great architecture, and the videos give an amazing explanation of what's behind it.

Any answers or clues would be much appreciated. Thanks :slight_smile:

Hello, any ideas on these questions? :slight_smile: