Hello, I've watched episodes 1 and 2 of the DIET algorithm whiteboard series on YouTube. I tried to follow the explanation, especially the part about the similarity between the Transformer output for the CLS token and the intent labels.
The video explains that the output of the Transformer block (including the CLS position) is a large numeric vector, which is then embedded so that its similarity with the intent labels can be calculated. So I have a few questions:
1. Can the Transformer block process a one-hot encoded vector? I ask because there is an Input Embedding on both the encoder and decoder layers.
2. Could you explain what kind of embedding is applied to the intent labels? Does it embed every training example that has the target intent? For example, if the Play Games intent has 10 training sentences, is each of them embedded, or is there a single embedding for the intent?
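
To make the questions above concrete, here is a minimal sketch of how I currently picture the similarity step. Everything in it is my own assumption and not taken from the Rasa code: the dimensions, the intent names, the random weights `W_cls` and `W_label` (which I assume would be learned during training), the use of a plain dot product, and especially the idea that each intent gets exactly one one-hot vector and therefore one embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and intent names, chosen only for illustration.
transformer_dim = 256
embed_dim = 20
intents = ["play_games", "check_weather", "greet"]

# Stand-in for the Transformer output at the CLS position.
cls_output = rng.normal(size=transformer_dim)

# Assumption: one one-hot row per intent, not one per training sentence.
label_one_hot = np.eye(len(intents))

# Assumed embedding layers (random here; I guess they are learned in training).
W_cls = rng.normal(size=(transformer_dim, embed_dim))
W_label = rng.normal(size=(len(intents), embed_dim))

# Project both sides into the same embedding space.
cls_embedding = cls_output @ W_cls          # shape (embed_dim,)
label_embeddings = label_one_hot @ W_label  # shape (n_intents, embed_dim)

# Dot-product similarity between the CLS embedding and each intent embedding.
similarities = label_embeddings @ cls_embedding  # shape (n_intents,)
predicted_intent = intents[int(np.argmax(similarities))]

print(similarities, predicted_intent)
```

If this picture is right, multiplying a one-hot vector by `W_label` just selects one row of the matrix, so the 10 Play Games sentences would all share a single label embedding. Please correct me if DIET does something different.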
I'm very excited about Rasa: the architecture is great, and the team does an amazing job of explaining what happens behind the scenes.
Any answers or hints would be much appreciated. Thanks!