Hi everyone, I came across this video while trying to learn more about the DIET architecture. After watching it, I have some questions and hope an expert can clear up my doubts. Link: Rasa Algorithm Whiteboard - Diet Architecture 1: How it Works - YouTube
- Pretrained word embeddings: what is the default word embedding if no pretrained one is used?
- How does the masking technique (with the MASK token) help prediction through the mask loss?
- For the total loss at the end of the computation, how can we improve performance by minimizing the loss function value?
Sorry if these are beginner questions; I am quite new to machine learning and NLP.