Question on Algorithm Whiteboard: How DIET Works

Hi everyone, I came across this video while learning about the DIET architecture. After watching it, I have some questions and hope an expert can clear up my doubts. Link: Rasa Algorithm Whiteboard - Diet Architecture 1: How it Works - YouTube

  1. Pretrained word embeddings: what is the default word embedding if none is used?
  2. How does the masking technique (with the MASK token) help prediction through the mask loss?
  3. For the total loss at the end of the computation, how can we improve performance by minimizing the loss function value?

Sorry if these are beginner questions; I am quite new to machine learning and NLP.

Can anyone help me with these questions?