Question about DIET classifier implementation details - Are featurizers trained? (and others)

Hello there! I hope you’re having a great day.

I’m not sure I’m in the right section of this forum, so don’t hesitate to redirect me.

I have questions about the DIET classifier implementation in Rasa. Reading the paper (https://arxiv.org/pdf/2004.09936.pdf), it seems that sparse and dense features can be trained/fine-tuned along with the rest of the model, but they can also be frozen (Table 4 shows results for frozen ConveRT embeddings, for example). Which approach is used in the Rasa DIETClassifier? Does it depend on the configuration file or not? I’m particularly curious about the LanguageModelFeaturizer.

Also, the paper says the transformer has only 2 layers (though this can be changed through configuration). Since pre-trained transformers can be used, which layers from the pre-trained transformer are selected? (the last ones? something else?)
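To make my questions concrete, here is roughly the kind of pipeline configuration I have in mind (component names are from the Rasa docs; the specific parameter values are just illustrative, not my actual config):

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer        # sparse features
  - name: LanguageModelFeaturizer       # dense features from a pre-trained transformer
    model_name: "bert"                  # example choice, could be another model
    model_weights: "bert-base-uncased"
  - name: DIETClassifier
    number_of_transformer_layers: 2     # the "2 layers" mentioned in the paper?
    epochs: 100
```

So the question is whether the `LanguageModelFeaturizer` weights here get fine-tuned during DIET training or stay frozen, and which layers of the pre-trained model its output comes from.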

I tried finding answers by reading the code, but it’s a little too dense for an untrained eye.

Thank you very much! :slight_smile:

@koaning, while searching the forum, I read that you may be able to answer questions about DIET (Interactive Widgets to explain DIET - #11 by koaning).

Thanks for any info you can provide!