Using DIETClassifier with ConveRT and Transformer-based featurizers

Hi everyone,

Rasa 1.8 has brought a lot of changes and new goodies! While I’m extremely grateful to the team for enabling Transformer-based featurizers and adding new classifiers, there is minimal information on how best to use these.

So, my questions for the forum are:

  1. The all-new DIETClassifier. Apart from the config provided in the migration guide, do we know anything about setting the parameters? How many transformer layers, how many heads, and is there any empirical evidence as to why?
  2. How is the new DIETClassifier best combined with different featurizers? E.g. if I have a ConveRT featurizer vs. BERT vs. the regular old CountVectorsFeaturizer, should I change the DIET params, and how?

Once again, I’m grateful to the Rasa team, but with the lack of a whitepaper or a results comparison, upgrading to new features is becoming a mini-thesis topic now :smiley:

Thanks, Thusitha

We are exploring the full capabilities of DIET ourselves and preparing blog posts and better documentation about it.

The defaults provided are the ones that we found work best, but depending on the amount of data you have, I would try tweaking transformer_size and number_of_transformer_layers; also try turning on use_masked_language_model if you use CountVectorsFeaturizer.
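As a rough sketch, those tweaks would land in the NLU pipeline config something like this (parameter names as documented for DIETClassifier in Rasa 1.8; the values shown are illustrative starting points, not recommendations):

```yaml
pipeline:
  # ... your tokenizer and featurizers here ...
  - name: DIETClassifier
    epochs: 100
    # smaller transformer for smaller datasets
    transformer_size: 128
    number_of_transformer_layers: 1
    # worth trying when sparse features (CountVectorsFeaturizer) are in the pipeline
    use_masked_language_model: true
```

Run a cross-validation round after each change rather than tweaking several parameters at once.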

Regarding different featurizers, it depends on the domain of your data. We found that most of the time a combination of ConveRT (if your data is in English) and CountVectorsFeaturizer (one of our default pipelines) works quite well.
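For reference, the combination mentioned above corresponds roughly to the default English pipeline shipped with Rasa 1.8 (component names taken from that release; treat this as a sketch and check the Components docs for your exact version):

```yaml
language: en
pipeline:
  - name: ConveRTTokenizer          # ConveRT requires its own tokenizer
  - name: ConveRTFeaturizer         # dense sentence-level features (English only)
  - name: CountVectorsFeaturizer    # sparse word-level features
  - name: CountVectorsFeaturizer    # sparse char n-gram features
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
```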


The DIETClassifier and supported hyperparameters are described on the Components page here.


Thanks, saw that. A CV round takes around 1 hr for our dataset, so limiting the parameter set for a param search is the only option we have. @Ghostvv’s comments help, and we will work in this direction. The problem we faced was that as soon as we upgraded using the default parameters mentioned in the upgrade guide, our performance dropped compared to the previous v1.6.1 configs.

Could you post comparison performance numbers?

Hi guys, the videos were great. After some experimentation we got stuff to work more or less. Thanks!

Could you elaborate on what steps you took here?

Well mainly,

  • Changing the sparsity level in the fully connected layers. There was no sparsity in the v1.6.1 EmbeddingIntentClassifier; I figured this out the hard way.
  • Changed back to CRFEntityExtractor despite the warning. DIET was giving me false matches, and for entities, false positives are extremely bad in my case. I don’t need the entity extractor to be fancy, just to correctly identify the entities I mark in the NLU training examples. CRFEntityExtractor does that, and I’m happy with it. I hope it won’t be taken down in v2.0 (I can copy the code and create a custom component if that happens :smiley:).
  • Having multiple transformer layers in DIET was slowing down inference. Reduced it to 1 and set the sparsity to a lower level.
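For anyone following along, the three changes above might look roughly like this in config. This is a sketch under my reading of the Rasa 1.8 options: weight_sparsity is the DIETClassifier parameter controlling sparsity in the fully connected layers (default 0.8), entity_recognition turns DIET’s entity head off, and the specific values are illustrative:

```yaml
pipeline:
  # ... tokenizer and featurizers here ...
  - name: CRFEntityExtractor        # keep entity extraction out of DIET
  - name: DIETClassifier
    entity_recognition: false       # let CRFEntityExtractor handle entities
    number_of_transformer_layers: 1 # fewer layers -> faster inference
    weight_sparsity: 0.6            # lower than the 0.8 default = denser layers
```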