DIETClassifier sensitive to small changes in NLU data

Hi all, we used the DIETClassifier for both entity recognition and intent classification. To ensure the quality of our bot, we run several test stages with more than 1k tests in total, e.g. in our CI/CD pipeline. In this setup, however, we observed that the DIETClassifier is very sensitive to small changes in the NLU training data. For instance, adding a single training example to one intent caused multiple entity recognition tests to fail, and vice versa. As a consequence, it was hardly possible to maintain the system and, for instance, make small changes to the NLU data when some utterances were not correctly recognized in production.
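For context, checks like these can be run in CI with the standard `rasa test nlu` CLI, along these lines (paths and fold count are illustrative, not our exact setup):

```bash
# Evaluate the trained NLU model against a held-out test set
rasa test nlu --nlu data/nlu_test.yml --model models/

# Estimate variance across random splits with cross-validation
rasa test nlu --nlu data/nlu.yml --cross-validation --folds 5
```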

In the end, we ended up with two models: one for entity recognition and another one for intent classification, which seems to be way more maintainable and more robust against changes in the training data.
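A minimal sketch of what such a split could look like, using DIETClassifier's documented `intent_classification` / `entity_recognition` switches (the featurizers and epoch count here are illustrative, not our exact setup):

```yaml
# config_intent.yml -- model 1: intent classification only
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    intent_classification: True
    entity_recognition: False
---
# config_entity.yml -- model 2: entity recognition only
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
    intent_classification: False
    entity_recognition: True
```

Each config is trained and evaluated independently, so adding intent examples no longer shifts the entity model's weights.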

I’m wondering if someone else is using DIET in joint mode (i.e. entity recognition and intent classification in the same model) and experiences similar problems, or if there are best practices to make the model more robust. Any ideas / experiences / etc. are very much welcome.

Thanks for sharing, @Christian! This is super interesting and not something we’ve seen before. Can you share some examples that go wrong? Is there a strong correlation between specific intents and the presence of specific entities?

Also, if you’re willing to share the dataset with us (privately), we can investigate more closely.