Intent evaluation

Dear all,

We are trying to evaluate/test our intent classifier. For this, we are currently using the “rasa test nlu” script. Our current pipeline uses the following setup:

  • SpacyTokenizer
  • SpacyFeaturizer
  • LexicalSyntacticFeaturizer
  • DietClassifier (only entities)
  • SpecialIntentFeaturizer
  • DietClassifier (only intent)

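In config.yml, this setup would look roughly like the following (a sketch only; SpecialIntentFeaturizer is our custom component, so the module path shown is illustrative):

```yaml
pipeline:
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: LexicalSyntacticFeaturizer
  # First DIET model: entity extraction only
  - name: DIETClassifier
    intent_classification: false
    entity_recognition: true
  # Custom featurizer (module path is illustrative)
  - name: custom_components.SpecialIntentFeaturizer
  # Second DIET model: intent classification only
  - name: DIETClassifier
    intent_classification: true
    entity_recognition: false
```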
As far as I understand, “rasa test nlu” runs the entire pipeline using the trained models. Thus, errors made by the first classifier (the DietClassifier for entities) are propagated to the second classifier (the DietClassifier for intents).

Is there a possibility to use only information from the “gold” data for the “rasa test nlu” script?

To my understanding, this is handled differently when training the classifier, right? During training, only the features from the feature extractors and the annotations from the training/test data are used?

Thanks and all the best, Martin

Could you clarify what you mean here by “Is there a possibility to use only information from the ‘gold’ data for the ‘rasa test nlu’ script?”

If you would like to evaluate intent classification and entity extraction separately, you could test two pipelines individually. That being said, the intent and entity results are already evaluated separately anyway.
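Testing two pipelines individually could look something like this (file names are placeholders; `rasa train nlu` and `rasa test nlu` with `--config`, `--nlu`, and `--model` are standard CLI flags):

```shell
# Train and evaluate a pipeline containing only the entity model
rasa train nlu --config config_entities.yml --nlu data/nlu.yml
rasa test nlu --model models/ --nlu test_data/nlu.yml

# Then repeat with a pipeline containing only the intent model
rasa train nlu --config config_intents.yml --nlu data/nlu.yml
rasa test nlu --model models/ --nlu test_data/nlu.yml
```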

Just so I understand, what does SpecialIntentFeaturizer do? Does it add features that only the intent model is supposed to use? It strikes me as odd to train DIET twice, since the model can produce intents as well as entities in one go. There might be a good reason why you’ve split it up though, so I’m curious to hear what the reasoning was.

The current setup will have two independent DIET models. This means that in the current setup the entities have no influence over the intents and vice versa. By having them both in the same model, the internal transformer architecture might be able to pick up on such patterns. You might enjoy this algorithm whiteboard video for more context on how having a single model might be beneficial.

That said, and as Akela just mentioned, intents and entities are evaluated separately.

Thanks a lot for your useful response!!!

@Akela:

ok for clarification:

While training the second classifier (for intents), the message object is filled with features from the featurizers, but it also seems to use entity information from the first classifier. During training, this information comes from the training data (i.e. it is generated in the “train” method of the classifier).

When testing the intent classification with “rasa test nlu”, the features are generated in the same way; however, the entities from the first classifier are now based on that classifier’s own predictions (the “process” method).
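To illustrate this train/test asymmetry with a toy sketch (plain Python, not actual Rasa internals; the feature names are made up):

```python
# Toy illustration of the asymmetry: during training the downstream
# intent classifier sees features built from gold entity annotations,
# but at test time it sees features built from the upstream model's
# predictions, so upstream errors propagate downstream.

def entity_features(entities):
    """Turn a list of entity labels into a simple feature dict."""
    return {"has_city": "city" in entities, "has_date": "date" in entities}

# Training: features come from the gold annotations in the NLU data.
gold_entities = ["city", "date"]
train_features = entity_features(gold_entities)

# Testing via "rasa test nlu": features come from the upstream model's
# predictions; here the upstream model missed the "date" entity.
predicted_entities = ["city"]
test_features = entity_features(predicted_entities)

print(train_features)  # {'has_city': True, 'has_date': True}
print(test_features)   # {'has_city': True, 'has_date': False}
```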

But thanks for the hint about the two pipelines. This might solve our issue :slight_smile:

@Vincent: Indeed, we know about the advantages of having one classifier for both tasks. We had classification issues when using the lexicalized features with the intent classifier and had to remove them. This issue is again rooted in the fact that we use placeholders for entities during training.
