I have an entity that is supposed to be extracted by ner_crf. Earlier I had about 50-60 examples in the lookup table for this entity. Everything seemed to be working as expected.
Now, I have added about 1000 entries to my lookup table. Suddenly, the intent_classifier_sklearn component gives the following warning:
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
What is happening here? Shouldn’t the intent classifier work independently of the lookup table, which is an entity-extraction construct?
Can you share the data and the logs? I suspect that either one of the intents has too few examples or its examples are too diverse.
Have you tried calculating the confusion matrix using the evaluate module?
While these methods aren’t conclusive, they should give you a rough idea of where the problem lies.
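If you want to see what the evaluate module is reporting, you can build the same kind of matrix directly with scikit-learn. A sketch with hypothetical intents (a row summing to a nonzero count while its column is all zeros is exactly the "no predicted samples" case behind the warning):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical intents; rows are true labels, columns are predictions.
labels = ["greet", "bye", "order"]
y_true = ["greet", "greet", "bye", "bye", "order"]
y_pred = ["bye",   "order", "bye", "bye", "order"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# The "greet" row has 2 examples but the "greet" column is all zeros:
# every "greet" example was misclassified, so its F-score is undefined.
```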
The ill-defined F-score warning comes from a lack of training examples for a particular intent or entity at evaluation time. When you use lookup tables, ner_crf turns them into a pattern feature, which still has to be learned from annotated training data. Since we don’t put all the lookup values into the training set, the evaluation can end up with labels that are never predicted, and that triggers this warning.
Isn’t the very purpose of a lookup table that we shouldn’t have to include all the values in the training set? Secondly, I know it is just a warning, but it looks like it might affect predictions. Is there a guideline on how the training data should look to avoid it?
I’m sorry, I am a bit confused. Here’s what I understand:
Lookup tables use the pattern feature from ner_crf. So, for entity extraction to work properly, we need enough (not all, but enough) of the lookup values annotated in the training data. Essentially, if I add more values from the lookup table to my training examples, performance should improve.
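That understanding matches how the feature works. Roughly, the lookup table is compiled into one big regex, and each token only gets a binary "matched a lookup entry" flag; the CRF still has to learn from annotated examples that this flag signals your entity. A simplified sketch (names here are illustrative, not Rasa’s internal API):

```python
import re

# Hypothetical lookup entries for a "city" entity.
lookup_entries = ["new york", "san francisco", "boston"]

# Longest-first alternation, as a regex-based lookup feature might build it.
pattern = re.compile(
    "|".join(re.escape(e)
             for e in sorted(lookup_entries, key=len, reverse=True))
)

def lookup_feature(text):
    # The CRF only sees this boolean flag per match; without enough
    # annotated examples it cannot learn that the flag means "city".
    return bool(pattern.search(text.lower()))

print(lookup_feature("book a flight to Boston"))   # matches lookup
print(lookup_feature("book a flight to Chicago"))  # not in lookup
```

So adding more annotated examples that contain lookup values gives the CRF evidence to associate the pattern-match flag with the entity label.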