I’m exploring an approach where I use ner_spacy as my general entity extractor and, for the special cases where ner_spacy performs badly, train ner_crf to handle them. I would then put a logistic regression on top of ner_crf to pick the right entity.
My problem is: how can I tag the part of the training data that I want ner_crf to be trained on? I tried adding the extractor field with the corresponding values ner_spacy / ner_crf, but then realised that the function filter_trainable_entities only removes the entities, not the whole training example. The CRF model still sees those sentences, just without their annotations, which only confuses it.
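To illustrate the behaviour I'm describing, here is a simplified sketch using plain dictionaries instead of Rasa's message objects. The function name drop_foreign_entities and the sample sentences are made up for this post, and this is not the actual filter_trainable_entities implementation, only my understanding of its effect:

```python
# Simplified stand-in for entity-level filtering; NOT the actual Rasa code.
# `drop_foreign_entities` and the sample data are hypothetical.
def drop_foreign_entities(examples, extractor_name):
    """Keep every example, but strip entities tagged for another extractor."""
    filtered = []
    for example in examples:
        kept = [
            ent for ent in example.get("entities", [])
            # Untagged entities (no "extractor" key) are kept for everyone.
            if ent.get("extractor") in (None, extractor_name)
        ]
        # The example itself survives even when `kept` is empty.
        filtered.append({**example, "entities": kept})
    return filtered


examples = [
    {"text": "fly to Berlin", "entities": [
        {"start": 7, "end": 13, "value": "Berlin",
         "entity": "city", "extractor": "ner_spacy"}]},
    {"text": "order SKU-42", "entities": [
        {"start": 6, "end": 12, "value": "SKU-42",
         "entity": "product", "extractor": "ner_crf"}]},
]

crf_view = drop_foreign_entities(examples, "ner_crf")
for example in crf_view:
    print(example["text"], "->", [e["value"] for e in example["entities"]])
```

The first sentence is still passed to the CRF, only with an empty entity list, so the model is taught that "Berlin" is not an entity.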
I then tried adding a new field to the specialized messages, specialized_crf: True, and subclassed CRFEntityExtractor so that its training function selects only the training data with that field set to True. But it seems that the specialized_crf field gets deleted somewhere in the pipeline.
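The message-level filtering I have in mind is roughly the following. This is a minimal sketch with plain dictionaries rather than the real Rasa classes; select_specialized and the sample data are hypothetical, and in my subclass this selection would happen at the start of the training function:

```python
# Hypothetical sketch; plain dicts stand in for Rasa training messages.
# `specialized_crf` is my own custom field, not something Rasa defines.
def select_specialized(examples, flag="specialized_crf"):
    """Return only the examples whose flag is explicitly set to True."""
    return [example for example in examples if example.get(flag) is True]


training_examples = [
    {"text": "fly to Berlin", "entities": []},
    {"text": "order SKU-42", "specialized_crf": True, "entities": [
        {"start": 6, "end": 12, "value": "SKU-42", "entity": "product"}]},
    {"text": "cancel my order", "specialized_crf": False, "entities": []},
]

crf_examples = select_specialized(training_examples)
print([example["text"] for example in crf_examples])
```

This only works if the custom field is still present on each example when the extractor's training function runs, which is exactly what doesn't seem to be the case for me.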
Any ideas how I can tag part of the data and have my subclassed CRF entity extractor filter by it?