I’m exploring an approach where I use `ner_spacy` as my general entity extractor, and in the special cases where `ner_spacy` performs badly, I train `ner_crf` to handle those. I then put a logistic regression on top of `ner_spacy` and `ner_crf` to pick the right entity.
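A minimal, self-contained sketch of the "logistic regression on top" idea, assuming each extractor's output for a span can be reduced to an entity label plus (for the CRF) a confidence. `Prediction`, `features`, `train_logreg` and `choose` are hypothetical names, not Rasa API, and the feature set (bias, CRF confidence, agreement flag) is only an illustrative assumption. The model estimates the probability that the CRF is right and falls back to spaCy otherwise.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Prediction:
    # Hypothetical container for one extractor's output on a span.
    entity: Optional[str]  # None if the extractor found nothing
    confidence: float      # assumed to be in [0, 1]


def features(spacy_pred: Prediction, crf_pred: Prediction) -> List[float]:
    # Hypothetical features: bias, CRF confidence, and whether both extractors agree.
    agree = 1.0 if spacy_pred.entity == crf_pred.entity else 0.0
    return [1.0, crf_pred.confidence, agree]


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    # Plain stochastic gradient ascent on the logistic log-likelihood.
    # y[i] = 1 means "the CRF was right on this span", 0 means "spaCy was right".
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(dot(w, xi))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w


def choose(w: List[float], spacy_pred: Prediction, crf_pred: Prediction) -> Prediction:
    # Trust the CRF when the model says P(CRF is right) >= 0.5, else keep spaCy's answer.
    p = sigmoid(dot(w, features(spacy_pred, crf_pred)))
    return crf_pred if p >= 0.5 else spacy_pred
```

The meta-classifier only needs labelled spans where the two extractors disagree; where they agree, the choice does not matter.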
My problem is: how can I tag the part of the training data that I want `ner_crf` to use? I tried adding the `extractor` field, with the corresponding values `ner_spacy` / `ner_crf`, but then realised that the function `filter_trainable_entities` just removes the entities, not the whole training example. This only confuses the CRF model.
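To make the difference concrete, here is a small sketch on plain dicts. `strip_other_entities` only mimics the behaviour described above (it is not Rasa's implementation), and `keep_only_examples_for` is a hypothetical alternative that drops whole examples instead. With the first, a sentence whose only entity belongs to `ner_spacy` survives with no entities, so the CRF learns its tokens as "no entity".

```python
from typing import Any, Dict, List

Example = Dict[str, Any]


def strip_other_entities(examples: List[Example], extractor: str) -> List[Example]:
    # Mimics the described behaviour: every example is kept, but entities
    # tagged for a different extractor are removed (untagged ones are kept).
    out = []
    for ex in examples:
        ents = [e for e in ex.get("entities", []) if e.get("extractor") in (None, extractor)]
        out.append({**ex, "entities": ents})
    return out


def keep_only_examples_for(examples: List[Example], extractor: str) -> List[Example]:
    # Hypothetical alternative: drop whole examples unless at least one
    # entity is explicitly tagged for `extractor`.
    return [
        ex for ex in examples
        if any(e.get("extractor") == extractor for e in ex.get("entities", []))
    ]


examples = [
    {"text": "fly to Paris",
     "entities": [{"start": 7, "end": 12, "value": "Paris", "entity": "city", "extractor": "ner_spacy"}]},
    {"text": "order SKU-42",
     "entities": [{"start": 6, "end": 12, "value": "SKU-42", "entity": "sku", "extractor": "ner_crf"}]},
]
```

`strip_other_entities(examples, "ner_crf")` still returns both sentences, with "fly to Paris" now carrying no entities, whereas `keep_only_examples_for(examples, "ner_crf")` returns only "order SKU-42".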
I also tried adding a new field to the specialized messages, `specialized_crf: True`, and subclassed `CRFEntityExtractor` so that its training function would select only the training examples with that field set to `True`. But it seems that the `specialized_crf` field gets deleted somewhere in the pipeline.
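A self-contained sketch of the subclass-and-filter attempt, using stand-in classes rather than Rasa's real `Message` or `CRFEntityExtractor` (all names here are hypothetical). The stand-in loader assumes that extra keys may be dropped when training data is loaded: with `keep_extra=False` it keeps only `text`, `intent` and `entities`, which is one way a custom field like `specialized_crf` could disappear before the extractor ever sees it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Keys the hypothetical loader knows about; anything else is dropped unless keep_extra=True.
KNOWN_KEYS = {"text", "intent", "entities"}


@dataclass
class Message:
    # Stand-in for a training example (NOT Rasa's Message class).
    text: str
    data: Dict[str, Any] = field(default_factory=dict)

    def get(self, key: str, default: Any = None) -> Any:
        return self.data.get(key, default)


def load_examples(raw: List[Dict[str, Any]], keep_extra: bool = False) -> List[Message]:
    # Hypothetical loader: with keep_extra=False only KNOWN_KEYS survive.
    messages = []
    for ex in raw:
        data = {k: v for k, v in ex.items()
                if k != "text" and (keep_extra or k in KNOWN_KEYS)}
        messages.append(Message(ex["text"], data))
    return messages


class BaseCRFExtractor:
    # Stand-in for the CRF extractor; "training" just records the texts it saw.
    def train(self, examples: List[Message]) -> None:
        self.trained_on = [m.text for m in examples]


class SpecializedCRFExtractor(BaseCRFExtractor):
    # Hypothetical subclass: train only on examples flagged specialized_crf=True.
    def train(self, examples: List[Message]) -> None:
        flagged = [m for m in examples if m.get("specialized_crf") is True]
        super().train(flagged)
```

If the flag survives loading, only the flagged example reaches training; with the whitelisting loader the filter finds nothing and the CRF is trained on an empty list, which matches the symptom of the field "getting deleted".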
Any ideas how I can tag part of the data and have my subclassed CRF entity extractor filter by it?