How can I use only a selected part of the training data for the CRF model?

einar.bui · August 16, 2018, 8:22am

I’m exploring an approach that uses ner_spacy as the general entity extractor. For the special cases where ner_spacy performs badly, I train ner_crf to handle those entities, and then put a logistic regression on top of ner_spacy and ner_crf to pick the right entity.
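
The arbitration step could be sketched as follows. This is a minimal, dependency-free illustration, not the author’s implementation. The three features (whether spaCy found an entity, whether the CRF found one, and a CRF confidence score), the label convention, the toy data and the function names are all assumptions. A real setup would more likely use a library such as scikit-learn.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 1.0, epochs: int = 3000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights and bias by full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pick_extractor(w: Sequence[float], b: float, features: Sequence[float]) -> str:
    """Decide whose entities to keep for one message."""
    p_crf = sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)
    return "ner_crf" if p_crf >= 0.5 else "ner_spacy"


if __name__ == "__main__":
    # Toy data invented for illustration. Hypothetical features per message:
    # [spacy_found_entity, crf_found_entity, crf_confidence];
    # label 1 = ner_crf was right, 0 = ner_spacy was right.
    X = [[1, 1, 0.9], [0, 1, 0.8], [1, 1, 0.95], [1, 0, 0.0], [1, 1, 0.2], [1, 0, 0.1]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_logreg(X, y)
    print(pick_extractor(w, b, [0, 1, 0.85]))
```

The arbiter only needs a handful of features per message, so a plain logistic regression keeps it cheap to train and easy to inspect.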

My problem is: how can I tag the part of the training data that I want ner_crf to use? I tried adding the extractor field to the entities, with the value ner_spacy or ner_crf, but then realised that the function filter_trainable_entities only removes the entities tagged for the other extractor, not the training examples that contain them. The CRF model still sees those sentences, now with no entity labels, so it learns that those tokens are not entities, which only confuses it.
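
A small standalone sketch of the difference between dropping annotations and dropping whole examples. It uses plain dicts shaped like Rasa’s JSON training examples. The function names are hypothetical, and `filter_entities_only` only mirrors the behaviour described above; it is not Rasa’s code. Plain BIO tags are used for brevity.

```python
from typing import Any, Dict, List

Example = Dict[str, Any]  # {"text": str, "entities": [{"start", "end", "entity", "extractor"?}]}


def filter_entities_only(examples: List[Example], name: str) -> List[Example]:
    """Drop annotations tagged for another extractor, but keep every example."""
    out = []
    for ex in examples:
        kept = [e for e in ex.get("entities", [])
                if not e.get("extractor") or e["extractor"] == name]
        out.append({**ex, "entities": kept})
    return out


def filter_examples(examples: List[Example], name: str) -> List[Example]:
    """Keep only examples that contain at least one entity tagged for `name`."""
    return [ex for ex in examples
            if any(e.get("extractor") == name for e in ex.get("entities", []))]


def bio_tags(example: Example) -> List[str]:
    """Whitespace-tokenize and assign BIO tags from character offsets."""
    text, tags, pos = example["text"], [], 0
    for token in text.split():
        start = text.index(token, pos)
        end = pos = start + len(token)
        tag = "O"  # untagged tokens become explicit "not an entity" labels
        for e in example.get("entities", []):
            if start >= e["start"] and end <= e["end"]:
                tag = ("B-" if start == e["start"] else "I-") + e["entity"]
                break
        tags.append(tag)
    return tags
```

With entity-level filtering, a sentence such as "book a flight to Oslo" (with Oslo tagged for ner_spacy) stays in the CRF’s data with every token tagged "O"; example-level filtering removes it entirely.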

I tried adding a new field, specialized_crf: True, to the specialized messages, and subclassed CRFEntityExtractor so that its training function selects only the training examples with that field set to True. But it seems that the specialized_crf field gets deleted somewhere in the pipeline.
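
A hypothetical sketch of the subclassing idea. It is written against a stand-in base class rather than Rasa’s real CRFEntityExtractor, whose train signature is not reproduced here. Failing loudly when no example carries the flag makes the symptom described above (the field disappearing before training) visible, instead of silently training on nothing.

```python
from typing import Any, Dict, Iterable, List


class BaseExtractor:
    """Stand-in for a trainable extractor; NOT Rasa's CRFEntityExtractor."""

    def __init__(self) -> None:
        self.trained_on: List[Dict[str, Any]] = []

    def train(self, examples: Iterable[Dict[str, Any]]) -> None:
        self.trained_on = list(examples)


class SpecializedExtractor(BaseExtractor):
    """Trains only on examples whose custom flag is truthy."""

    flag = "specialized_crf"  # hypothetical custom field from the post

    def train(self, examples: Iterable[Dict[str, Any]]) -> None:
        selected = [ex for ex in examples if ex.get(self.flag)]
        if not selected:
            # If the field was stripped earlier, no example passes this check.
            raise ValueError(
                f"no training example has '{self.flag}' set; "
                "was the field dropped before training?")
        super().train(selected)
```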

Any ideas on how I can tag part of the data and have my subclassed CRF entity extractor filter on it?

einar.bui · August 16, 2018, 9:01am

The approach I’m going with for now is to pass my subclassed CRF a config parameter containing the path to the data I want it to train on. Any other ideas are welcome.
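
A minimal sketch of the config-path approach, assuming the separate file uses Rasa NLU’s JSON training format (`rasa_nlu_data` containing `common_examples`). The config key `crf_training_data` and the function name are hypothetical, and how a component receives its config dict depends on the Rasa version.

```python
import json
from typing import Any, Dict, List


def load_crf_examples(component_config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Load the CRF-only examples from the file named in the component config."""
    path = component_config.get("crf_training_data")  # hypothetical config key
    if not path:
        raise KeyError("component config has no 'crf_training_data' path")
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Rasa NLU JSON layout: {"rasa_nlu_data": {"common_examples": [...]}}
    return data.get("rasa_nlu_data", {}).get("common_examples", [])
```

This keeps the main training file untouched for the rest of the pipeline, at the cost of maintaining a second file.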

PS. Why do all my forum posts appear with a grey title? A few other posts have grey titles as well, but all of mine do, and I don’t understand what it means.

akelad · August 24, 2018, 1:22pm

All the entities you label in your training data are only used to train ner_crf, never ner_spacy. ner_spacy uses spaCy’s pretrained entity model and is not trained on your data.

As for the grey title, I think that just means you’ve looked at the post before.
