Pipeline'ing NER

ryszardtuora · December 6, 2019, 2:01pm

My bot has some trouble recognizing first names and surnames. A further problem is that I use forms, so I want to extract entities from single-word responses, e.g. “First name?”, “Adam”. For some reason this case does not work well (sometimes the entity is not extracted, sometimes the intent is wrongly recognized as a detour from the happy path), although it would seem trivial.

Since I work in Polish, I know that there is quite a good NER model in spaCy, which includes recognizing names. The problem is that it does not distinguish between first names and last names, and just returns a single entity person_name (e.g. “Adam Smith”). I was thinking that the best solution would be to put spaCy NER in the pipeline before the CRF extractor, in the hope that the CRF would treat the spaCy output as a feature when deciding. Is that possible?

BTW: Can I block detours from happy paths in any way?

Tanja · December 11, 2019, 12:35pm

Currently, this is not implemented, so you cannot use the output of one entity extractor as features for another one. However, you could write your own custom component that adapts the SpacyEntityExtractor to do exactly that.
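To illustrate the idea behind such a custom component, here is a minimal sketch in plain Python. It is not Rasa's component API: the functions `tokenize_with_offsets` and `add_ner_feature` are hypothetical names, and the entity dicts only assume the usual `entity`/`start`/`end` keys. It shows how spans found by one extractor could be turned into a per-token flag that a later extractor might consume as a feature.

```python
from typing import Dict, List


def tokenize_with_offsets(text: str) -> List[Dict]:
    """Split on whitespace and record each token's character offsets."""
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append({"text": word, "start": start, "end": end})
        pos = end
    return tokens


def add_ner_feature(tokens: List[Dict], entities: List[Dict],
                    label: str = "person_name") -> List[Dict]:
    """Flag every token that lies inside an entity span with the given label.

    `entities` is assumed to be the output of an earlier extractor,
    e.g. [{"entity": "person_name", "start": 12, "end": 22}].
    """
    spans = [(ent["start"], ent["end"]) for ent in entities
             if ent["entity"] == label]
    for tok in tokens:
        tok["in_" + label] = any(
            s <= tok["start"] and tok["end"] <= e for s, e in spans
        )
    return tokens


text = "Nazywam się Adam Smith"
# Hypothetical output of an earlier NER step (one undivided name span).
entities = [{"entity": "person_name", "start": 12, "end": 22}]
tokens = add_ner_feature(tokenize_with_offsets(text), entities)
flags = [tok["in_person_name"] for tok in tokens]
```

A real component would have to attach these flags to the message in whatever form the CRF featurization expects, which depends on the Rasa version in use.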

Another idea that might help is lookup tables (see Training Data Format). If a first name is present in that list, the corresponding feature “present in first name lookup table” would be set. That might help when training your CRF. Apart from that, you should make sure your training data contains a couple of examples that consist of just the first name.
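As a rough illustration of what a “present in lookup table” feature means, the sketch below compiles a list of names into one case-insensitive regex and flags every whitespace token that overlaps a match. This is an assumption-laden approximation, not Rasa's actual implementation; `build_lookup_pattern`, `lookup_flags`, and the sample names are made up for the example.

```python
import re
from typing import List


def build_lookup_pattern(entries: List[str]) -> re.Pattern:
    """Compile lookup entries into one whole-word, case-insensitive regex."""
    escaped = [re.escape(entry) for entry in entries]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def lookup_flags(text: str, pattern: re.Pattern) -> List[bool]:
    """Return one flag per whitespace token: True if it overlaps a match."""
    matches = [m.span() for m in pattern.finditer(text)]
    flags = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        flags.append(any(s < end and start < e for s, e in matches))
        pos = end
    return flags


# Hypothetical lookup table of first names.
first_names = build_lookup_pattern(["Adam", "Ewa", "Ryszard"])

single_word = lookup_flags("Adam", first_names)
in_sentence = lookup_flags("Mam na imię adam", first_names)
surname_like = lookup_flags("Adamski", first_names)
```

The whole-word boundary keeps a surname such as “Adamski” from matching the first name “Adam”, which matters when the goal is to tell first names and last names apart.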

ryszardtuora · December 11, 2019, 3:24pm

Thanks. In the end, lookup tables with the right amount of data seem to work well. Am I right in assuming that I don't need to list the lookup tables in the features for the CRF entity extractor in config.yml? At the moment there is no explicit feature list there.

Tanja · December 11, 2019, 3:49pm

The lookup tables are processed by the RegexFeaturizer, which generates features from them. So you should have this component in your pipeline before the CRFEntityExtractor. You don’t need to change any config parameter of the entity extractor.
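The ordering requirement can be sketched as a small check. The list below mirrors the shape of the `pipeline` section of config.yml as Python data; the tokenizer entry is an assumption added only to make the example complete, and `comes_before` is a hypothetical helper, not part of Rasa.

```python
from typing import Dict, List

# Mirrors the `pipeline:` list from config.yml as Python data.
pipeline: List[Dict[str, str]] = [
    {"name": "WhitespaceTokenizer"},  # assumption: any tokenizer goes first
    {"name": "RegexFeaturizer"},      # turns lookup tables into features
    {"name": "CRFEntityExtractor"},   # consumes those features
]


def comes_before(components: List[Dict[str, str]], first: str, second: str) -> bool:
    """True if both components are present and `first` precedes `second`."""
    names = [component["name"] for component in components]
    if first not in names or second not in names:
        return False
    return names.index(first) < names.index(second)


order_ok = comes_before(pipeline, "RegexFeaturizer", "CRFEntityExtractor")
```

If the check comes out False, the extractor would run without ever seeing the lookup-table features.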

ryszardtuora · December 11, 2019, 3:58pm

Yes, I have the RegexFeaturizer. Thanks a bunch!
