How can I improve spaCy name entity extraction?

joancipria · March 24, 2021, 8:58am

Hi! I’m currently using spaCy (the Spanish es_core_news_md model) to extract names, but it is not working very well. Let me explain with the following user input:

Hola, soy Juan y vivo en España
(Hi, I’m John and I live in Spain)

With this sentence, spaCy successfully extracts "Juan" as PERSON. But if the user writes it with small variations, such as omitting the comma or not capitalizing the name, spaCy fails to extract the name. See the following examples:

Hola soy Juan y vivo en España → spaCy extracts "Hola soy Juan" as PERSON
Hola soy juan y vivo en España → spaCy extracts "Hola soy juan" as MISC
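
To compare the variants outside of Rasa, something like the following could be used. It is only a minimal sketch: the helper names (extract_entities, compare_variants, load_spanish_model) are made up for illustration, and it assumes spaCy and the es_core_news_md model are installed locally. No expected output is shown because the labels depend on the model version.

```python
# The three inputs from this post: with comma, without comma, lowercase name.
VARIANTS = [
    "Hola, soy Juan y vivo en España",
    "Hola soy Juan y vivo en España",
    "Hola soy juan y vivo en España",
]


def extract_entities(nlp, text):
    """Return (entity text, label) pairs for one input string."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def compare_variants(nlp, variants):
    """Map each input variant to the entities the pipeline finds in it."""
    return {text: extract_entities(nlp, text) for text in variants}


def load_spanish_model():
    """Load the Spanish model (assumes it has already been downloaded)."""
    import spacy  # imported lazily so the helpers above work without spaCy

    return spacy.load("es_core_news_md")
```

With the model installed (python -m spacy download es_core_news_md), printing the items of compare_variants(load_spanish_model(), VARIANTS) shows the three results side by side.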

This is my current config: config.yml (1.1 KB)

How can I tweak or improve this spaCy behaviour? Should I move to another entity extractor?

ganeshv · March 24, 2021, 12:17pm

I haven’t used spaCy’s NER in my pipeline, so this may not be the answer you’re looking for. Do you have any training examples where the name is not in title case, and with variations in punctuation?
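
To make it concrete, this is the kind of variation I mean. It is a rough sketch only: make_variants and strip_punctuation are hypothetical helpers, not part of Rasa or spaCy, and they work on plain text, so any entity annotations in the training data would have to be added back afterwards.

```python
import re


def strip_punctuation(text):
    """Drop common sentence punctuation and collapse leftover whitespace."""
    no_punct = re.sub(r"[,.;:!?¡¿]", "", text)
    return re.sub(r"\s+", " ", no_punct).strip()


def make_variants(text):
    """Return the original text plus punctuation-free and lowercase forms."""
    stripped = strip_punctuation(text)
    candidates = [text, stripped, text.lower(), stripped.lower()]
    unique = []
    for candidate in candidates:
        # Keep first occurrences only, preserving order
        if candidate not in unique:
            unique.append(candidate)
    return unique


for variant in make_variants("Hola, soy Juan y vivo en España"):
    print(variant)
```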

joancipria · March 24, 2021, 12:31pm

Yes, I have different variations in my training data, but it doesn’t seem to help. You can also try this in the spaCy online demo.
