Hi! I’m currently using spaCy (Spanish model es_core_news_md) to extract names, but it is not working very well. Let me explain with the following user input:
- Hola, soy Juan y vivo en España (Hi, I’m John and I live in Spain)
With this sentence, spaCy successfully extracts "Juan" as PERSON. But if the user writes it with small variations, such as omitting the comma or not capitalizing the name, spaCy fails to extract the name. See the following examples:
- Hola soy Juan y vivo en España → spaCy extracts "Hola soy Juan" as PERSON
- Hola soy juan y vivo en España → spaCy extracts "Hola soy juan" as MISC
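In case it helps to reproduce this outside the bot, here is a minimal sketch that runs the three sentences through the same model directly. It assumes spaCy and `es_core_news_md` are installed locally; `extract_entities` and `SENTENCES` are just illustrative names, and the labels printed may differ from the ones above depending on the model version.

```python
# Minimal reproduction sketch. extract_entities and SENTENCES are
# hypothetical names used only for illustration.

SENTENCES = [
    "Hola, soy Juan y vivo en España",  # with comma, capitalized name
    "Hola soy Juan y vivo en España",   # comma omitted
    "Hola soy juan y vivo en España",   # comma omitted, lowercase name
]


def extract_entities(nlp, text):
    """Return (entity text, entity label) pairs found by the pipeline."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def main():
    try:
        import spacy  # assumption: spaCy is installed
        nlp = spacy.load("es_core_news_md")  # assumption: model is downloaded
    except (ImportError, OSError) as err:
        print(f"spaCy or the Spanish model is not available: {err}")
        return
    for sentence in SENTENCES:
        print(sentence, "->", extract_entities(nlp, sentence))


if __name__ == "__main__":
    main()
```

Running it prints the entities detected for each variant, which makes it easier to compare the model's raw output with what the bot reports.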
This is my current config: config.yml (1.1 KB)
How can I tweak or improve this spaCy behaviour? Should I move to another entity extractor?