Hi! I’m currently using spaCy (Spanish model es_core_news_md) to extract names, but it is not working very well. Let me explain with the following user input:
- Hola, soy Juan y vivo en España (Hi, I’m John and I live in Spain)
With this sentence, spaCy successfully extracts "Juan" as PERSON. But if the user writes it with small variations, such as leaving out the comma or not capitalizing the name, spaCy fails to extract the name correctly. See the following examples:
- Hola soy Juan y vivo en España → spaCy extracts "Hola soy Juan" as PERSON
- Hola soy juan y vivo en España → spaCy extracts "Hola soy juan" as MISC
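To take the rest of my pipeline out of the picture, the three inputs can also be run through the model directly. This is only a rough sketch: it assumes spaCy and the es_core_news_md model are installed locally, and the helper names (extract_entities, report, load_spanish_model) are made up for this post, they are not part of spaCy or of my config.

```python
from typing import Callable, List, Tuple

# The three user inputs from this post.
SENTENCES = [
    "Hola, soy Juan y vivo en España",
    "Hola soy Juan y vivo en España",
    "Hola soy juan y vivo en España",
]


def extract_entities(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Run a spaCy pipeline on `text` and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def report(nlp: Callable, sentences: List[str] = SENTENCES) -> None:
    """Print the entities found in each sentence, one sentence per line."""
    for sentence in sentences:
        print(f"{sentence!r} -> {extract_entities(nlp, sentence)}")


def load_spanish_model(name: str = "es_core_news_md"):
    """Load the Spanish model; assumes it was downloaded beforehand."""
    import spacy  # imported here so the helpers above work without spaCy

    return spacy.load(name)
```

Calling report(load_spanish_model()) prints the entities for each sentence. The exact spans and label names may differ from what I see inside the bot, depending on the spaCy and model versions.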
This is my current config: config.yml (1.1 KB)
How can I tweak or improve this spaCy behaviour? Should I move to another entity extractor?