How can I improve spaCy named entity extraction?

Hi! I’m currently using spaCy (the Spanish es_core_news_md model) to extract names, but it is not working very well. Let me explain with the following user input:

  • Hola, soy Juan y vivo en España
    (Hi, I’m John and I live in Spain)

With this sentence, spaCy successfully extracts "Juan" as PERSON. But if the user writes it with small variations, such as omitting the comma or not capitalizing the name, spaCy fails to extract the name correctly. See the following examples:

  • Hola soy Juan y vivo en España → spaCy extracts "Hola soy Juan" as PERSON
  • Hola soy juan y vivo en España → spaCy extracts "Hola soy juan" as MISC

This is my current config: config.yml (1.1 KB)

How can I tweak or improve this spaCy behaviour? Should I move to another entity extractor?

I’ve not used spaCy’s NER in my pipeline, so this may not be the answer you’re looking for. Do you have any training examples where the name is not in title case and the punctuation varies?

Yes, I have different variations in my training data, but it doesn’t seem to help. You can also try this in the spaCy online demo.
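
For anyone who wants to generate the case and punctuation variations discussed above systematically, here is a minimal sketch in plain Python. It is not part of spaCy or of the config attached above; the helper names (strip_punctuation, make_variants), the (start, end, label) offset format, and the PERSON label are assumptions chosen for illustration. It takes one annotated sentence and returns the original plus lowercased and punctuation-free copies, with the entity offsets remapped so they still point at the name.

```python
import string

# ASCII punctuation plus the Spanish opening marks (an assumption; extend as needed).
PUNCT = set(string.punctuation + "¿¡")


def strip_punctuation(text, entities):
    """Remove punctuation and shift (start, end, label) offsets to match."""
    kept = []       # characters that survive
    new_index = []  # new_index[i] = position of text[i] in the stripped text
    for ch in text:
        new_index.append(len(kept))
        if ch not in PUNCT:
            kept.append(ch)
    new_index.append(len(kept))  # sentinel so an end offset of len(text) maps too
    remapped = [(new_index[s], new_index[e], label) for s, e, label in entities]
    return "".join(kept), remapped


def make_variants(text, entities):
    """Return unique (text, entities) pairs: original, lowercased,
    punctuation-free, and both combined."""
    bare_text, bare_entities = strip_punctuation(text, entities)
    candidates = [
        (text, entities),
        # Assumes lowercasing keeps the string length, which holds for
        # ordinary Spanish text but not for every Unicode character.
        (text.lower(), entities),
        (bare_text, bare_entities),
        (bare_text.lower(), bare_entities),
    ]
    seen, unique = set(), []
    for variant_text, variant_entities in candidates:
        if variant_text not in seen:
            seen.add(variant_text)
            unique.append((variant_text, variant_entities))
    return unique


if __name__ == "__main__":
    sentence = "Hola, soy Juan y vivo en España"
    start = sentence.index("Juan")
    for variant_text, variant_entities in make_variants(
        sentence, [(start, start + len("Juan"), "PERSON")]
    ):
        print(variant_text, variant_entities)
```

The resulting pairs would still need to be converted into whatever training format your pipeline expects. Whether the extra examples actually help depends on the model and on how much such data you add, so treat this as something to try, not a guaranteed fix.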