Family name extraction

I’m currently using a form and trying to extract first and last names inside it. The bot first asks “What’s your first name?”, then “What’s your last name?”, and is supposed to extract each one separately.

As for spaCy, I’m using “fr_core_news_sm”, which gives good results with French and English first names. I use a lookup table to extract Arabic names written in Latin letters; I’ve built a sort of database of around 3000 names.
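To make the lookup part concrete, here is a minimal sketch of the idea in plain Python, not my actual pipeline. The table contents and the helper name `find_lookup_names` are made up for illustration; the real table has around 3000 entries.

```python
import re
from typing import List, Set

# Tiny illustrative sample; the real table has around 3000 entries.
ARABIC_FIRST_NAMES: Set[str] = {"meriem", "malika", "youssef"}


def find_lookup_names(text: str, table: Set[str] = ARABIC_FIRST_NAMES) -> List[str]:
    """Return the tokens of `text` that appear in the lookup table.

    Matching is case-insensitive, so lowercase input still matches.
    """
    # Runs of letters only (accented letters included); punctuation splits tokens.
    tokens = re.findall(r"[^\W\d_]+", text)
    return [tok for tok in tokens if tok.lower() in table]


print(find_lookup_names("Je m'appelle Malika"))  # ['Malika']
print(find_lookup_names("meriem"))               # ['meriem']
```

Because the comparison is done on lowercased tokens, this kind of lookup does not depend on capitalization, unlike the statistical model.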

Here’s an example of my NLU data for first names:

  • Je m’appelle Sarah
  • Mon prénom est Maria
  • Sophie
  • Ok, voici mon prénom : Camélia
  • Meriem
  • Je m’appelle Malika

And here’s an example of my NLU data for last names:

  • Smith
  • C’est Williams
  • Mon nom est Alaoui
  • Le voici : El Kamali
  • C’est Abadi
  • Saqqaf
  • mon nom est Al Andaloussi

Regarding your idea of asking for the first and last name at the same time: the user might say his name is “Will Smith” or “Smith Will”, and his last name might consist of two or three words. I wouldn’t know how to locate the last name, especially if the words are not capitalized. Even POS tagging wouldn’t help.
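To show what goes wrong, here is a deliberately naive sketch that assumes the first word is always the first name. The helper `naive_split` is hypothetical and only illustrates the ambiguity; it is not something I use.

```python
from typing import Tuple


def naive_split(full_name: str) -> Tuple[str, str]:
    """Treat the first token as the first name and the rest as the last name."""
    tokens = full_name.split()
    if not tokens:
        return "", ""
    return tokens[0], " ".join(tokens[1:])


print(naive_split("will smith"))            # ('will', 'smith')
print(naive_split("smith will"))            # ('smith', 'will'), order is swapped
print(naive_split("al andaloussi meriem"))  # ('al', 'andaloussi meriem'), wrong split
```

Nothing in the string itself says which order the user chose or where a multi-word last name ends, so a position-based rule like this cannot be trusted.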

I hope my case is clearer now. Any advice is more than welcome!