What is the best method for entity extraction for names?

I would like to get the name from a user. For example, “Hello my name is Adrian.” or just “Adrian”. What is the best way to extract this entity?

For example, should I just add a lot more training examples, or is there a custom pipeline that helps extract names?

spaCy has a built-in PERSON entity type for names. It works well for common names.

What if I have some unusual names, then? I made a .txt file, added a lot of names to it, and linked it as a lookup table in my NLU data. It seems to be working, but is this a good way of doing it? I also want to attach a corresponding ID to each name. What format can the lookup table file use?

Can we link other types of files as a lookup table, like Excel or something else?

Yes, a lookup table is another option, and it works well if you have unusual names. Names are difficult to extract in general, so a lookup table is a good way to handle them.

Lookup tables can only be an inline list in the NLU data or a newline-separated .txt file. I don’t think adding the ID there would work.
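A minimal, standalone sketch of what lookup-table matching amounts to: read a newline-separated names file and compile the entries into one case-insensitive regex with word boundaries. This is plain Python for illustration, not Rasa’s code; as far as I know, Rasa compiles lookup entries into a similar regex and uses the matches as features for the CRF entity extractor rather than as a direct extractor. The file name `names.txt` and all function names are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import List


def load_lookup_table(path: Path) -> List[str]:
    """Read a newline-separated lookup file, skipping blank lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_pattern(names: List[str]) -> "re.Pattern[str]":
    """Compile all names into one case-insensitive alternation with word boundaries."""
    if not names:
        raise ValueError("lookup table is empty")
    # Longest names first, so "Anna Maria" is tried before "Anna".
    alternatives = [re.escape(n) for n in sorted(names, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def find_names(text: str, pattern: "re.Pattern[str]") -> List[str]:
    """Return every lookup-table name found in the text, as written by the user."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "names.txt"  # hypothetical lookup file
        table.write_text("Adrian\nArya\nXochitl\n", encoding="utf-8")
        pattern = build_pattern(load_lookup_table(table))
        print(find_names("Hello my name is Adrian.", pattern))  # ['Adrian']
```

Sorting longest-first means a multi-word entry such as “Anna Maria” wins over “Anna” when both are listed, and the word boundaries stop “Adrian” from matching inside “Adriana”.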

Could I have a quick example of using the PERSON entity with spaCy?

Your nlu.md should use PERSON as the entity name, and your config file should include the spaCy components. For example:

## intent:greet_user
- Hello my name is [Adrian](PERSON)
- I am [John](PERSON)
- Hi, I am [Arya](PERSON)
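To make the annotation format concrete: the text in square brackets becomes the entity value and the label in parentheses becomes the entity type, with character offsets into the plain sentence. The helper below is a hypothetical illustration of that mapping, not Rasa’s own Markdown reader (Rasa parses nlu.md for you).

```python
import re
from typing import Dict, List, Tuple, Union

# Matches "[entity text](entity_label)" as in the Markdown examples above.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

Entity = Dict[str, Union[int, str]]


def parse_example(line: str) -> Tuple[str, List[Entity]]:
    """Strip annotations from one line and return (plain_text, entities)."""
    parts: List[str] = []
    entities: List[Entity] = []
    cursor = 0  # position in the annotated line
    length = 0  # length of the plain text built so far
    for match in ANNOTATION.finditer(line):
        before = line[cursor:match.start()]
        parts.append(before)
        length += len(before)
        value, label = match.group(1), match.group(2)
        entities.append(
            {"start": length, "end": length + len(value), "value": value, "entity": label}
        )
        parts.append(value)
        length += len(value)
        cursor = match.end()
    parts.append(line[cursor:])
    return "".join(parts), entities


if __name__ == "__main__":
    text, ents = parse_example("Hello my name is [Adrian](PERSON)")
    print(text)  # Hello my name is Adrian
    print(ents)  # [{'start': 17, 'end': 23, 'value': 'Adrian', 'entity': 'PERSON'}]
```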

Awesome, thank you!

I need to associate an ID with each name, because when a user says “How is John doing?” I need to call multiple APIs to get information on John.

So when the NLU detects John as a named entity, I need to pass a user_id to my database server (basically as a parameter in the API call), which returns a JSON response with data on John. How can I map an ID to a name? Do you have any ideas?

Maybe you could store the names and corresponding IDs in a database, and once you extract the name, use it to look up the ID and in turn use that for your API calls?
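A minimal sketch of that idea using Python’s built-in `sqlite3` module, assuming a hypothetical `users` table with `user_id` and `name` columns; your real database and schema will differ. Declaring the column `COLLATE NOCASE` lets “john” match “John”.

```python
import sqlite3
from typing import Optional


def create_user_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per user, names compared case-insensitively.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        " user_id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL COLLATE NOCASE)"
    )


def lookup_user_id(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Return the user_id for an extracted name, or None if it is unknown."""
    row = conn.execute(
        "SELECT user_id FROM users WHERE name = ?", (name.strip(),)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_user_table(conn)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [("u-001", "John"), ("u-002", "Arya")]
    )
    print(lookup_user_id(conn, "john"))  # u-001
```

If two users can share a name, `fetchone()` silently picks one of them; you would want `fetchall()` and a follow-up question to disambiguate.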

Okay, so send the name as a parameter to another API and get back the ID? @srikar_1996

Yes, you can do that.
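If the lookup lives behind another API instead, the call could look like this sketch. The endpoint URL, the `name` query parameter, and the `{"user_id": ...}` response shape are all hypothetical; adapt them to whatever your server actually exposes.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def build_lookup_url(base_url: str, name: str) -> str:
    """Append the extracted name as a URL-encoded query parameter."""
    return f"{base_url}?{urlencode({'name': name})}"


def parse_lookup_response(body: str) -> Optional[str]:
    """Pull user_id out of a JSON object; None if the server did not return one."""
    data = json.loads(body)
    return data.get("user_id") if isinstance(data, dict) else None


def fetch_user_id(base_url: str, name: str, timeout: float = 5.0) -> Optional[str]:
    """Call the (hypothetical) lookup endpoint and return the user_id, if any."""
    with urlopen(build_lookup_url(base_url, name), timeout=timeout) as resp:
        return parse_lookup_response(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL; calling fetch_user_id needs a real server.
    print(build_lookup_url("https://example.com/api/users", "John"))
    # https://example.com/api/users?name=John
```

The returned user_id is then what you pass to the other APIs that fetch John’s data.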

Best to use the following (worked for me):

  1. Use a lookup table
  2. Use the approach described in “Providing conversation context to the NLU using microservices”. For the NLU model of our name microservice we used NER_Spacy and NER_CRF, and the combination worked for all cases with pretty basic NLU training data. spaCy would sometimes label names as ORG and such instead of PERSON, but since we knew from the conversation context (that’s the whole point of using microservices) that the user was giving us a name, we could ignore spaCy’s entity label and use whatever entity value it returned, regardless of whether spaCy thought it was PERSON, ORG, or ENTITY.

Hope this helps.
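Roughly, the “ignore spaCy’s label when the context says it’s a name” rule could be written like this. It assumes extractor output is a list of dicts with `entity` and `value` keys (similar to the entities list in a Rasa NLU parse result) and a hypothetical `expecting_name` flag that your dialogue code sets when it has just asked for a name.

```python
from typing import Any, Dict, List, Optional


def pick_name(entities: List[Dict[str, Any]], expecting_name: bool) -> Optional[str]:
    """Choose a name value from extractor output.

    entities: dicts with at least "entity" and "value" keys (assumed shape).
    expecting_name: True when the conversation has just asked for a name.
    """
    # A span labelled PERSON is always the best candidate.
    persons = [e["value"] for e in entities if e.get("entity") == "PERSON"]
    if persons:
        return persons[0]
    if expecting_name and entities:
        # Context says this turn is a name, so trust the span even if
        # spaCy labelled it ORG or something else.
        return entities[0]["value"]
    return None


if __name__ == "__main__":
    ents = [{"entity": "ORG", "value": "Xochitl", "extractor": "ner_spacy"}]
    print(pick_name(ents, expecting_name=True))   # Xochitl
    print(pick_name(ents, expecting_name=False))  # None
```

Preferring a PERSON entity when one exists keeps the rule safe in turns where the user mentions both a company and a name.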

I also believe that an “input box” widget (like the name fields on web pages) can help.

@srikar_1996 can you share an example of a pipeline configuration that uses the PERSON tag?

In case the person’s name is not in the NLU training data and we still want to extract it, what would be the best method?

language: "en"
pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_featurizer_spacy"
- name: "ner_spacy"
- name: "intent_classifier_sklearn"

Add these to your pipeline and it should work.

Hi @lgrinberg, so you define NER_Spacy and NER_CRF one after another.

I have one question: do we also have to annotate entities that spaCy can already extract as regular entities, as below?

- Please open a ticket for [James](PERSON) about [printer](device)

Or, should we leave it to Spacy as below

- Please open a ticket for James about [printer](device)

Thank you,

Hi @huseyinyilmaz01, you don’t have to provide tagged NLU data for entities that are extracted through spaCy.
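To illustrate: as I understand it, in a pipeline with both extractors each one adds its results to the same entities list, so downstream code can treat James (found by ner_spacy without annotation) and printer (learned by ner_crf from your tagged examples) the same way. The sketch below groups such a list by label; the entity dicts and offsets are a hand-written example, not real extractor output.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


def entities_by_label(entities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group entity values by label, in the order they appear in the message."""
    grouped: DefaultDict[str, List[str]] = defaultdict(list)
    for ent in sorted(entities, key=lambda e: e.get("start", 0)):
        grouped[ent["entity"]].append(ent["value"])
    return dict(grouped)


if __name__ == "__main__":
    # Hand-written example for "Please open a ticket for James about printer".
    parsed = [
        {"entity": "device", "value": "printer", "start": 37, "end": 44, "extractor": "ner_crf"},
        {"entity": "PERSON", "value": "James", "start": 25, "end": 30, "extractor": "ner_spacy"},
    ]
    print(entities_by_label(parsed))  # {'PERSON': ['James'], 'device': ['printer']}
```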