Name entity not extracting

language: "en"
pipeline: spacy_sklearn

The starter pack training data for extracting the name doesn’t work for arbitrary names that are not listed in the training data. It should pick up any name, right?

Training data:

## intent:name
- My name is Juste <!-- Square brackets contain the value of entity while the text in parentheses is a label of the entity -->
- I am Josh
- I’m Lucy
- People call me Greg
- It’s David
- Usually people call me Amy
- My name is John
- You can call me Sam
- Please call me Linda
- Name name is Tom
- I am Richard
- I’m Tracy
- Call me Sally
- I am Philipp
- I am Charlie
- I am Charlie
- I am Ben
- Call me Susan
- Lucy
- Peter
- Mark
- Joseph
- Tan
- Pete
- Elon
- Penny
- name is Andrew
- I Lora
- Stan is my name
- Susan is the name
- Ross is my first name
- Bing is my last name
- Few call me as Angelina
- Some call me Julia
- Everyone calls me Laura
- I am Ganesh
- My name is Mike
- just call me Monika
- Few call Dan
- You can always call me Suraj
- Some will call me Andrew
- My name is Ajay
- I call Ding
- I’m Partia
- Please call me Leo
- name is Pari
- name Sanjay

Hi,

No, it won’t pick up all names, because there are millions of names and spaCy won’t be able to recognise all of them. The best way to deal with names is to have a lookup table with a list of all the names. Take a look at this: lookup tables

What’s the point of deep learning if names have to come from a lookup? There are models that do this, but it looks like this one does not.

We can’t expect it to work with millions of names. spaCy’s PERSON entity does a fairly good job with names, but I noticed that it does not work well with Indian names.

@sibbsnb I’ve had pretty good name recognition when using ner_crf in the NLU pipeline. I had to provide quite a lot of training data though (about 200 examples.)

Slightly off topic - @srikar_1996 If you are dealing mainly with Indian languages - have you seen the chatbot_ner project? They have support for entity recognition in English, Hindi, Gujarati, Marathi, Bengali and Tamil. I do not know if there is a Rasa Integration though.

@netcarver I am using English itself. But it gives me issues with Indian names. I did not provide as many examples as you said. I might have given like 30~40 examples.

Check out this link: displaCy Named Entity Visualizer · Demos · Explosion AI and try the name recognition there. If it works there, then there is a problem in your pipeline. If it doesn’t work there, then your names are too exotic for spaCy and you need to train ner_crf yourself.

Yep, I always use this. I guess I have to provide more examples to my CRF.

@srikar_1996 as far as I know, spacy is pre-trained for entity recognition. Have you tried seeing if there are any differences between the small, medium and large models when you put your example names into the demo @mauricedoepke posted above?

For example, this text…

Pushpa went to the market with Getsy.

… only one name is recognised using the small model, while the medium model locates both.

You may get better recognition with a larger model.
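If you would rather run this comparison locally than in the demo, something like the sketch below might do. It assumes spaCy is installed and that the three English models have been downloaded; `person_entities` and `compare_models` are hypothetical helper names, not part of spaCy or Rasa.

```python
def person_entities(nlp, text):
    """Return the text of every entity the pipeline labels as PERSON."""
    return [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]


def compare_models(text, model_names=("en_core_web_sm",
                                      "en_core_web_md",
                                      "en_core_web_lg")):
    """Run the same text through several spaCy models and collect PERSON hits."""
    import spacy  # imported here so person_entities works without spaCy installed

    results = {}
    for name in model_names:
        try:
            nlp = spacy.load(name)
        except OSError:
            # spacy.load raises OSError when the model package is not installed
            results[name] = None
            continue
        results[name] = person_entities(nlp, text)
    return results
```

Calling `compare_models("Pushpa went to the market with Getsy.")` returns one list of detected names per model (or `None` for a model that is not installed), so you can see at a glance where the models disagree.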

Stanford CoreNLP does a great job on the person entity. spaCy doesn’t seem to detect it.

Hi, I’ve tried the small, medium and large models. They don’t always detect the names. For example, sm detects the first name, md doesn’t detect anything and lg detects the second name. Earlier you mentioned that you provided close to 200 examples for your CRF. I was wondering: isn’t providing that many examples as good as having a lookup table?

@srikar_1996 You probably need feedback from someone with more experience using lookup tables/regexes than I have. However, my gut feel is that they are not the same.

The documentation states that lookup tables are only usable by ner_crf and that the entries in the table are combined to form one large, case-insensitive regex pattern that is then applied to the input text. It sounds like your recognition may be limited to just the example names if you were to go down that route, though I am not certain of that.
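To illustrate what "one large, case-insensitive regex" could look like, here is a minimal sketch. It is not Rasa's actual code: `build_lookup_pattern`, `lookup_matches` and the sample names are made up for illustration, and it only shows the matching step.

```python
import re


def build_lookup_pattern(entries):
    """Combine lookup entries into one case-insensitive, word-bounded regex."""
    # Escape each entry so characters like "." are matched literally.
    # Longer entries come first so a longer name wins over a shorter prefix.
    escaped = [re.escape(e) for e in sorted(entries, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


names = ["Pushpa", "Getsy", "Suraj"]  # hypothetical lookup table contents
pattern = build_lookup_pattern(names)


def lookup_matches(text):
    """Return every listed name found in the text."""
    return pattern.findall(text)


print(lookup_matches("pushpa went to the market with Getsy."))  # ['pushpa', 'Getsy']
print(lookup_matches("Ganesh went to the market."))             # []
```

The second call comes back empty because "Ganesh" is not in the list, which is the limitation I suspect applies to lookup tables.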

Last time I checked, there are way more names possible in English than I used in the ~200 examples I trained on.

Overall, it may be worth you doing a little experiment to see which method gives you better recognition.

Yea, I saw how lookup tables work. And yes, you’re right. Lookup tables are limited to the examples in the file. As of now, for my use case, lookup tables seem to do what I need. I’ll try out some other pipelines as well to see what works best.

Continuing the discussion from Name entity not extracting:

Same problem here. As Sibish Basheer said: “What is the point of deep learning if names have to come from a lookup?” When I tell the bot “My name is [John] (PERSON)”, with PERSON as a slot, I’d like the bot to answer “nice to meet you [John]”, whatever name I put between the brackets. It should work like a function: for any data between [ ], answer “nice to meet you [data]”. The same goes for translation. If I tell the bot “translate [this] (translation) to Spanish”, with a custom action that translates the slot “translation”, I would expect the bot to translate any data between the brackets. Is there a way to force the bot to do that? Thanks.

Hi,

This can be done using slots. Your template must have something like this:

utter_greet:
  - "nice to meet you {PERSON}"

{PERSON} will be replaced with the slot value, which is John in this case.

Alternatively, you can use custom actions to do the same.

Yes, that is exactly what I did. But the problem is that for many names, the bot doesn’t identify them as names, so the slot is not filled. The bot then answers: Nice to meet you “None”, instead of: Nice to meet you “the name”. I don’t know how to force the bot to repeat the [name], however strange or uncommon it is.

If the bot is returning None, it means the NLU was unable to extract the name. You can check the logs to see whether the entity was identified and whether the slot was filled. Names are sometimes difficult to identify because there can be millions of possibilities. In this case, provide the NLU with more examples, or use a lookup table.
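Why you see the literal word None can be shown with a small sketch. This is plain Python string formatting, not Rasa's own template code; `render_template` is a hypothetical helper, and the assumption is that an unset slot holds `None`.

```python
def render_template(template, slots):
    """Fill {slot_name} placeholders from a dict of slot values."""
    # Assumption: an unset slot holds None, which is formatted as the string "None".
    return template.format(**slots)


template = "nice to meet you {PERSON}"

print(render_template(template, {"PERSON": "John"}))  # nice to meet you John
print(render_template(template, {"PERSON": None}))    # nice to meet you None
```

So the template itself is fine. The thing to fix is the extraction step that should have put a value into the slot.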

Yes, I will try that out as you suggest. But wouldn’t it be possible to force the bot to accept any data between the brackets as a name to repeat? So if I say: my name is umbrella, I would like the bot to answer: Nice to meet you “umbrella”, instead of: Nice to meet you “None”, even if umbrella is not a given name. Why is it so difficult to get that result? Did I miss something?

Actually, if the sentences are similar in structure, like “My name is xxxxxx”, then the bot will pick it up even if it’s not a PERSON entity, because there is no way the algorithm would know that it isn’t a human name. Maybe this has something to do with spaCy’s PERSON entity.

Try using your own ner_crf instead of PERSON and see what happens, I’m guessing it should work.
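The intuition behind that guess can be sketched roughly as follows: a CRF-style extractor describes each token by its shape and its neighbours, not only by the word itself. The feature names below are hypothetical and are not ner_crf's actual feature set.

```python
def token_features(tokens, i):
    """Describe token i by its shape and its neighbours, not only by its text."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "suffix3": word[-3:].lower(),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "prev2": tokens[i - 2].lower() if i > 1 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


seen = token_features("My name is John".split(), 3)
unseen = token_features("My name is Umbrella".split(), 3)

# Features that are identical for the seen and the unseen word
shared = {k for k in seen if seen[k] == unseen[k]}
print(sorted(shared))  # ['is_title', 'next', 'prev', 'prev2']
```

In this toy example "Umbrella" shares all of its context features with "John", which is why a model trained on enough "My name is ..." examples has a chance of tagging a word it has never seen.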

Does a ‘name’ have to occur at least once in the training data for CRF to be able to recognize it?