How to extract person names from user input?

huseyinyilmaz01 · September 27, 2020, 10:29pm

For a specific intent, I need to make a query with the extracted person name. I tried “SpacyEntityExtractor” and “CRFEntityExtractor”, both alone and in combination.

Intent prediction works fine, but entity extraction is unreliable. For example, it extracts “Parker” and “George”, but not “Henry” and “Diana”. Interestingly, it sometimes fails to extract names that it extracted before. How should I extract person names?

My config.yml:

pipeline:
  - name: "SpacyNLP"
  - name: profanity_analyzer.ProfanityAnalyzer
  - name: SpacyTokenizer
  - name: CountVectorsFeaturizer
  - name: RegexFeaturizer
  - name: "SpacyEntityExtractor"
  - name: "CRFEntityExtractor"
  - name: DIETClassifier
    epochs: 70
    random_seed: 2
  - name: EntitySynonymMapper

My intent:

## intent:ask_homework
- Show me the [homework](homework) for [George](PERSON)
- what is the [homework](homework) for [Yuri](PERSON)
- [workload](homework) for [Eva](PERSON)
- what is the homework for Adam
- what does James have
- [homework](homework) for Henry
- [homework](homework) of [Diana](PERSON)

Juste · September 28, 2020, 1:53pm

Hi @huseyinyilmaz01. Names are tricky to extract since there is no general pattern, but there are some techniques you can use to get the best possible result. Here are my thoughts on it:

1. A little comment on your training data: I can see that some examples, like “what is the homework for Adam”, don’t have the entity labelled. Is there a reason for that? In general, labelling some entities while leaving others out makes your model prone to mistakes. So the first thing I would suggest is to make sure that all names in your training data are labelled.
2. Using the SpacyEntityExtractor for names is great, since it should do a pretty good job of extracting common names.
3. You could also try using a form with a custom slot mapping. The method is quite nicely described in this thread.
4. You could also include a lookup table in your pipeline if you can get a list of the names to be extracted.
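A rough way to act on the labelling advice above is to scan the Markdown NLU examples for capitalized words that sit outside any [text](entity) annotation. This is a minimal heuristic sketch, assuming the Rasa 1.x Markdown training-data format shown in the question; unlabelled_capitalized_words is a hypothetical helper, not a Rasa command. It will miss names typed in lowercase and will also flag other capitalized words (such as “I”), so treat its output as a review list.

```python
import re
from typing import List

# Rasa 1.x Markdown entity annotation, e.g. [George](PERSON)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def unlabelled_capitalized_words(example: str) -> List[str]:
    """Return capitalized words outside any [text](entity) annotation (heuristic)."""
    text = example.lstrip("- ").strip()
    # Remove annotated spans so names that are already labelled are not reported
    words = ANNOTATION.sub(" ", text).split()
    # Skip the first word when it starts the sentence (capitalized anyway)
    if words and not ANNOTATION.match(text):
        words = words[1:]
    cleaned = [w.strip(".,!?") for w in words]
    return [w for w in cleaned if w[:1].isupper()]


examples = [
    "- Show me the [homework](homework) for [George](PERSON)",
    "- what is the homework for Adam",
    "- what does James have",
    "- [homework](homework) for Henry",
    "- [homework](homework) of [Diana](PERSON)",
]
for ex in examples:
    flagged = unlabelled_capitalized_words(ex)
    if flagged:
        print(ex, "->", flagged)
```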

huseyinyilmaz01 · September 28, 2020, 2:30pm

Thank you @Juste. I changed the training data as you suggested in point 1, but the results are still inconsistent: some names are still missed. Regarding the form, my first goal is to handle this without asking the user extra questions, to keep the conversation more humanlike. Lastly, lookup tables are good if you have a fixed list, but we don’t have one, and the bot should handle any given name.

The point is that precision matters here, because the response depends on the name. If the name is wrong, the response is meaningless.

Thank you again for your inputs @Juste.

I would be glad if someone could share the structure of a setup that worked for them.

forwitai · October 22, 2020, 1:22pm

Hello @huseyinyilmaz01, can you please share what you ended up doing? Thanks.

huseyinyilmaz01 · October 22, 2020, 4:25pm

Hi @forwitai, first of all, as far as I understand, there is no ultimate solution for name extraction.

What did I do? I chained three different approaches:

1. an entity extractor in the config pipeline catches names as the {PERSON} entity;

2. if the {PERSON} entity is empty, a spaCy Matcher comes into play within a custom action;

3. if spaCy finds no match, the same custom action splits the text and searches for each part among already known names.

Finally, if the extractor doesn’t catch a name, spaCy finds no match, and there is no similar name in my database, the bot asks for a valid name.
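The chain above can be sketched as one plain-Python function. This is a minimal, framework-free sketch, not the poster’s actual code: resolve_person and its parameters are hypothetical names, step 2 is passed in as a function so the example runs without spaCy (in the real custom action it would wrap a spaCy Matcher), and step 3 assumes known names are compared as exact, case-insensitive tokens.

```python
from typing import Callable, Iterable, Optional


def resolve_person(
    slot_value: Optional[str],
    text: str,
    matcher_fn: Callable[[str], Optional[str]],
    known_names: Iterable[str],
) -> Optional[str]:
    # Step 1: the pipeline's extractor already filled the PERSON slot
    if slot_value:
        return slot_value
    # Step 2: rule-based fallback (hypothetically a spaCy Matcher on PROPN tokens)
    matched = matcher_fn(text)
    if matched:
        return matched
    # Step 3: split the message and look each token up among known names
    known = {name.lower(): name for name in known_names}
    for token in text.split():
        hit = known.get(token.strip(".,!?").lower())
        if hit:
            return hit
    # Nothing found: the caller should ask the user for a valid name
    return None


def no_match(text: str) -> Optional[str]:
    # Stand-in for a matcher that finds nothing
    return None


print(resolve_person(None, "homework for diana", no_match, ["George", "Diana"]))
```

Returning None instead of asking directly keeps the “ask for a valid name” decision in the caller, which is where the bot’s response is chosen.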

forwitai · October 23, 2020, 8:43am

Hello @huseyinyilmaz01, thank you for answering. Your approach sounds like a good one, I have two questions though.

For the 3rd step, when you say you search your DB of names, does that mean you have a DB of Turkish (I suppose) names only, or international names? How do you handle names from different countries? Also, how large is your DB: 1,000 names or more?
Also, what if the name provided by the user exists in your DB but is written differently, with two or three letters changed? Do you still ask the user to provide a valid name? Wouldn’t that be inconvenient, especially since there is nothing they can do about it?

Thanks a lot.

huseyinyilmaz01 · October 23, 2020, 1:05pm

Hi @forwitai, regarding the 3rd step: the names in the DB are not meant to cover every possible name; they are just the names of all users who have used the bot before. On the second point, typos or missing letters: this is where I need to add something that finds similar names above a similarity threshold. For example, a threshold of 0.75 should catch John when the user enters Johm. I will work on that part later.
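One way to implement that similarity threshold is Python’s standard difflib. This is a sketch under assumptions: the thread does not say which similarity measure the 0.75 refers to, so difflib’s SequenceMatcher ratio is a swapped-in choice, and closest_known_name is a hypothetical helper. With that ratio, “johm” vs “john” scores exactly 0.75 (3 matching characters out of 8 total), so it just passes a 0.75 cutoff.

```python
import difflib
from typing import Iterable, Optional


def closest_known_name(
    candidate: str, known_names: Iterable[str], cutoff: float = 0.75
) -> Optional[str]:
    # Compare case-insensitively, but return the name as stored in the DB
    by_lower = {name.lower(): name for name in known_names}
    # n=1 keeps only the best match whose ratio is at least `cutoff`
    matches = difflib.get_close_matches(
        candidate.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


print(closest_known_name("Johm", ["John", "Diana"]))
print(closest_known_name("Zed", ["John", "Diana"]))
```

A higher cutoff reduces wrong matches between short, similar names at the cost of missing more typos; since the response depends on the name, it may be safer to confirm a fuzzy match with the user rather than use it silently.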

forwitai · October 23, 2020, 1:13pm

Okay great, thanks a lot for the clarifications. One last question: how do you order the three steps in Rasa? In other words, if the extractor fails, how do you hand the task to spaCy and then to the lookup in your DB? I have no idea how to implement it, so I’d appreciate it if you could clarify that point.

Thanks again.

huseyinyilmaz01 · October 23, 2020, 1:19pm

The 1st step is already in the pipeline, so it runs first. A single custom action then follows the remaining steps: it checks whether the extractor filled the PERSON slot. If the slot is empty (no name, or the extractor failed), it runs spaCy; if spaCy doesn’t match, the 3rd step is triggered in the same custom action.

forwitai · October 23, 2020, 1:22pm

Okay got it, thank you so much !

forwitai · October 23, 2020, 3:05pm

Can you please tell me how you trigger spaCy? Do you import the library in your custom action instead of using it in your pipeline directly? Could you please share this specific part of your code? That would be very helpful.

Thank you !

huseyinyilmaz01 · October 27, 2020, 12:32am

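Hi @forwitai, here is how I use the spaCy Matcher inside the custom action, where value holds the user’s message text:

    import spacy
    from spacy.matcher import Matcher

    # Load the model and build the matcher once, not on every action run
    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)
    # Match any single token tagged as a proper noun (PROPN)
    matcher.add("NAME", [[{"POS": "PROPN"}]])


    def find_owner(tracker, value):
        # Illustrative wrapper; in the action this logic runs inline
        # Prefer the PERSON slot if the pipeline's extractor filled it
        owner = tracker.get_slot("PERSON")
        if owner:
            return owner

        # Otherwise run the Matcher on the raw message text
        nlp_text = nlp(value)
        for match_id, start, end in matcher(nlp_text):
            span = nlp_text[start:end]
            owner = span.text  # as before, the last match wins

        # None here means: fall through to the known-names lookup
        return owner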

forwitai · October 27, 2020, 7:45am

Thank you so much for the help @huseyinyilmaz01, much appreciated !
