How to extract person names from user input?

For a specific intent, I need to run a query using the extracted person name. I tried “SpacyEntityExtractor” and “CRFEntityExtractor”, both alone and together.

There is no problem with the intent prediction, but the entity extractors are not working properly. For example, they extract “Parker” and “George”, but not “Henry” and “Diana”. And interestingly, they don’t extract names which were extracted before. How should I extract the person names?

My config.yml:

pipeline:
  - name: "SpacyNLP"
  - name: profanity_analyzer.ProfanityAnalyzer
  - name: SpacyTokenizer
  - name: CountVectorsFeaturizer
  - name: RegexFeaturizer
  - name: "SpacyEntityExtractor"
  - name: "CRFEntityExtractor"
  - name: DIETClassifier
    epochs: 70
    random_seed: 2
  - name: EntitySynonymMapper

My intent:

## intent:ask_homework

- Show me the [homework](homework) for [George](PERSON)

- what is the [homework](homework) for [Yuri](PERSON)

- [workload](homework) for [Eva](PERSON)

- what is the homework for Adam

- what does James have

- [homework](homework) for Henry

- [homework](homework) of [Diana](PERSON)

Hi @huseyinyilmaz01. Names are tricky to extract since there is no general pattern, but there are some techniques you can use to get the best possible result. Here are my thoughts on it:

  1. A little comment on your training data - I can see that some examples like “what is the homework for Adam” don’t have a label for the entity. Is there a reason for that? In general, labelling some of the entities while leaving out others will make your model prone to mistakes. So the first thing I would suggest is to make sure that all names in your training data are labelled.

  2. Using the SpacyEntityExtractor for names is great since it should do a pretty good job of extracting common names.

  3. You could also try using a form with a custom slot mapping. The method is quite nicely described in this thread.

  4. You could also include a lookup table in your pipeline if you can get a list of names to be extracted.
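
The labelling check from the first point can be partly automated. Below is a minimal sketch, not part of Rasa itself: the helper name `unlabelled_mentions` and the list of names are assumptions made for illustration. It flags training examples in which a known name appears outside any `[text](entity)` annotation.

```python
import re
from typing import Iterable, List

# Matches Rasa Markdown annotations such as [George](PERSON)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def unlabelled_mentions(examples: Iterable[str], names: Iterable[str]) -> List[str]:
    """Return the examples where one of `names` occurs without an annotation."""
    name_set = set(names)
    flagged = []
    for example in examples:
        # Remove every annotated span so only unlabelled text remains
        plain = ANNOTATION.sub(" ", example)
        words = set(re.findall(r"\w+", plain))
        if words & name_set:
            flagged.append(example)
    return flagged


examples = [
    "- Show me the [homework](homework) for [George](PERSON)",
    "- what is the homework for Adam",
    "- [homework](homework) for Henry",
    "- [homework](homework) of [Diana](PERSON)",
]
flagged = unlabelled_mentions(examples, ["George", "Adam", "Henry", "Diana"])
for line in flagged:
    print(line)
```

With the examples from the question, this reports the “Adam” and “Henry” lines, which are the ones missing a PERSON label.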


Thank you @Juste. I changed the training data as you suggested in your first point, but the results are still not consistent: some names are still missed. Regarding the form, my first goal is to handle the issue without asking more questions, to keep the bot more humanlike. Lastly, lookup tables are good if you have a fixed list, but we don’t have one, and the bot should handle any given name.

The point is, you have to be precise because your response depends on names. Otherwise, it will be meaningless.

Thank you again for your inputs @Juste.

I would be glad if someone could share a setup that worked for them.

Hello @huseyinyilmaz01, can you please share what you ended up doing? Thanks.

Hi @forwitai, first of all, as far as I understand, there is no ultimate solution for name extraction.

What did I do? I used 3 different solutions in the same chain:

1. An extractor in the config pipeline catches the names as the {PERSON} entity.

2. If the {PERSON} entity is empty, the spaCy matcher comes into play within a custom action.

3. If there is no spaCy match, the same custom action splits the value and searches the already known names.

Finally, if the extractor couldn’t catch a name, there is no match from spaCy, and there is no similar name in my database, the bot asks for a valid name.
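
The three-step chain above can be sketched roughly as follows. This is an illustration under assumptions rather than the poster's actual code: `resolve_person` and `find_proper_noun` are hypothetical names, and the spaCy step is passed in as a plain callable so the ordering logic stays independent of any library.

```python
from typing import Callable, Iterable, Optional


def resolve_person(
    slot_value: Optional[str],
    text: str,
    find_proper_noun: Callable[[str], Optional[str]],
    known_names: Iterable[str],
) -> Optional[str]:
    """Try each name source in order and return the first hit, else None."""
    # Step 1: trust the pipeline extractor if it filled the PERSON slot
    if slot_value:
        return slot_value

    # Step 2: fall back to a proper-noun matcher (spaCy in the thread)
    candidate = find_proper_noun(text)
    if candidate:
        return candidate

    # Step 3: split the message and look for an already known name
    known = {name.lower(): name for name in known_names}
    for token in text.split():
        hit = known.get(token.strip(".,!?").lower())
        if hit:
            return hit

    # Nothing found: the caller should ask the user for a valid name
    return None
```

Inside a custom action, `slot_value` would come from something like `tracker.get_slot('PERSON')`, and a `None` result would lead to the bot asking for a valid name.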

Hello @huseyinyilmaz01, thank you for answering. Your approach sounds like a good one, I have two questions though.

  • For the 3rd step, when you say you search in your DB of names, does that mean you have a DB of Turkish (I suppose) names only, or international names? How do you manage names from different countries? Also, how large is your DB, 1000 names or more?

  • Also, what if the name provided by the user exists in your DB but is written differently, with two or three letters changed? Do you still ask the user to provide a valid name? Wouldn’t that be inconvenient, especially since there is nothing they can do about it?

Thanks a lot.

Hi @forwitai, regarding the 3rd step: the names in the DB are not meant to cover every possible name; they are just the names of all users who have used the bot before. On the second point, typos or missing letters: this is where I need to add something that can find similar names above a threshold. For example, a threshold of “0.75” would catch John when the user entered Johm. I will work on that part later.
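
One possible way to do the threshold matching described above is Python's standard `difflib` module. This is a minimal sketch, not what the poster implemented; the function name `closest_known_name` is hypothetical, and the 0.75 default is taken from the example in the post.

```python
from difflib import get_close_matches
from typing import Iterable, Optional


def closest_known_name(
    candidate: str, known_names: Iterable[str], threshold: float = 0.75
) -> Optional[str]:
    """Return the most similar known name at or above `threshold`, else None."""
    # Compare case-insensitively, but return the stored spelling
    by_lower = {name.lower(): name for name in known_names}
    hits = get_close_matches(
        candidate.lower(), list(by_lower), n=1, cutoff=threshold
    )
    return by_lower[hits[0]] if hits else None


print(closest_known_name("Johm", ["John", "Diana"]))
```

`difflib` scores similarity as twice the number of matching characters divided by the combined length, so “Johm” against “John” scores exactly 0.75 and passes the cutoff. A higher threshold would reject it, so the value likely needs tuning against real user names.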

Okay great, thanks a lot for the clarifications. One last question: for the 3 steps, how do you manage to order them in Rasa? In other words, if the extractor fails, how do you hand the task over to spaCy and then to the lookup in your DB? I have no idea how to implement it, so it would help if you could clarify that point.

Thanks again.

The 1st step is already in the pipeline, so it runs first. A single custom action then handles the remaining steps. It checks whether the PERSON slot was filled by the extractor. If it is empty (no name, or the extractor failed), it triggers spaCy; if spaCy doesn’t match, the 3rd step is triggered in the same custom action.

Okay got it, thank you so much !

Can you please tell me how you trigger spaCy? Do you import the library in your custom action instead of using it in your pipeline directly? Could you please share this specific part of your code? That would be very helpful.

Thank you !

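Hi @forwitai, here is how I use the spaCy matcher inside the custom action:

            # Needs at module level: import spacy; from spacy.matcher import Matcher
            if tracker.get_slot('PERSON'):
                owner = tracker.get_slot('PERSON')
            else:
                txt = value

                nlp = spacy.load('en_core_web_sm')
                matcher = Matcher(nlp.vocab)
                nlp_text = nlp(txt)

                # Match any single token tagged as a proper noun
                pattern = [{'POS': 'PROPN'}]
                # spaCy 2.x signature; in spaCy 3.x use matcher.add('NAME', [pattern])
                matcher.add('NAME', None, pattern)
                matches = matcher(nlp_text)

                if matches:
                    # If several proper nouns match, the last one wins
                    for match_id, start, end in matches:
                        span = nlp_text[start:end]
                        owner = span.text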

Thank you so much for the help @huseyinyilmaz01, much appreciated !