Issue with entity detection - fails for values outside the training set


I’m building a test bot using Rasa NLU and Core, wherein it captures two entities and summarizes the entity values that were captured.

For this, I have around 2000 examples for 9 intents and 2 entities, and I tried both the ‘spacy_sklearn’ and ‘tensorflow_embedding’ pipelines. There is a problem detecting entities for values that are not present in the training dataset.

For example, below is the intent where I detect ‘name’ as the entity:

## intent:name
- My name is Alice
- I am Josh
- I’m Lucy
- People call me Greg
- It’s David
- name is Johny
- John is my name
- lucy is my name

Entities aren’t detected if I give any name other than the trained samples. Failed utterances: ‘Kumar is my name’, ‘I am Rocky’, etc.

This suggests that entity detection is merely a string check or rule-based match against the trained examples. Could you please suggest a way to extract entities accurately?

Regards, Swamy


Yes Swamy! This will happen anyway. The computer doesn’t know that Kumar is a name or Sangakkara is a name, so you have to feed it a list of names to make your system identify them. I’m trying to find an answer for this one too. In the meantime, what you can do is create a .txt file and add as many names as you want to that file, one per line, like this:
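(The original attachment showing the file seems to be missing, so here is a minimal sketch of what such a lookup file looks like; the filename `names.txt` and the entries themselves are just examples, not anything prescribed by Rasa:)

```
Kumar
Sangakkara
Rocky
Priya
Alice
```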

Now place this file in your data folder and add a lookup table entry linking to your file. You can check out the concept of lookup tables in the Rasa docs. Let me know if you face any issues with this method!
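As a sketch, in the markdown training-data format of the Rasa NLU versions from this era, the lookup table is declared alongside your intent examples like this (the entity name `name` and the path `data/names.txt` are illustrative assumptions):

```
## lookup:name
data/names.txt
```

The values in the file are then used as features by the `ner_crf` extractor during training, so they help it generalize rather than acting as a plain string match.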

Akshit, thanks for the suggestion. I’ll try this and let you know.


Swamy, this is a common problem in NLP, and the way Rasa tackles it is by outsourcing entity extraction to external methods. In your case the entity is a name, and spaCy would be helpful here, more specifically by providing PERSON entity classification to your NLU pipeline. In other words, the job Akshit mentioned has already been done and is available through these pretrained models, and the way to use them in your chatbot is to plug them into your NLU pipeline.
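As a sketch, a spacy_sklearn-style pipeline with spaCy’s pretrained NER plugged in could look like this (component names follow the older Rasa NLU 0.x convention used by the `spacy_sklearn` preset; note that spaCy’s extractor will label the entity as `PERSON`, not `name`):

```yaml
language: "en"

pipeline:
- name: "nlp_spacy"                  # loads the spaCy language model
- name: "tokenizer_spacy"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"
- name: "ner_crf"                    # learns your custom entities from training data
- name: "ner_spacy"                  # adds spaCy's pretrained entities, e.g. PERSON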

For names, try a combination of a lookup table, some minimal training data, and what we suggest here:
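Putting those pieces together, the training data could combine a few annotated examples with the lookup table, along the lines of the following sketch (the bracket annotations and the `data/names.txt` path are illustrative, not taken from your actual files):

```
## intent:name
- My name is [Alice](name)
- [Kumar](name) is my name
- I am [Rocky](name)

## lookup:name
data/names.txt
```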

That should work well. Please don’t hesitate to ask if you need any more help with this. We fought this problem for a while, but we think we’ve found a solution.