NLU only detecting entities explicitly present in training data

The only entities showing up in the interpreter are the ones given in training data. For the given dataset:

  {
    "text": "My name is alice",
    "intent": "name",
    "entities": [
      {
        "start": 11,
        "end": 16,
        "value": "alice",
        "entity": "name1"
      }
    ]
  },
  {
    "text": "I am Josh",
    "intent": "name",
    "entities": [
      {
        "start": 5,
        "end": 9,
        "value": "Josh",
        "entity": "name1"
      }
    ]
  },

  {
    "text": "my name is Greg",
    "intent": "name",
    "entities": [
      {
        "end": 15,
        "start": 11,
        "value": "Greg",
        "entity": "name1"
      }
    ]
  },
  {
    "text": "I am david",
    "intent": "name",
    "entities": [
      {
        "end": 10,
        "start": 5,
        "value": "david",
        "entity": "name1"
      }
    ]
  },

  {
    "text": "john is my name",
    "intent": "name",
    "entities": [
      {
        "end": 4,
        "start": 0,
        "value": "john",
        "entity": "name1"
      }
    ]
  },

When a query like “My name is alice” (present in the training data) comes in, “alice” is correctly extracted as the entity ‘name1’. But something like “My name is peter” (not present in the training data) doesn’t return any entities. What is going wrong here? P.S. I tried ner_spacy, but it only recognizes names whose first letter is capitalized.
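A quick sanity check that is worth running on annotated data of this shape: every `start`/`end` pair should slice exactly the stated `value` out of `text`. The sketch below is plain Python and not part of Rasa; `check_offsets` and the two sample entries are made up for illustration, and the second sample is deliberately misaligned.

```python
def check_offsets(examples):
    """Return (text, expected value, selected span) for every misaligned entity."""
    problems = []
    for example in examples:
        text = example["text"]
        for entity in example.get("entities", []):
            # The annotated offsets should select exactly the entity value.
            span = text[entity["start"]:entity["end"]]
            if span != entity["value"]:
                problems.append((text, entity["value"], span))
    return problems


# Hypothetical samples in the same shape as the training data.
samples = [
    {"text": "My name is alice", "intent": "name",
     "entities": [{"start": 11, "end": 16, "value": "alice", "entity": "name1"}]},
    # Deliberately wrong offsets: they select "me " instead of "sam".
    {"text": "call me sam", "intent": "name",
     "entities": [{"start": 5, "end": 8, "value": "sam", "entity": "name1"}]},
]

for text, expected, found in check_offsets(samples):
    print(f"{text!r}: expected {expected!r}, offsets select {found!r}")
```

An extractor trained on misaligned spans learns from the wrong tokens, so it is probably worth ruling this out before tuning the pipeline.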


I’m not an expert, but as far as I can tell, with the ner_crf extractor you need to train it with “My name is peter” too.

@shubham21197 I am also currently searching for an answer to this problem. Please let me know if you get one. Thank you.

But we can’t train on all the names out there. What can we do in a practical application where it should automatically extract the user’s name?


@shubham21197 This is the only approach I came across. It seems good, but it needs some data preparation.

Have a look at it. We need to provide a .txt file as an additional lookup.
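For anyone unfamiliar with lookups: as far as I understand, the entries of the .txt file are turned into one case-insensitive pattern, and a match is passed to the entity extractor as an extra feature rather than being a guaranteed extraction. The sketch below only illustrates that idea in plain Python. It is not Rasa's code, and `build_lookup_pattern`, `lookup_matches`, and the entry list are made up.

```python
import re


def build_lookup_pattern(entries):
    """Compile lookup entries into one case-insensitive, whole-word pattern."""
    cleaned = [entry.strip() for entry in entries if entry.strip()]
    # Longest entries first so a multi-word entry wins over its own prefix.
    cleaned.sort(key=len, reverse=True)
    alternatives = "|".join(re.escape(entry) for entry in cleaned)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


# Hypothetical contents of a names lookup file, one entry per line.
lookup_entries = ["alice", "josh", "greg", "david", "john", "peter"]
pattern = build_lookup_pattern(lookup_entries)


def lookup_matches(text):
    """Return (start, end, matched text) for every lookup hit in text."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


print(lookup_matches("My name is Peter"))
```

Because the match is only a feature, the training data still needs a few examples that contain lookup entries so the model learns to trust it.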

@shubham21197 Have you tried this with https://github.com/RasaHQ/starter-pack-rasa-stack?

It already covers name entity extraction along with a bunch of intents, so it might work for you.

@shubham21197 Try out the Rasa starter pack with both the TensorFlow pipeline and the sklearn pipeline. You’d see a few differences, especially for your query “My name is Peter”.

ner_crf works by weighting each token based on features such as its prefix, its suffix, and the words before and after it, so you must vary the training data enough for the model to generalize. Maybe try lowercasing the names, varying the phrasing, and adding longer phrases.
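To make the point about features concrete, here is a rough sketch of the kind of per-token features a CRF extractor looks at. The feature names and `token_features` are illustrative assumptions, not the actual ner_crf feature set.

```python
def token_features(tokens, i):
    """Illustrative CRF-style features for tokens[i] (names are made up)."""
    word = tokens[i]
    return {
        "low": word.lower(),
        "prefix2": word[:2].lower(),
        "suffix3": word[-3:].lower(),
        "is_title": word.istitle(),
        # Context features: these are what can generalize to unseen names.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


seen = token_features("My name is alice".split(), 3)
unseen = token_features("My name is peter".split(), 3)

# Features the unseen name shares with the training example.
shared = sorted(key for key in seen if seen[key] == unseen[key])
print(shared)  # ['is_title', 'next_word', 'prev_word']
```

Only the context and casing features carry over to “peter”. With a handful of examples the model can lean on the word itself, so more varied names and phrasings are needed to push the weight toward those shared features.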

I am facing the same problem. If you are able to solve yours, kindly help me out with this too.

This can happen if you have too little NLU training data for that kind of statement.

I have a few options to experiment with:

  1. Increase the NLU training data.

  2. Add the DIET classifier to the pipeline. It can extract entities better, so do try it out.

  3. If that doesn’t work either and the set of entity values is limited, lookup tables in Rasa can be of some use. @Juste @maddymantha maybe you can give more clarity.
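For option 1, writing every example by hand gets tedious, so one way is to generate them from a few phrase templates and a list of names, computing the offsets in code. This is only a sketch: `make_examples`, the templates, and the names are made up, and it assumes each template contains exactly one `{name}` placeholder.

```python
def make_examples(templates, names, intent="name", entity="name1"):
    """Fill each template with each name and compute the entity offsets."""
    examples = []
    for template in templates:
        # Text before the placeholder determines where the entity starts.
        prefix = template.split("{name}")[0]
        for name in names:
            start = len(prefix)
            examples.append({
                "text": template.format(name=name),
                "intent": intent,
                "entities": [{
                    "start": start,
                    "end": start + len(name),
                    "value": name,
                    "entity": entity,
                }],
            })
    return examples


# Hypothetical templates and names; mix casing and multi-word names on purpose.
templates = ["my name is {name}", "I am {name}", "{name} is my name", "people call me {name}"]
names = ["peter", "Maria", "li wei"]

examples = make_examples(templates, names)
print(len(examples), examples[0])
```

The generated list has the same shape as the dataset in the question, so it can be dumped to JSON and appended to the existing examples.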

Hi there, I am facing a similar issue. I have an entity named location and an intent named inform_state_district, and I have a lookup table for location. Still, about 80% of the time it fails to detect the entity in the intent. I will attach screenshots of everything below.

The first (white) picture shows the places where it fails and where it doesn’t. It would be great if someone could shed some light on why this behaviour occurs. (The slot district is mapped to location, if anyone is wondering why.) @nik202 any idea? :sweat_smile::sweat_smile:

Are the places where it fails (like Durg and Ernakulam) in the lookup?

Yup, all of those are there. Just attaching the lookups for reference: district_data.yml (6.9 KB), state_data.yml (804 Bytes)

Both files are lookups for the location entity. You should merge them and try again.
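If it helps, merging can be done in a few lines once the entries of both files are loaded as lists of strings. A minimal sketch, assuming made-up excerpts rather than the real attachments; `merge_lookups` is a hypothetical helper, and loading the .yml files is left out.

```python
def merge_lookups(*entry_lists):
    """Merge lookup entries, dropping blanks and case-insensitive duplicates."""
    merged, seen = [], set()
    for entries in entry_lists:
        for entry in entries:
            key = entry.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(entry.strip())
    return merged


# Hypothetical excerpts; the last state entry is a deliberate duplicate.
district_entries = ["Durg", "Ernakulam", "Raipur"]
state_entries = ["Kerala", "Chhattisgarh", "durg "]

location_entries = merge_lookups(district_entries, state_entries)

# Confirm the values that failed in testing are really in the merged lookup.
known = {entry.lower() for entry in location_entries}
missing = [place for place in ["Durg", "Ernakulam", "Kerala"] if place.lower() not in known]
print(location_entries, missing)
```

The merged list would then go into a single lookup for the location entity.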

@jerry please share your config.yml.

I hope you are following this as a benchmark: NLU Training Data

Still no difference yet.

As for the training data, I have annotated the entities and intents as in the previous screenshot, the way the docs describe. For config.yml, I haven’t changed anything except adding RulePolicy for forms. Here is the file: config.yml (1.3 KB)

@jerry check this video: https://youtu.be/gvyfQZMnHPY