Entity not identified (mentioned in the training data)

I am building a bot for travel services, where the bot needs to identify the source and destination from a sentence. For example: I want to go to New York from Malibu

I have added lots of similar examples under the related intent.

I have the locations in a lookup file, so I added sample locations under the intent “inform” and set the path of the file as a lookup. Even so, the bot fails to identify locations that are listed in the lookup, and some of the locations are classified as other intents.

I looked for a solution online and tried adding more samples (>100) to “inform”. With this, several locations are now identified, but the issue still persists for a few locations.

There are 20k location names in the lookup file. Is the size the problem, or can I fix it some other way?

Hi @ManiNuthi. If you are looking to identify source and destination cities, I would recommend looking at entity roles and groups:

This is cool. But the error still persists.

I changed my samples from "I want to travel from [NYC](Source) to [LA](Destination)" to "I want to travel from [NYC]{"entity": "location", "role": "Source"} to [LA]{"entity": "location", "role": "Destination"}".

Slot mapping:
    def slot_mappings(self) -> Dict[Text, Union[Dict, List[Dict]]]:
        return {
            "Source": [self.from_entity(entity="location", role="Source")],
            "Destination": [self.from_entity(entity="location", role="Destination")],
        }

To connect the examples with the lookup, I placed some samples like these under the intent 'inform':
 [new york](location)
 [brooklyn](location)

I also tried another format:
[new york]{"entity": "location", "role": "Destination"}
[new york]{"entity": "location", "role": "Source"}

But still, the error appears.

This is my NLP pipeline:
language: te
pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: RegexFeaturizer
  - name: EntitySynonymMapper
  - name: DIETClassifier
    epochs: 130

@ManiNuthi Using entity roles and groups definitely makes sense in your case. Can you please run your bot in debug mode (e.g. using the flag --debug), chat with it a bit and share the logs afterwards here? Thanks. That should help us to figure out what the bot is actually recognising and where the error might be.

I have attached the logs for a conversation (in Telugu).

The conversation goes like this:

input: hi

bot: how can I help you

input: search train from kadiri to palasa.

logs of shell --debug:

Here are the logs for a sample where it worked. Input: search trains from tirupati to anakapalle.

@ManiNuthi Sorry for the late reply. Did you already fix the issue?

Judging by the slots, the slot mapping does not seem to be the issue. The entity extractor is simply not able to identify the entities. This might have several reasons:
(1) Your training data is not consistently annotated. Make sure that all entities are labelled.
(2) You do not have enough examples. However, you already mentioned that you added some more examples, so this might not be the problem. How many entities do you have, and how many examples per entity?
(3) The pipeline is not ideal. You could try playing around with different components and parameter options.
(4) The tokenization does not always work as expected. I am not familiar with the Telugu language, and I would guess that the spaCy tokenizer does a good job, but it might be worth double-checking.
Sorry, I cannot give you any better advice, but it is hard to figure out what exactly is going wrong without taking a closer look at the data.
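
For points (1) and (2), a small standalone script can give a first impression of the data. The following is only a rough sketch and not part of Rasa: `parse_example` and `audit` are hypothetical helper names, and it assumes that the training examples are available as plain strings in the two annotation styles shown earlier in this thread and that the lookup entries fit in a Python list. It counts annotations per entity and role, and reports lookup names that occur in an example without being annotated.

```python
import json
import re
from collections import Counter

# Matches both annotation styles used in this thread:
#   [NYC](location)  and  [NYC]{"entity": "location", "role": "Source"}
ANNOTATION = re.compile(
    r'\[(?P<text>[^\]]+)\](?:\((?P<simple>[^)]+)\)|(?P<json>\{[^}]*\}))'
)


def parse_example(line):
    """Return (plain_text, annotations) for one annotated training example.

    Each annotation is a tuple (start, end, entity, role) with character
    offsets into plain_text. Hypothetical helper, not a Rasa API.
    """
    annotations = []
    parts = []
    pos = 0
    for m in ANNOTATION.finditer(line):
        parts.append(line[pos:m.start()])
        start = sum(len(p) for p in parts)
        parts.append(m.group("text"))
        if m.group("json"):
            meta = json.loads(m.group("json"))
            entity, role = meta.get("entity"), meta.get("role")
        else:
            entity, role = m.group("simple"), None
        annotations.append((start, start + len(m.group("text")), entity, role))
        pos = m.end()
    parts.append(line[pos:])
    return "".join(parts), annotations


def audit(examples, lookup):
    """Count annotations per (entity, role) and list unannotated lookup hits."""
    counts = Counter()
    missed = []
    for line in examples:
        text, annotations = parse_example(line)
        for _, _, entity, role in annotations:
            counts[(entity, role)] += 1
        for name in lookup:
            # Whole-word, case-insensitive match of the lookup entry.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                covered = any(
                    s <= m.start() and m.end() <= e for s, e, _, _ in annotations
                )
                if not covered:
                    missed.append((text, name))
    return counts, missed


# Toy data for illustration only; replace with your own examples and lookup.
examples = [
    'I want to travel from [NYC]{"entity": "location", "role": "Source"} '
    'to [LA]{"entity": "location", "role": "Destination"}',
    "search trains from [tirupati](location) to anakapalle",
]
lookup = ["NYC", "LA", "tirupati", "anakapalle"]

counts, missed = audit(examples, lookup)
print(counts)
print(missed)
```

In the toy data, "anakapalle" in the second example is reported because it is in the lookup but not labelled. The loop tries every lookup entry against every example, so with 20k names it will be slow on a large training file, and the `\w` word-boundary check is an approximation that may not behave well for every script, so treat the output as a hint rather than a verdict.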