Issue with entity detection using a lookup table and EntitySynonymMapper together

Hello guys,

In my chatbot project I am currently using a lookup table and synonyms together to improve entity detection for one specific entity.

This entity is called categories and simply consists of about 20 product categories relevant to the users. I want to make detection of these categories as robust as possible, so I am including a bunch of synonyms with different spellings, partial synonyms, etc.

I learned that you have to include those synonyms in the lookup table as well, so I did. But I still encounter some issues with the entity matching, especially when the categories consist of multiple words.

Example: One category is called “Connected Life Services”. One synonym I included is “life services”. I included this synonym in the lookup table, as well as in the nlu data as:

## synonym:Connected Life Services
- life services

Now if I talk to my bot and use the synonym “life services” in a sentence that I have similar training data for, it detects “life” as the entity instead of “life services”. The synonym mapping then does not correct this, so the entity value ends up as “life” instead of “Connected Life Services”.

My guess is that the lookup table words get matched individually, so it extracts “life” and then searches for “life” in the synonyms. Since the synonym is only listed as “life services”, nothing is found and the value doesn’t get replaced with the correct entity.
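A minimal sketch of that guess, assuming the synonym step is an exact, lowercased dictionary lookup on whatever text the extractor returned. The names `SYNONYMS` and `map_synonym` are made up for illustration and are not Rasa's actual code:

```python
# Hypothetical synonym table: lowercased surface form -> canonical value.
SYNONYMS = {"life services": "Connected Life Services"}


def map_synonym(extracted_value: str) -> str:
    """Return the canonical value only if the whole extracted text is a known synonym."""
    return SYNONYMS.get(extracted_value.lower(), extracted_value)


# The full phrase is a key in the table, so it gets replaced.
print(map_synonym("life services"))  # Connected Life Services

# Only part of the phrase was extracted, so there is no key to match
# and the value is passed through unchanged.
print(map_synonym("life"))  # life
```

If this is roughly how it works, the synonym mapping can only fix the value when the extractor returns the complete span, so the real problem would be the span boundary rather than the synonym list.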

Is there any way to go about this other than adding those single words as synonyms as well? In this example, “life” is too generic to be added to the lookup table and synonyms, so I would prefer that it only gets matched in connection with the “services” part.

Hi @isic5,

Your approach is correct. Are you annotating the entities in your sentences?

I did a little test using your description, and it is working well for me. This is what I tested, using the mood bot, created with the command rasa init as the basis:

file: domain.yml

entities:
  - categories

file: nlu.md

## intent:ask_about_categories
- Do you have any [life services](categories)?
- Where are the [Connected Life Services](categories)?

## lookup:categories
- Connected Life Services
- life services

## synonym:Connected Life Services
- life services

I then train the model, and when I try this sentence What about life services? it all looks OK:

$ rasa shell nlu
NLU model loaded. Type a message and press enter to parse it.
Next message:
What about life services?
{
  "intent": {
    "name": "ask_about_categories",
    "confidence": 0.8778952956199646
  },
  "entities": [
    {
      "start": 11,
      "end": 24,
      "value": "Connected Life Services",
      "entity": "categories",
      "confidence": 0.7815333891882852,
      "extractor": "CRFEntityExtractor",
      "processors": [
        "EntitySynonymMapper"
      ]
    }
  ],
  "intent_ranking": [
    {
      "name": "ask_about_categories",
      "confidence": 0.8778952956199646
    },
    ...
  ],
  "text": "What about life services?"
}

Hi @Arjaan, thanks for looking into it!

What I did not do in my nlu data, compared to you, is add sentences with the synonym spellings as well. So far I only included examples with the correct category names. This might be the issue then, but I assumed that adding the synonyms with ## synonym: would already be enough.

Just to avoid unnecessary work: does that mean I have to add example sentences using all the synonyms I have in the lookup table and nlu file so far to make it actually robust? It kind of feels like I’m doing double the work, as I don’t see why the lookup tables would even be necessary in that case. Thanks again for spending the time looking into my issue, much appreciated.

Edit: Another question. I have mostly been using Rasa X for development so far. How can I get an output like yours, showing the processors used etc.? Can I just get it by using rasa shell, and if so, are there any additional options that need to be enabled? Thanks :)

Hi @isic5,

The example I made was a bit misleading. I indeed had training data for all the elements in the lookup table, so in that case the lookup table is a bit useless. The whole point of a lookup table is that you do NOT need to include every element in the training data. You need to include some elements in the training data though for it to work properly.
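To make that a bit more concrete, here is a simplified sketch of the idea as I understand it, not Rasa's actual code: the lookup table is turned into one case-insensitive pattern, and each token only gets a flag saying whether it falls inside a match. The extractor still has to learn from annotated examples what to do with that flag. The helper names `lookup_pattern` and `token_features` are hypothetical.

```python
import re

LOOKUP = ["Connected Life Services", "life services"]


def lookup_pattern(entries):
    """Build one case-insensitive pattern that matches any whole lookup entry."""
    alternatives = "|".join(r"\b" + re.escape(entry) + r"\b" for entry in entries)
    return re.compile(alternatives, re.IGNORECASE)


def token_features(text, pattern):
    """Pair each word token with a flag: does it overlap a lookup match?"""
    spans = [match.span() for match in pattern.finditer(text)]
    features = []
    for token in re.finditer(r"\w+", text):
        in_lookup = any(
            token.start() < end and token.end() > start for start, end in spans
        )
        features.append((token.group(), in_lookup))
    return features


print(token_features("What about life services?", lookup_pattern(LOOKUP)))
# [('What', False), ('about', False), ('life', True), ('services', True)]
```

Both “life” and “services” are flagged here, but the flag alone does not define the entity span. That is why a few annotated examples containing lookup entries are still needed.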

This blog post with corresponding GitHub repo explains in more detail how lookup tables work, when to use them and also what the potential pitfalls are. Your use case is a very good candidate for lookup tables.

Let me know if you get it all to work properly.


Also, I will check on your question about rasa x, and I will post another update here.

@Arjaan hmm, this is good and bad news then, because it brings me back to the original issue. I think I covered all the required steps: the lookup table has all the synonyms, my nlu data has all the synonyms and a few training examples, and I’m still getting the mismatched entities mentioned in my first post. Is there any specific data or info I can post here that would help identify where I’m going wrong? I can of course post my files, but that might be a bit much to look through.

Anyhow, I appreciate your input so far! In the meantime I will add some more nlu examples and see if that helps.

Hi @Arjaan,

I have a similar issue. I checked the Rasa forum and couldn’t find a definite solution. I have approx. 1000 different values for one of my entities. The values are task names, so they often consist of multiple words, and those are generally regular words. I created a lookup table to cover them all, but the lookup table doesn’t work for values with multiple words. Is there any solution for this problem?

Upgrade to rasa==2.0.0 and use RegexEntityExtractor.
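For context, a rough sketch of why a regex-based extractor helps with multi-word values: every lookup entry is matched as a whole phrase, so the span does not depend on what a learned model picked up from the training examples. This is only an illustration of the technique, not the RegexEntityExtractor implementation. The function `extract_entities` and the dict layout are assumptions made up for this example.

```python
import re
from typing import Dict, List


def extract_entities(text: str, lookup: Dict[str, List[str]]) -> List[dict]:
    """Return one entity per whole-phrase, case-insensitive lookup match."""
    entities = []
    for entity_name, entries in lookup.items():
        # Longest entries first, so "life services" wins over a shorter
        # entry such as "life" when both could match at the same position.
        ordered = sorted(entries, key=len, reverse=True)
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(entry) for entry in ordered) + r")\b",
            re.IGNORECASE,
        )
        for match in pattern.finditer(text):
            entities.append(
                {
                    "start": match.start(),
                    "end": match.end(),
                    "value": match.group(),
                    "entity": entity_name,
                }
            )
    return entities


lookup = {"categories": ["Connected Life Services", "life services"]}
print(extract_entities("What about life services?", lookup))
# [{'start': 11, 'end': 24, 'value': 'life services', 'entity': 'categories'}]
```

The extracted value is the full phrase “life services”, which a synonym entry for “life services” can then map to “Connected Life Services”.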