Unable to classify multiple examples of the same entity. Please help

This is an example of what my training data looks like

## intent:ask_ingredients

- my ingredients are [tomato](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredints are [beans](ingredients)  and [baguette](ingredients) 

- i have [banana](ingredients)  [baking bar](ingredients)  and [beer](ingredients) 

- in my fridge there is [chia seeds](ingredients)  and [chestnuts](ingredients) 

- i have ingredients that are [cheese](ingredients)  [cereal](ingredients) 

- my ingredients are [yogurt](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredients are [wine](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredients are [walnuts](ingredients)  [raw shrimp](ingredients)  [berries](ingredients) 

- my ingredients are [turkey](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredients are [toast](ingredients)  [red wine](ingredients)  [berries](ingredients) 

- my ingredients are [steaks](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredients are [sugar](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredients are [spaghetti](ingredients)  [bacon](ingredients)  [berries](ingredients)

- my ingredients are [seeds](ingredients)  [pumpkin](ingredients)  [berries](ingredients) 

- my ingredients are [salsa](ingredients)  [bacon](ingredients)  [pistachios](ingredients) 

- my ingredients are [wine](ingredients)  [bacon](ingredients)  [berries](ingredients)

- my ingredients are [walnuts](ingredients)  [raw shrimp](ingredients)  [berries](ingredients) 

- my ingredients are [turkey](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredients are [toast](ingredients)  [red wine](ingredients)  [fish](ingredients) 

- my ingredients are [steaks](ingredients)  [bacon](ingredients)  [garlic](ingredients) 

- my ingredients are [sugar](ingredients)  [bacon](ingredients)  [ginger](ingredients) 

- my ingredients are [spaghetti](ingredients)  [bacon](ingredients)  [berries](ingredients) 

- my ingredients are [seeds](ingredients)  [pumpkin](ingredients)  [berries](ingredients) 

- my ingredients are [lemon](ingredients)  [ketchup](ingredients)  [juice](ingredients)

This is an example of the output I am getting

my ingredients are bacon berries cheese
{
  "intent": {
    "name": "ask_ingredients",
    "confidence": 0.9999998807907104
  },
  "entities": [
    {
      "entity": "ingredients",
      "start": 19,
      "end": 39,
      "value": "bacon berries cheese",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "ingredients",
      "start": 19,
      "end": 39,
      "confidence_entity": 0.7563967162518996,
      "value": "bacon berries cheese",
      "extractor": "CRFEntityExtractor"
    }
  ],

I tried both DIET and CRF, but both give me the same result. Why are my entities not being recognized separately as 3 ingredients?

What can I do so that they are classified correctly?

I can’t say I know what is happening internally. I observe that there are two spaces in your training data instead of one, so that might be causing some confusion. But this does look like strange behavior.

I can point you towards something that might help in the meantime though: lookup tables!

Here’s an example from the pokedex demo.

## intent:confirm_exists
- is [bulbasaur](pokemon_name) a pokemon
- does [ninetails](pokemon_name) exist
- ever heard of [pikachu](pokemon_name)

## lookup:pokemon_name
  data/pokenames.txt

The idea is that the textfile contains a long list of things to match against and this may make it easier to detect the right ingredients.
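To make the "list of things to match against" idea concrete, here is a rough Python sketch. It only illustrates matching text against a word list and is not how Rasa applies lookup tables inside its pipeline. The function name `find_lookup_matches` and the inline `ingredient_names` list are made up for this example; in practice the names would come from a text file such as the one referenced above.

```python
import re
from typing import Dict, List


def find_lookup_matches(text: str, names: List[str], entity: str) -> List[Dict]:
    """Return one entity dict per lookup name found in the text."""
    # Try longer names first so that "red wine" wins over "wine".
    ordered = sorted(names, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        {"entity": entity, "start": m.start(), "end": m.end(), "value": m.group()}
        for m in pattern.finditer(text)
    ]


# Hypothetical lookup list; normally read from a file, one name per line.
ingredient_names = ["bacon", "berries", "cheese", "wine", "red wine"]

matches = find_lookup_matches(
    "my ingredients are bacon berries cheese", ingredient_names, "ingredients"
)
for match in matches:
    print(match)
```

Because each name is matched on its own, neighbouring ingredients come back as separate entries in this sketch, even without a separator word between them.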

Thanks for the response @koaning

I am definitely using lookup tables in places where I have entities that have a discrete set of values they can take. I didn’t think of using it here but I will look into the possibility.

Could you tell me whether this behavior is expected, or should it be able to pick up each entity separately?

@akelad @Tanja would you have any ideas about this?

In demos that I have done with that pokedex bot, I’ve seen that it picks up the entities separately, albeit with the word “and” in between as a separator. It does strike me as unexpected behavior, but I cannot immediately pinpoint what is causing it.

This behaviour is actually expected. If multiple tokens are next to each other and all have the same predicted entity type, we assume that they belong together. For example, if you have an entity extractor that extracts cities, you want “San Francisco” to be classified as one city and not as two cities, “San” and “Francisco”. So, if you add the word “and” to your sentence (in between the ingredients), the ingredients are picked up separately.
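The merging rule described here can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual Rasa implementation: tokens are split on whitespace, each token already carries a predicted label, and `"O"` marks a token with no entity. The function name `merge_adjacent_entities` is made up for this example.

```python
import re
from typing import Dict, List

NO_ENTITY = "O"  # assumed label for tokens that are not part of an entity


def merge_adjacent_entities(text: str, labels: List[str]) -> List[Dict]:
    """Merge runs of neighbouring tokens that share the same entity label."""
    tokens = list(re.finditer(r"\S+", text))  # naive whitespace tokenizer
    entities: List[Dict] = []
    previous = NO_ENTITY
    for token, label in zip(tokens, labels):
        if label != NO_ENTITY:
            if label == previous:
                # Same label as the previous token: extend the current entity.
                entities[-1]["end"] = token.end()
            else:
                entities.append(
                    {"entity": label, "start": token.start(), "end": token.end()}
                )
        previous = label
    for entity in entities:
        entity["value"] = text[entity["start"]:entity["end"]]
    return entities


# Three neighbouring ingredient tokens collapse into one entity.
merged = merge_adjacent_entities(
    "my ingredients are bacon berries cheese",
    ["O", "O", "O", "ingredients", "ingredients", "ingredients"],
)

# A token without an entity label ("and") breaks the run.
separate = merge_adjacent_entities(
    "my ingredients are bacon and cheese",
    ["O", "O", "O", "ingredients", "O", "ingredients"],
)
print(merged)
print(separate)
```

The first call reproduces the single span from 19 to 39 seen in the output earlier in the thread; the second shows why inserting “and” yields separate entities.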

@noman here’s the post I was referring to; it is expected behavior.

Thanks @shubhamnatraj for pointing me to this. @Tanja but this default behavior does not seem to handle this scenario:
Received user message 'please add buffalo, ranch,mustard, barbeque sauces' with intent '{'name': 'inform', 'confidence': 0.8117992877960205}' and entities '[{'entity': 'sauces', 'start': 11, 'end': 43, 'value': 'buffalo, ranch,mustard, barbeque', 'extractor': 'DIETClassifier'}]'

Here we want to get them as four separate “sauces” entities, i.e. ‘buffalo’, ‘ranch’, ‘mustard’ and ‘barbeque’, instead of one single entity ‘buffalo, ranch,mustard, barbeque’.

How can we achieve this? Thanks

@noman You are right, separating entities by comma does not work at the moment. Will create a fix for it soon. For now please use and.
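Until that is fixed, one possible stopgap besides using “and” is to split the merged entity yourself after NLU has run. The sketch below is an assumption about how that could look, not a Rasa feature: `split_entity_on_commas` is a hypothetical helper that takes one entity dict in the shape shown in the log above and returns one dict per comma-separated part, with `start` and `end` recomputed against the original message.

```python
import re
from typing import Dict, List


def split_entity_on_commas(entity: Dict) -> List[Dict]:
    """Split one merged entity into one entity per comma-separated part."""
    parts: List[Dict] = []
    for match in re.finditer(r"[^,]+", entity["value"]):
        raw = match.group()
        value = raw.strip()
        if not value:
            continue  # skip empty pieces such as ", ,"
        # Offset of the stripped value inside the merged value.
        offset = match.start() + (len(raw) - len(raw.lstrip()))
        start = entity["start"] + offset
        # Copy the other keys (entity name, extractor) unchanged.
        parts.append(dict(entity, start=start, end=start + len(value), value=value))
    return parts


message = "please add buffalo, ranch,mustard, barbeque sauces"
merged_entity = {
    "entity": "sauces",
    "start": 11,
    "end": 43,
    "value": "buffalo, ranch,mustard, barbeque",
    "extractor": "DIETClassifier",
}

sauces = split_entity_on_commas(merged_entity)
for sauce in sauces:
    print(sauce["value"], sauce["start"], sauce["end"])
```

This only helps when the separator is a comma; values that legitimately contain commas would need a different rule.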
