Extracting entity separated by space

Hi, suppose I have an entity name like "Space Warriors".
I want the entity to be detected as [Space Warriors] and not just [Space].

What would be the best way to achieve this?

Any help would be appreciated. Thank you.

As far as I remember, DIET should be able to handle spaces between tokens. You need to provide training data with tagged entities in your nlu.md.

If it is a limited list, then it is perhaps best to also use lookup tables.

I tried using lookup tables, but it doesn't work as expected.

What goes wrong? Lookup tables are usually applied through the RegexFeaturizer, so you should provide examples of the sentence patterns that might contain items from your lookup table.

My NLU:

- intent: intent_lookup
  examples: |
    - details for [Google]{"entity": "company"}
    - [Google]{"entity": "company"}
- lookup: company
  examples: |
    - Das capitals
    - J P Morgan
    - Your Story
    - Linked In
    - Infineon Technologies
    - Tata Steel
    - Morgan Stanley
    - App developer studio
    - Moonfrog Labs
    - Hacker Earth

STORIES

- story: company_lookup
  steps:
  - intent: intent_lookup
  - action: utter_lookup

I have declared the intent and entities in the domain file.

When I type the names from the lookup table, I get either the first word or the second word, not the full name.

You need to include examples from the lookup table in your training data. "Google" is a single token, so use companies with more than one token as examples. You don't have to list every company as an example, but a few would help DIET learn the token positions better.
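As a rough sketch of that advice, the intent could be extended with multi-token names taken from the lookup table above. This assumes the Rasa 2.x YAML training-data format; the `version` key and the phrasing "tell me about" are illustrative assumptions, not part of the original data.

```yaml
# nlu.yml: a sketch, assuming Rasa 2.x YAML training data
version: "2.0"
nlu:
- intent: intent_lookup
  examples: |
    # single-token example from the original data
    - details for [Google]{"entity": "company"}
    # multi-token names from the lookup table, annotated as one entity span
    - details for [J P Morgan]{"entity": "company"}
    - [Morgan Stanley]{"entity": "company"}
    - details for [App developer studio]{"entity": "company"}
    # "tell me about" is a hypothetical phrasing added for variety
    - tell me about [Infineon Technologies]{"entity": "company"}
```

The point of the annotations is that the brackets cover the whole name, so the model sees examples where the entity spans two or more tokens.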

You might be using lexical features.

Take a look at this to understand how


If I am using lookups, do I need to make any changes to config.yml? If yes, what would those changes be?

That is more a matter of fine-tuning. Usually the default parameters in the config should work for you. But if you need to tune your model to extract entities better, it is perhaps best to adjust the lexical features and see how that affects your model.
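For reference, a pipeline that uses lookup tables and exposes the lexical features for tuning might look like the sketch below. It assumes a Rasa 2.x setup; the component choice, the `epochs` value and the feature lists are illustrative starting points (the feature lists mirror what I believe are the defaults), not tuned recommendations.

```yaml
# config.yml: a sketch, assuming Rasa 2.x; values are illustrative, not tuned
language: en
pipeline:
  - name: WhitespaceTokenizer
  # turns lookup tables and regexes from the training data into features
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
    # one list per window position: previous token, current token, next token
    features:
      - ["low", "title", "upper"]
      - ["BOS", "EOS", "low", "upper", "title", "digit"]
      - ["low", "title", "upper"]
  - name: CountVectorsFeaturizer
  # DIET consumes the features above and predicts intents and entities
  - name: DIETClassifier
    epochs: 100
```

Changing one feature list at a time and retraining makes it easier to see which change affects entity extraction.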

I would say that for lookups you should provide more training examples, because the lookup table is turned into regex patterns and the model needs examples to learn from those features.