The "two entity extractor" problem - do I really need to write custom code for stories?

Hi,

I have been working a lot with forms and custom extract/validation code to extract values. DIET’s entity extraction, alongside other extractors, has been very useful. However, the documentation pushes “stories” a lot, so I have been looking into using stories to manage conversational flows instead.

The problem comes with dual entity extraction. For example, I want to use lookup tables and synonyms (in essence, Regex matching), but to get these to extract I must give some training examples in my intents. That means I always get 2 entities extracted with the same name: one from DIET and one from Regex.

On reading the documentation, do I have to write some custom code to get around this so I can use them in stories? That seems like a lot of work when I only want to use simple lookup features.

Or am I missing a trick? Just want to be sure before I start having to write custom code for stories.

Below is a simple extract of my NLU training data to show what I mean:

- intent: which_car
  examples: |
    - I want to buy a car
    - I wanna purchase a car
    - Get me a [red](colour) car
    - I wanna buy a [blue](colour) car
    - I want a [rouge](colour) car
    - Get me an [aqua](colour) car
    - i want to buy a [red](colour) car
- intent: colour
  examples: |
    - [blue](colour)
    - [red](colour)
- synonym: red
  examples: |
    - rouge
- synonym: blue
  examples: |
    - aqua
- lookup: colour
  examples: |
    - red
    - blue
    - green

Thanks for any guidance!

Hi Mark - just a very quick thing to check. Do you need DIET to extract non-lookup entities too? If not, you can just set entity_recognition to false in the DIET config.

But what’s not clear to me from your question is what the issue is with stories. What is the incompatibility you are seeing here?

Hi Alan,

Unfortunately (and selfishly) I need both. I really like the DIET (or CRF) approach for entity extraction. I’ve already used this for some really nice use cases with external entities and cosine lookups for entity disambiguation. So in essence I need DIET entities “on” within my solution.

So, using the above example, if I say “i wanna get a blue car”, I get these extracted within the NLU (correctly):

"entities": [
{
  "entity": "colour",
  "start": 14,
  "end": 18,
  "value": "blue",
  "extractor": "RegexEntityExtractor"
},
{
  "entity": "colour",
  "start": 14,
  "end": 18,
  "confidence_entity": 0.9987943172454834,
  "value": "blue",
  "extractor": "DIETClassifier"
}
]

How do I define the story to pick up the Regex extraction only? In some cases I’ll even get a synonym lookup e.g. “i’m thinking of buying a rouge car”:

"entities": [
{
  "entity": "colour",
  "start": 25,
  "end": 30,
  "confidence_entity": 0.997975766658783,
  "value": "red",
  "extractor": "DIETClassifier",
  "processors": [
    "EntitySynonymMapper"
  ]
}
]

Which again is correct. So using lookups I can get any one of 2 types of entity (colour) with the same name - Regex or DIET.

I also see this as a warning:

UserWarning: Parsing of message: 'i wanna get a blue car' lead to overlapping entities: blue of type colour extracted by RegexEntityExtractor overlaps with blue of type colour extracted by DIETClassifier. This can lead to unintended filling of slots. Please refer to the documentation section on entity extractors and entities getting extracted multiple times:https://rasa.com/docs/rasa/components#entity-extractors
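
To make the warning concrete, here is a minimal sketch in plain Python (no Rasa imports) of the kind of span-overlap check it describes. The function name `find_overlaps` is hypothetical and this is not Rasa's actual implementation; the entity dicts are copied from the “blue car” parse output above.

```python
from itertools import combinations


def find_overlaps(entities):
    """Return pairs of entities whose character spans overlap.

    Hypothetical helper for illustration only; it works on plain dicts
    shaped like the "entities" list in the NLU parse output.
    """
    pairs = []
    for a, b in combinations(entities, 2):
        # Two half-open spans [start, end) overlap when each one
        # starts before the other one ends.
        if a["start"] < b["end"] and b["start"] < a["end"]:
            pairs.append((a, b))
    return pairs


entities = [
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "extractor": "RegexEntityExtractor"},
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "extractor": "DIETClassifier"},
]

overlaps = find_overlaps(entities)
for a, b in overlaps:
    print(f'{a["extractor"]} overlaps {b["extractor"]} on "{a["value"]}"')
```
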

And this “overlapping” is referred to in the documentation too:

So maybe a solution should only ever use either DIET or Regex/Lookups? Can I not mix the two without “coding”?

Again, your advice is very much appreciated!

Mark

Hi,

I’m Vincent and I maintain the rasa nlu examples project. I just added an issue on GitHub to explore ways of addressing this. I’m thinking about adding an NLU component that can do a bit of post-processing on all the detected entities. The working title for the component is EntityOrchestrator, but there are a couple of different ways of going about it.
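
As a rough illustration of what such post-processing could look like, here is a minimal sketch in plain Python. It is not the EntityOrchestrator component and uses no Rasa API; the function name `resolve_entities` and the `priority` tuple are assumptions made up for this example. It keeps one entity per (name, start, end) span and prefers whichever extractor appears earlier in `priority`.

```python
def resolve_entities(entities, priority=("RegexEntityExtractor", "DIETClassifier")):
    """Keep one entity per (entity, start, end), preferring extractors
    that appear earlier in `priority`.

    Hypothetical helper for illustration only; `entities` is a list of
    dicts shaped like the "entities" list in the NLU parse output.
    """
    def rank(entity):
        extractor = entity.get("extractor")
        # Extractors not listed in `priority` are ranked last.
        return priority.index(extractor) if extractor in priority else len(priority)

    best = {}
    for entity in entities:
        key = (entity["entity"], entity["start"], entity["end"])
        if key not in best or rank(entity) < rank(best[key]):
            best[key] = entity
    # Return the surviving entities in order of position in the text.
    return sorted(best.values(), key=lambda e: e["start"])


# The duplicated "blue" extraction from the thread.
parsed = [
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "extractor": "RegexEntityExtractor"},
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "extractor": "DIETClassifier", "confidence_entity": 0.9987943172454834},
]

resolved = resolve_entities(parsed)
print([(e["value"], e["extractor"]) for e in resolved])
```

With the default priority, the duplicated “blue” span collapses to the RegexEntityExtractor entry. A message where only DIET fires, such as the “rouge” example, passes through unchanged, so the synonym-mapped value is kept.
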

If you’d like to give feedback on what would/would not work for you, I’d be all ears!