The "two entity extractor" problem - do I really need to write custom code for stories?

pomegran · July 28, 2021, 8:28am

Hi,

I have been working a lot with forms and custom extract/validation code to extract values. DIET's entity extraction, alongside others, has been very useful. However, the documentation pushes “stories” a lot, so I have been looking into using stories to manage conversational flows instead.

The problem comes with dual entity extraction. For example, I want to use lookups and synonyms (in essence, Regex matching), but to get these to extract I must give some training examples in my intents, which means I always get 2 entities extracted with the same name: one from DIET and one from Regex.

On reading the documentation, do I have to write some custom code to get around this so I can use them in stories? That seems like a lot of work when I only want to use simple lookup features.

Or am I missing a trick? Just want to be sure before I start having to write custom code for stories.

Below is a simple extract of my NLU training data to show what I mean:

- intent: which_car
  examples: |
    - I want to buy a car
    - I wanna purchase a car
    - Get me a [red](colour) car
    - I wanna buy a [blue](colour) car
    - I want a [rouge](colour) car
    - Get me an [aqua](colour) car
    - i want to buy a [red](colour) car
- intent: colour
  examples: |
    - [blue](colour)
    - [red](colour)
- synonym: red
  examples: |
    - rouge
- synonym: blue
  examples: |
    - aqua
- lookup: colour
  examples: |
    - red
    - blue
    - green

Thanks for any guidance!

amn41 · July 29, 2021, 2:26pm

Hi Mark - just a very quick thing to check. Do you need DIET to extract non-lookup entities too? If not, you can just set entity_recognition to false in the DIET config.

But what’s not clear to me from your question is what the issue is with stories. What is the incompatibility you are seeing here?

pomegran · July 29, 2021, 4:02pm

Hi Alan,

Unfortunately (and selfishly) I need both. I really like the DIET (or CRF) approach for entity extraction. I’ve already used this for some really nice use cases with external entities and cosine lookups for entity disambiguation. So in essence I need DIET entities “on” within my solution.

So, using the above example, if I say “i wanna get a blue car”, I get these extracted within the NLU (correctly):

"entities": [
  {
    "entity": "colour",
    "start": 14,
    "end": 18,
    "value": "blue",
    "extractor": "RegexEntityExtractor"
  },
  {
    "entity": "colour",
    "start": 14,
    "end": 18,
    "confidence_entity": 0.9987943172454834,
    "value": "blue",
    "extractor": "DIETClassifier"
  }
]

How do I define the story to pick up the Regex extraction only? In some cases I’ll even get a synonym lookup e.g. “i’m thinking of buying a rouge car”:

"entities": [
  {
    "entity": "colour",
    "start": 25,
    "end": 30,
    "confidence_entity": 0.997975766658783,
    "value": "red",
    "extractor": "DIETClassifier",
    "processors": [
      "EntitySynonymMapper"
    ]
  }
]

Which again is correct. So when using lookups, a colour entity with the same name can come from either of 2 extractors: Regex or DIET.
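To make the duplication concrete, here is a minimal Python sketch that groups the entities of a parse result by name and character span and lists which extractors produced each one. It only works on plain dicts shaped like the output above; `extractors_by_span` is a hypothetical helper name used for illustration, not something Rasa provides.

```python
from collections import defaultdict


def extractors_by_span(entities):
    """Map (entity, start, end) to the list of extractors that produced it."""
    grouped = defaultdict(list)
    for ent in entities:
        key = (ent["entity"], ent["start"], ent["end"])
        grouped[key].append(ent.get("extractor", "unknown"))
    return dict(grouped)


# Entities copied from the "i wanna get a blue car" parse result.
blue_car = [
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "extractor": "RegexEntityExtractor"},
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "confidence_entity": 0.9987943172454834, "extractor": "DIETClassifier"},
]

# Keep only the spans that more than one extractor reported.
duplicates = {
    span: names
    for span, names in extractors_by_span(blue_car).items()
    if len(names) > 1
}
print(duplicates)
```

For the "rouge" example, where only DIET fires, the same check would report no duplicates.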

I also see this as a warning:

UserWarning: Parsing of message: 'i wanna get a blue car' lead to overlapping entities: blue of type colour extracted by RegexEntityExtractor overlaps with blue of type colour extracted by DIETClassifier. This can lead to unintended filling of slots. Please refer to the documentation section on entity extractors and entities getting extracted multiple times:https://rasa.com/docs/rasa/components#entity-extractors

And this “overlapping” is referred to in the documentation too:

So maybe a solution should only ever use either DIET or Regex/Lookups? Can I not mix the two without “coding”?

Again, your advice is very much appreciated!

Mark

koaning · July 30, 2021, 9:30am

Hi,

I’m Vincent and I maintain the rasa nlu examples project. I just added an issue on GitHub to explore ways of addressing this. I’m thinking about adding an NLU component that can do a bit of post-processing on all the detected entities. The working title for the component is EntityOrchestrator, but there are a couple of different ways of going about it.
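Purely as an illustration of what such post-processing could look like (this is not the actual design of the component), the sketch below keeps one entity per overlapping span and prefers extractors by a configurable priority. The function name `resolve_overlaps` and the `priority` parameter are assumptions made up for this example, and it operates on plain dicts rather than on any Rasa API.

```python
def resolve_overlaps(entities, priority=("RegexEntityExtractor", "DIETClassifier")):
    """Keep one entity per overlapping span, preferring extractors earlier in `priority`."""

    def rank(ent):
        name = ent.get("extractor")
        # Extractors not listed in `priority` are considered last.
        return priority.index(name) if name in priority else len(priority)

    kept = []
    # Visit entities from preferred extractors first so they claim their spans.
    for ent in sorted(entities, key=rank):
        overlaps = any(
            ent["start"] < k["end"] and k["start"] < ent["end"] for k in kept
        )
        if not overlaps:
            kept.append(ent)
    # Return the survivors in text order.
    return sorted(kept, key=lambda e: e["start"])


# Entities shaped like the "i wanna get a blue car" parse result.
parsed = [
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "extractor": "RegexEntityExtractor"},
    {"entity": "colour", "start": 14, "end": 18, "value": "blue",
     "extractor": "DIETClassifier"},
]

resolved = resolve_overlaps(parsed)
print([e["extractor"] for e in resolved])
```

Note that this sketch treats any two overlapping spans as a conflict, even when the entity names differ, so a real component would probably need a finer rule than this.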

If you’d like to give feedback on what would/would not work for you, I’d be all ears!
