Use DIETClassifier with custom rule-based entity extractor

jeanveau · August 18, 2020, 8:39am

Hello All,

I’m using the DIETClassifier to do the NLU prediction (entities+entities). But I also have some custom components that extract entities separately. Most of them are rule-based extractors or entity transformers, e.g. one that merges multiple date entities into a date_range entity whose value is a list of dates.

Note that in the pipeline, the custom components have to come after the ML prediction (DIET or CRF), because some of them need to read the list of predicted entities.
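To make the "entity transformer" idea concrete, here is a minimal sketch of such a merge step, written as a plain function over entity dictionaries in the same shape that Rasa prints (entity, start, end, value, extractor). The entity name "date", the extractor label "DateRangeMerger" and the function name are assumptions for illustration, not the original poster's code, and the wiring into a custom Rasa component is left out.

```python
from typing import Any, Dict, List


def merge_date_entities(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collapse all "date" entities into one "date_range" entity.

    The input is the list of entities already predicted by earlier
    pipeline components, which is why this step has to run after them.
    """
    dates = [e for e in entities if e.get("entity") == "date"]
    if len(dates) < 2:
        return list(entities)  # nothing to merge

    others = [e for e in entities if e.get("entity") != "date"]
    dates_in_order = sorted(dates, key=lambda e: e["start"])
    merged = {
        "entity": "date_range",
        "start": dates_in_order[0]["start"],
        "end": max(e["end"] for e in dates),
        "value": [e["value"] for e in dates_in_order],
        "extractor": "DateRangeMerger",  # hypothetical component name
    }
    return sorted(others + [merged], key=lambda e: e["start"])


# "from 2020-08-01 to 2020-08-07 in Paris"
predicted = [
    {"entity": "date", "start": 5, "end": 15, "value": "2020-08-01"},
    {"entity": "date", "start": 19, "end": 29, "value": "2020-08-07"},
    {"entity": "city", "start": 33, "end": 38, "value": "Paris"},
]
result = merge_date_entities(predicted)
print(result)
```

The function only reads and rewrites the entity list, so it does not care whether the upstream dates came from DIET, CRF or a regex.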

For now, my solution is to turn off entity prediction in DIET and add a CRF component ahead of it: CRF -> custom entity extractors -> DIET (intent prediction only)

Is this the best solution for my case? I’m wondering whether I can put my custom entity extractors in the middle of a normal DIET to benefit from the pre-trained model: DIET(CRF + custom component => intent prediction)

koaning · August 19, 2020, 1:29pm

I’m using the DIETClassifier to do the NLU prediction (entities+entities).

Just to check, do you mean (intents + entities) here?

In your case, have you played around with the rasa shell nlu command? Here’s what I get when I run a toy model locally that can detect programming languages.

> rasa shell nlu
>> I want to talk python 
{
  "intent": {
    "name": "talk_code",
    "confidence": 0.8869256973266602
  },
  "entities": [
    {
      "entity": "proglang",
      "start": 15,
      "end": 21,
      "value": "python",
      "extractor": "DIETClassifier"
    }
  ],
  "intent_ranking": [
    {
      "name": "talk_code",
      "confidence": 0.8869256973266602
    },
    {
      "name": "bot_challenge",
      "confidence": 0.10142503678798676
    },
    {
      "name": "goodbye",
      "confidence": 0.00817133765667677
    },
    {
      "name": "greet",
      "confidence": 0.0034779126290231943
    }
  ],
  "text": "i want to talk python"
}

You’ll notice that the extractor is also listed. That means that, whatever pipeline you have, you should be able to check which entities were found and which components extracted them.
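If you want to do that check over many messages rather than by eye, a small script can group the extracted values by the "extractor" field. This is only a sketch: it works on a trimmed copy of the JSON above, and the helper name entities_by_extractor is made up for this example.

```python
import json
from collections import defaultdict


def entities_by_extractor(parsed: dict) -> dict:
    """Map each extractor name to the entity values it produced."""
    grouped = defaultdict(list)
    for entity in parsed.get("entities", []):
        grouped[entity.get("extractor", "unknown")].append(entity["value"])
    return dict(grouped)


# Trimmed copy of the `rasa shell nlu` output shown above.
raw = """
{
  "intent": {"name": "talk_code", "confidence": 0.8869256973266602},
  "entities": [
    {
      "entity": "proglang",
      "start": 15,
      "end": 21,
      "value": "python",
      "extractor": "DIETClassifier"
    }
  ],
  "text": "i want to talk python"
}
"""

parsed = json.loads(raw)
print(entities_by_extractor(parsed))  # {'DIETClassifier': ['python']}
```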

I think adding your custom component after DIET should be fine, by the way. But it’s good to double-check on your dataset.

It might be worth mentioning that in the 2.0 release of Rasa we expect a RegexEntityDetector to be added to our toolkit. That should make it much easier for you to add some “rule-based” entities to the pipeline.

jeanveau · August 19, 2020, 4:07pm

Hi koaning,

thanks for your response.

I have no doubt that the entities will be extracted no matter where the custom rule-based extractors are put in the pipeline.

But from reading about the architecture of DIETClassifier, I believe that the entities in the message affect the result of intent prediction. So if I put the custom components after DIET, their entities will not be taken into account during intent prediction. Am I right?

koaning · August 19, 2020, 5:43pm

Ah. I understand the confusion now.

If you use DIET for both entity and intent detection then they will influence each other. But only then. If you use CRF, it runs in parallel and its output is ignored by DIET.

jeanveau · August 19, 2020, 9:17pm

So there is no way to “modify” the output of CRF layer in DIET?

I ask because I use custom components to normalize some date entities (e.g. converting “last Monday” into “yyyy-mm-dd” format, or “next week” into a list of dates like [d0, d1, d2…d6]), so that those entities can be handled easily by dialogue stories and the action implementations stay clean.
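As an illustration of the kind of normalization meant here, the sketch below resolves the two phrases against a reference date using only the standard library. The function names and the reading of "next week" as the Monday to Sunday following the reference date's week are assumptions made for this example; a real component may need a different convention.

```python
from datetime import date, timedelta
from typing import List


def last_weekday(reference: date, weekday: int) -> str:
    """ISO date of the most recent `weekday` strictly before `reference`.

    Weekdays follow datetime's convention: Monday is 0, Sunday is 6.
    """
    days_back = (reference.weekday() - weekday) % 7 or 7
    return (reference - timedelta(days=days_back)).isoformat()


def next_week(reference: date) -> List[str]:
    """Seven ISO dates, Monday to Sunday, of the week after `reference`'s week."""
    next_monday = reference + timedelta(days=7 - reference.weekday())
    return [(next_monday + timedelta(days=i)).isoformat() for i in range(7)]


today = date(2020, 8, 19)  # a Wednesday
print(last_weekday(today, 0))  # 2020-08-17
print(next_week(today))        # 2020-08-24 through 2020-08-30
```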

Another reason for the custom components is, of course, to use regexes to extract rule-based entities with high precision.
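A regex-based extractor of this sort can be sketched as a function that returns entities in the same dictionary shape Rasa prints. The patterns, the entity names and the "RegexExtractor" label below are hypothetical placeholders, not part of any Rasa API.

```python
import re
from typing import Any, Dict, List

# Hypothetical patterns; a real project would define its own.
PATTERNS = {
    "order_id": re.compile(r"\bORD-\d{6}\b"),
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_regex_entities(text: str) -> List[Dict[str, Any]]:
    """Return every pattern match as an entity dict, ordered by position."""
    entities = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "entity": name,
                "start": match.start(),
                "end": match.end(),
                "value": match.group(0),
                "extractor": "RegexExtractor",  # hypothetical label
            })
    return sorted(entities, key=lambda e: e["start"])


found = extract_regex_entities("Track ORD-123456 shipped on 2020-08-19")
for entity in found:
    print(entity["entity"], entity["start"], entity["end"], entity["value"])
```

Because every match is an exact pattern hit, precision is high by construction; recall depends entirely on how complete the pattern list is.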

Since EmbeddingIntentClassifier will be removed in version 2.0, I really need a workaround to feed the entities extracted by the custom components into the intent classifier.

koaning · August 21, 2020, 8:06am

I think the features you’re interested in can be obtained with Duckling too. Have you looked into that entity extractor? It can safely run in parallel to DIET. The Duckling entities won’t influence DIET.
