Use DIETClassifier with custom rule-based entity extractor

Hello All,

I’m using the DIETClassifier to do the NLU prediction (entities+entities). But I also have some custom components that extract entities separately (most of them are rule-based extractors or entity transformers, e.g. one that merges multiple date entities into a date_range entity whose value is a list of dates, and so on).
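A minimal sketch of the kind of entity transformer described here. It assumes plain Rasa-style entity dicts (the same keys as the `rasa shell nlu` output later in this thread) and date entities named `date` whose values are already normalized to `yyyy-mm-dd` strings. The function and the `DateRangeMerger` extractor name are hypothetical; wiring it into a real Rasa custom component depends on the Rasa version.

```python
from typing import Any, Dict, List

# Rasa-style entity dict: "entity", "start", "end", "value", "extractor".
Entity = Dict[str, Any]


def merge_date_entities(entities: List[Entity]) -> List[Entity]:
    """Replace all "date" entities with a single "date_range" entity.

    Hypothetical transformer: assumes date entities are named "date" and
    already carry normalized "yyyy-mm-dd" string values.
    """
    dates = [e for e in entities if e["entity"] == "date"]
    if len(dates) < 2:
        # Nothing to merge; keep the predictions unchanged.
        return entities
    others = [e for e in entities if e["entity"] != "date"]
    merged = {
        "entity": "date_range",
        # The merged entity spans from the first to the last date mention.
        "start": min(e["start"] for e in dates),
        "end": max(e["end"] for e in dates),
        # ISO dates sort correctly as strings; the set drops duplicates.
        "value": sorted({e["value"] for e in dates}),
        "extractor": "DateRangeMerger",  # hypothetical component name
    }
    return others + [merged]
```

This sketch only collects the dates that were found; expanding them to every day between the first and the last date would be a small extension.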

Note that in the pipeline the custom components have to come after the ML entity prediction (DIET or CRF), because some of them need to read the list of predicted entities.

For now, my solution is to disable entity prediction in DIET and add a CRF component before it: CRF -> custom entity extractors -> DIET (intent prediction only)
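For reference, the pipeline order described above could look roughly like this, written as Python data that mirrors the `pipeline:` section of `config.yml`. `CRFEntityExtractor`, `DIETClassifier` and its `entity_recognition` option are Rasa names; `custom_components.DateRangeMerger` is a hypothetical custom component, and the tokenizer/featurizer choice is an assumption. The small check only encodes the two constraints from the post.

```python
from typing import Any, Dict, List

Component = Dict[str, Any]

# Mirrors the `pipeline:` section of config.yml (component choice is an assumption).
PIPELINE: List[Component] = [
    {"name": "WhitespaceTokenizer"},
    {"name": "CountVectorsFeaturizer"},
    # ML entity prediction runs first ...
    {"name": "CRFEntityExtractor"},
    # ... then the rule-based extractors/transformers (hypothetical component) ...
    {"name": "custom_components.DateRangeMerger"},
    # ... and DIET only classifies intents, with its entity head switched off.
    {"name": "DIETClassifier", "entity_recognition": False},
]


def index_of(pipeline: List[Component], name: str) -> int:
    """Position of the first component called `name` (raises if absent)."""
    return next(i for i, c in enumerate(pipeline) if c["name"] == name)


def pipeline_problems(pipeline: List[Component]) -> List[str]:
    """Return the ordering/config issues described in the post (empty if none)."""
    problems = []
    custom = index_of(pipeline, "custom_components.DateRangeMerger")
    if custom < index_of(pipeline, "CRFEntityExtractor"):
        problems.append("custom extractor runs before CRF and cannot read its entities")
    # DIETClassifier predicts entities unless entity_recognition is set to False.
    if pipeline[index_of(pipeline, "DIETClassifier")].get("entity_recognition", True):
        problems.append("DIET still predicts entities")
    return problems


print(pipeline_problems(PIPELINE))  # []
```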

Is this the best solution for my case? I’m wondering whether I can put my custom entity extractors in the middle of a normal DIET, to benefit from the pre-trained model: DIET(CRF + custom component => intent prediction)

I’m using the DIETClassifier to do the NLU prediction (entities+entities).

Just to check, do you mean (intents + entities) here?

In your case, have you played around with the rasa shell nlu command? Here’s what I get when I run a toy model locally that can detect programming languages.

> rasa shell nlu
>> I want to talk python 
{
  "intent": {
    "name": "talk_code",
    "confidence": 0.8869256973266602
  },
  "entities": [
    {
      "entity": "proglang",
      "start": 15,
      "end": 21,
      "value": "python",
      "extractor": "DIETClassifier"
    }
  ],
  "intent_ranking": [
    {
      "name": "talk_code",
      "confidence": 0.8869256973266602
    },
    {
      "name": "bot_challenge",
      "confidence": 0.10142503678798676
    },
    {
      "name": "goodbye",
      "confidence": 0.00817133765667677
    },
    {
      "name": "greet",
      "confidence": 0.0034779126290231943
    }
  ],
  "text": "i want to talk python"
}

You’ll notice that the extractor is also listed. That means that, whatever pipeline you have, you should be able to check which entities were found and which component extracted each of them.
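If you want to check this programmatically, a small helper can group the entities in a parse result by their `extractor` field. This is a sketch that assumes the JSON has the shape shown above; `entities_by_extractor` is a hypothetical helper, not part of Rasa.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List


def entities_by_extractor(parse_result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group "entity=value" strings by the component that extracted them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in parse_result.get("entities", []):
        grouped[ent.get("extractor", "unknown")].append(f"{ent['entity']}={ent['value']}")
    return dict(grouped)


# Trimmed copy of the `rasa shell nlu` output above.
raw = """
{
  "entities": [
    {"entity": "proglang", "start": 15, "end": 21,
     "value": "python", "extractor": "DIETClassifier"}
  ],
  "text": "i want to talk python"
}
"""
print(entities_by_extractor(json.loads(raw)))  # {'DIETClassifier': ['proglang=python']}
```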

By the way, I think adding your custom component after DIET should be fine, but it’s good to double-check on your dataset.

It might be worth mentioning that in the 2.0 release of Rasa we expect a RegexEntityExtractor to be added to our toolkit. That should make it much easier for you to add some “rule-based” entities to the pipeline.

Hi koaning,

Thanks for your response.

I have no doubt that the entities will be extracted no matter where the custom rule-based extractors are placed in the pipeline.

But from reading about the architecture of DIETClassifier, I believe that the entities in a message affect the intent prediction. So if I put the custom component after DIET, these entities will not be taken into account during intent prediction. Am I right?

Ah. I understand the confusion now.

If you use DIET for both entity and intent detection, then the two tasks influence each other, but only then. If you use CRF, it runs in parallel and its entities are ignored by DIET.

So there is no way to “modify” the output of the CRF layer inside DIET?

I ask because I use custom components to normalize some date entities (e.g. converting “last Monday” into “yyyy-mm-dd” format, or “next week” into a list of dates like [d0, d1, d2…d6]), so that those entities can be handled easily by the dialogue stories and the action implementations stay clean.
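As a rough illustration of that normalization, here is a sketch using only the standard library. It assumes “last Monday” means the most recent Monday strictly before the reference date and “next week” means Monday to Sunday of the following calendar week; both interpretations and the helper names are assumptions, and a real extractor would also need to find these phrases in the text first.

```python
from datetime import date, timedelta
from typing import List


def last_weekday(today: date, weekday: int) -> date:
    """Most recent given weekday strictly before `today` (Monday=0 ... Sunday=6)."""
    # If today already is that weekday, go back a full week instead of 0 days.
    days_back = (today.weekday() - weekday) % 7 or 7
    return today - timedelta(days=days_back)


def next_week(today: date) -> List[str]:
    """ISO dates (Monday to Sunday) of the calendar week after `today`'s week."""
    this_monday = today - timedelta(days=today.weekday())
    start = this_monday + timedelta(days=7)
    return [(start + timedelta(days=i)).isoformat() for i in range(7)]


# 2020-05-13 is a Wednesday.
print(last_weekday(date(2020, 5, 13), 0).isoformat())  # 2020-05-11
print(next_week(date(2020, 5, 13))[0])  # 2020-05-18
```

Passing the reference date explicitly, rather than calling `date.today()` inside the helpers, keeps them deterministic and easy to test.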

Another reason for the custom components is, of course, to use regexes to extract rule-based entities with high precision.
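A minimal regex extractor in that spirit could emit entities in the same dict format as the other extractors, so that later components can treat them uniformly. The patterns, entity names, and the `RegexRuleExtractor` name below are hypothetical examples.

```python
import re
from typing import Any, Dict, List

# Hypothetical high-precision patterns: entity name -> compiled regex.
PATTERNS: Dict[str, re.Pattern] = {
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "order_id": re.compile(r"\b[A-Z]{2}\d{6}\b"),
}


def extract_regex_entities(text: str, extractor: str = "RegexRuleExtractor") -> List[Dict[str, Any]]:
    """Return one Rasa-style entity dict per regex match, ordered by position."""
    entities = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "entity": name,
                "start": match.start(),
                "end": match.end(),
                "value": match.group(0),
                "extractor": extractor,
            })
    return sorted(entities, key=lambda e: e["start"])


for ent in extract_regex_entities("order AB123456 shipped on 2020-05-18"):
    print(ent["entity"], ent["value"])
```

Keeping the `start`/`end` offsets makes it possible to resolve overlaps with entities predicted by DIET or CRF later in the pipeline.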

Since EmbeddingIntentClassifier will be removed in version 2.0, I really need a workaround to feed the entities extracted by the custom components into the intent classifier.

I think the features you’re interested in can be fetched with Duckling too. Have you looked into that entity extractor? It can safely run in parallel to DIET; the Duckling entities won’t influence DIET.