Is it possible to use DIETClassifier with a lookup table to extract entities?

I saw in the docs that RegexEntityExtractor + RegexFeaturizer can work with lookup tables.

I also searched this forum, and it seems DIETClassifier can use them too, but I don't think the accuracy has improved.

But when I use the lookup table with RegexEntityExtractor, it works fine and helps a lot.

But if I use both entity extractors (RegexEntityExtractor and DIETClassifier), they both extract the same entity, which produces duplicates: for example, [coke, coke] when both extractors succeed.

So is it possible to use DIETClassifier only?

I ask because I also have other entities to extract that don't need a lookup table.
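If both extractors stay in the pipeline, one workaround is to drop the duplicates after parsing. The sketch below is hypothetical post-processing, not a Rasa feature: it assumes each parsed entity is a dict with `entity`, `start`, `end` and `extractor` keys (as in Rasa's parse output), keeps one entity per span, and prefers a chosen extractor when two of them return the same span. The function name and the `drink` entity are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]


def deduplicate_entities(
    entities: List[Entity], preferred_extractor: str = "RegexEntityExtractor"
) -> List[Entity]:
    """Keep one entity per (entity type, start, end) span.

    If two extractors return the same span, the entity from
    `preferred_extractor` wins; otherwise the first one seen is kept.
    """
    chosen: Dict[Tuple[str, int, int], Entity] = {}
    for ent in entities:
        key = (ent["entity"], ent["start"], ent["end"])
        current = chosen.get(key)
        if current is None or (
            ent.get("extractor") == preferred_extractor
            and current.get("extractor") != preferred_extractor
        ):
            # Replacing an existing key keeps its original position in the dict.
            chosen[key] = ent
    return list(chosen.values())


# Hypothetical parse result where both extractors found "coke".
parsed = [
    {"entity": "drink", "value": "coke", "start": 10, "end": 14, "extractor": "DIETClassifier"},
    {"entity": "drink", "value": "coke", "start": 10, "end": 14, "extractor": "RegexEntityExtractor"},
]
print([e["extractor"] for e in deduplicate_entities(parsed)])  # ['RegexEntityExtractor']
```

Keying on the span rather than the value means two separate mentions of "coke" in one message are both kept.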

DIETClassifier does not use the regexes you define in your training data directly, as the RegexEntityExtractor does. Instead, DIETClassifier uses the features created by the RegexFeaturizer alongside all the other features present, and it learns by itself how much to rely on these kinds of features. So the features from the RegexFeaturizer may not have a big impact on performance. The RegexEntityExtractor, by contrast, uses the regexes directly to extract entities.
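To make "features created by the RegexFeaturizer" concrete, here is a simplified illustration of the idea, not Rasa's implementation: each token gets one binary dimension per pattern, set to 1 when the token overlaps a match. The lookup table is treated as an alternation regex with word boundaries, which is roughly how lookup tables are turned into patterns; the pattern names and the sentence are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def regex_token_features(
    text: str, patterns: Dict[str, str]
) -> List[Tuple[str, List[int]]]:
    """Return (token, vector) pairs with one binary dimension per pattern.

    A dimension is 1 when the token overlaps a match of that pattern.
    Tokens are whitespace-separated, loosely mirroring WhitespaceTokenizer.
    """
    names = sorted(patterns)  # fixed dimension order
    # Character spans matched by each pattern anywhere in the text.
    spans = {
        name: [(m.start(), m.end()) for m in re.finditer(patterns[name], text)]
        for name in names
    }
    features = []
    for tok in re.finditer(r"\S+", text):
        vec = [
            int(any(s < tok.end() and tok.start() < e for s, e in spans[name]))
            for name in names
        ]
        features.append((tok.group(), vec))
    return features


features = regex_token_features(
    "send 1234567890 a coke",
    {"account_number": r"\d{10,12}", "drink": r"\b(?:coke|pepsi)\b"},
)
print(features)
# [('send', [0, 0]), ('1234567890', [1, 0]), ('a', [0, 0]), ('coke', [0, 1])]
```

These vectors are only extra input next to the other token features, so a model like DIET can learn to weight them lightly, whereas a rule-based extractor would return every match.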

If the combination of RegexFeaturizer and DIETClassifier is not working as expected, you might need to add more training examples that contain the entities related to the regexes. You can also use the RegexEntityExtractor for the entities you have regexes for and the DIETClassifier for all other entities.
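The thread does not describe a built-in per-entity switch, so here is a minimal sketch of that split as post-processing, assuming you can filter the parsed entities yourself (for example in the code that consumes the NLU result). It keeps RegexEntityExtractor output only for entity types backed by a regex or lookup table, and other extractors' output (such as DIETClassifier) for everything else. The function name and the `drink`/`size` entities are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Entity = Dict[str, Any]


def route_entities(
    entities: List[Entity],
    regex_entity_types: Iterable[str],
    regex_extractor: str = "RegexEntityExtractor",
) -> List[Entity]:
    """Keep regex-extractor output only for regex/lookup entity types,
    and other extractors' output for every other entity type."""
    regex_types = set(regex_entity_types)
    routed = []
    for ent in entities:
        from_regex = ent.get("extractor") == regex_extractor
        handled_by_regex = ent["entity"] in regex_types
        # Keep the entity only if it came from the extractor responsible for its type.
        if from_regex == handled_by_regex:
            routed.append(ent)
    return routed


parsed = [
    {"entity": "drink", "value": "coke", "extractor": "DIETClassifier"},
    {"entity": "drink", "value": "coke", "extractor": "RegexEntityExtractor"},
    {"entity": "size", "value": "large", "extractor": "DIETClassifier"},
]
kept = route_entities(parsed, regex_entity_types={"drink"})
print([(e["entity"], e["extractor"]) for e in kept])
# [('drink', 'RegexEntityExtractor'), ('size', 'DIETClassifier')]
```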

I tried to use different entity extractors for specific entities, but how can I actually do that?

If you want certain entities to be extracted via the RegexEntityExtractor, add just one training example for that entity to your NLU data. For entities that should be extracted via the DIETClassifier, add more examples. It's a bit hacky, but it should work.

It doesn't seem to work. Even though I provide only one example of the entity (which also appears in the lookup table), both extractors still extract it.

@carlhung Did you solve the issue? I have a similar problem.

config.yml

```yaml
language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: RegexEntityExtractor
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
```

nlu.yml

```yaml
version: "2.0"
nlu:
- intent: check_balance
  examples: |
    - how much do I have on my [savings account]("account")
    - how much money is in my [checking account]("account")
    - What's the balance on my [credit card account]("account")
    - How much money is on [1234567890]("account_number")
- regex: account_number
  examples: |
    - \d{10,12}
```

When I train, I get the following warning:

UserWarning: No lookup tables or regexes defined in the training data that have a name equal to any entity in the training data. In order for this component to work you need to define valid lookup tables or regexes in the training data.

There is clearly an example for account_number in the training data, so I don't know what is going on. I hope we can make this work; I am really looking forward to combining ML with a regex rule-based system! :star_struck:
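One likely cause worth checking (an assumption, not confirmed in this thread): in Rasa's training-data format the entity name goes in the parentheses without quotes, e.g. `[1234567890](account_number)`. Inside a YAML block scalar (`examples: |`) quotes are not stripped, so the annotation probably declares an entity literally named `"account_number"`, quotes included, which does not equal the regex name `account_number`. The sketch below uses a simplified stand-in for the `[text](entity)` syntax, not Rasa's actual parser, to show the mismatch.

```python
import re

# Simplified stand-in for the [text](entity) annotation syntax;
# it ignores the {...} form, roles/groups and synonyms.
ANNOTATION = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<entity>[^:)]+)\)")


def annotated_entity_names(example: str) -> list:
    """Return the entity names exactly as written inside the parentheses."""
    return [m.group("entity") for m in ANNOTATION.finditer(example)]


regex_names = {"account_number"}  # name from the `- regex: account_number` block
quoted = annotated_entity_names('How much money is on [1234567890]("account_number")')
plain = annotated_entity_names("How much money is on [1234567890](account_number)")
print(quoted)                     # ['"account_number"']
print(set(quoted) & regex_names)  # set()
print(set(plain) & regex_names)   # {'account_number'}
```

If this is the cause, removing the quotes from all annotations (including `"account"`) should make the entity and regex names line up.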

How can I specify which extractor should be used for specific entities? Could that be defined in the entity declaration in the domain file?

I need to assign roles to my entities, but at present roles and groups can only be used with DIETClassifier. I also need to use lookup tables, which are only supported by RegexEntityExtractor. Please help me understand which entity extractor to use so that I can work with lookup tables and also assign roles to entities.