How to exclude certain entities from DIETClassifier (or prioritise based on confidence)

dahlia96 · January 15, 2025, 12:31pm

I am using RegexEntityExtractor to extract certain entities (amount / account_number), but they end up going through DIETClassifier and I’m not sure why. Here is my regex / nlu.yml for amount:

nlu:
  - regex: amount
    examples: |
      - \b\d{1,6}(\.\d{1,2})?\b
  - intent: inform
    examples: |
      - [1,000](amount) [dólares]{"entity":"currency", "value":"USD"}

(plus more examples)

Here is my regex / nlu.yml for account_number:

nlu:
  - regex: account_number
    examples: |
      - (?=(?:\D*\d){7,18}\D*)([\d\s-]{7,40})
  - intent: inform
    examples: |
      - [2242171377602651](account_number)
      - [2334234324543243](account_number)

plus more examples.

The problem is that certain messages, like “my account number is 12345678”, are not being extracted via the regex but via the DIETClassifier.
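One way to narrow this down is to check what the two patterns match on their own, outside Rasa. The following is a minimal sketch using Python's standard `re` module. The helper name `matches` is made up for illustration, and Rasa may wrap or apply the patterns differently (for example when `use_word_boundaries` is enabled), so this only shows the raw regex behaviour.

```python
import re

# Patterns copied verbatim from the nlu.yml snippets above.
AMOUNT = re.compile(r"\b\d{1,6}(\.\d{1,2})?\b")
ACCOUNT_NUMBER = re.compile(r"(?=(?:\D*\d){7,18}\D*)([\d\s-]{7,40})")


def matches(pattern, text):
    """Return every full match of `pattern` in `text`, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]


# repr() makes leading or trailing whitespace in a match visible.
for pattern, text in [
    (AMOUNT, "1,000 dólares"),
    (ACCOUNT_NUMBER, "my account number is 12345678"),
    (ACCOUNT_NUMBER, "2242171377602651"),
]:
    print(repr(text), "->", [repr(m) for m in matches(pattern, text)])
```

In plain Python, the amount pattern splits "1,000" into "1" and "000" because it has no allowance for a thousands separator, and the account number pattern matches " 12345678" including the leading space, since `\s` is part of its character class. Either could plausibly stop a regex match from lining up with the annotated span, though whether that is the cause here is not confirmed.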

This is part of my pipeline:

  - name: RegexFeaturizer
    case_sensitive: true
    use_word_boundaries: true
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: word
  - name: RegexEntityExtractor
    case_sensitive: false
    use_lookup_tables: true
    use_regexes: true
    use_word_boundaries: true
    confidence: 1.0
  - name: DIETClassifier
    constrain_similarities: true
    excluded_entities:
      - amount
      - account_number

What am I doing wrong? Why do messages that match the regex end up being handled by the DIETClassifier instead of the RegexEntityExtractor? For context, what I want to achieve is:

1. Try the regex. If it finds something, take that as the entity.
2. If not, go through the DIETClassifier.

I thought Rasa already picks the higher-confidence extraction between regex and DIET by default, but it seems the regex isn’t even working in the first place?
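The intended priority can be sketched as a small post-processing step over the extracted entities. This is a sketch under assumptions, not something Rasa provides: `prefer_regex` and `overlaps` are hypothetical helpers, and the entity dicts are assumed to carry `start`, `end`, and `extractor` keys as they appear in Rasa's parsed output.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the character spans of two entities intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def prefer_regex(
    entities: List[Entity], preferred: str = "RegexEntityExtractor"
) -> List[Entity]:
    """Keep all entities from the preferred extractor and drop any
    entity from another extractor that overlaps one of them."""
    regex_hits = [e for e in entities if e.get("extractor") == preferred]
    others = [e for e in entities if e.get("extractor") != preferred]
    kept = [e for e in others if not any(overlaps(e, r) for r in regex_hits)]
    return sorted(regex_hits + kept, key=lambda e: e["start"])
```

With this rule, a DIETClassifier entity survives only where the regex extractor found nothing on the same span, which is the fallback order described above.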

dahlia96 · January 22, 2025, 6:50pm

Update for anyone else struggling with this: keeping exactly one annotated example for the regex entities and removing all other annotations worked for me.
