How to exclude certain entities from DIETClassifier (or prioritise based on confidence)

I am using RegexEntityExtractor to extract certain entities (amount / account_number), but they end up going through DIETClassifier and I’m not sure why. Here is my regex / nlu.yml for amount:

nlu:
  - regex: amount
    examples: |
      - \b\d{1,6}(\.\d{1,2})?\b
  - intent: inform
    examples: |
      - [1,000](amount) [dólares]{"entity":"currency", "value":"USD"}

(plus more examples)
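One way to see what the amount pattern matches on its own is to run it through Python's `re` module outside Rasa. This is only a sketch: it tests the raw pattern and does not reproduce how RegexEntityExtractor applies it. The second sample sentence is made up for illustration.

```python
import re

# The amount pattern from nlu.yml, tested as a plain Python regex.
AMOUNT = re.compile(r"\b\d{1,6}(\.\d{1,2})?\b")


def amount_matches(text):
    """Return every substring the raw pattern matches in `text`."""
    return [m.group(0) for m in AMOUNT.finditer(text)]


# The annotated training span is "1,000", but the pattern has no comma in it.
print(amount_matches("1,000 dólares"))    # ['1', '000']
print(amount_matches("send 250.50 now"))  # ['250.50']
```

The pattern splits "1,000" into two matches, so it never produces the span annotated in the training example. That mismatch may be worth ruling out before looking at the pipeline.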

Here is my regex / nlu.yml for account_number:

nlu:
  - regex: account_number
    examples: |
      - (?=(?:\D*\d){7,18}\D*)([\d\s-]{7,40})
  - intent: inform
    examples: |
      - [2242171377602651](account_number)
      - [2334234324543243](account_number)

plus more examples.

The problem is that certain messages, like “my account number is 12345678”, are not being extracted by the regex but by DIETClassifier.
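The account_number pattern can be checked the same way against the failing message. Again, this is a sketch with plain Python `re`, not Rasa's own matching code, and the helper name `account_match` is hypothetical.

```python
import re

# The account_number pattern from nlu.yml, tested as a plain Python regex.
ACCOUNT = re.compile(r"(?=(?:\D*\d){7,18}\D*)([\d\s-]{7,40})")


def account_match(text):
    """Return (matched_text, start, end) for the first match, or None."""
    m = ACCOUNT.search(text)
    if m is None:
        return None
    return m.group(0), m.start(), m.end()


# The character class [\d\s-] also accepts whitespace, so the match
# begins at the space before the digits, not at the first digit.
print(account_match("my account number is 12345678"))
# (' 12345678', 20, 29)
```

In plain Python the pattern does match, but the matched span starts one character early because of the leading space. Whether that offset is what stops RegexEntityExtractor from returning the entity is an assumption this sketch cannot confirm.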

This is part of my pipeline:

  - name: RegexFeaturizer
    case_sensitive: true
    use_word_boundaries: true
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: word
  - name: RegexEntityExtractor
    case_sensitive: false
    use_lookup_tables: true
    use_regexes: true
    use_word_boundaries: true
    confidence: 1.0
  - name: DIETClassifier
    constrain_similarities: true
    excluded_entities:
      - amount
      - account_number

What am I doing wrong? Why do messages that match the regex end up being handled by DIETClassifier instead of the regex extractor? For context, what I want to achieve is:

  1. Try the regex. If it finds something, take that as the entity.
  2. If not, fall back to DIETClassifier.

I thought Rasa by default already picks the higher-confidence extraction between the regex and DIET, but it seems the regex isn’t even working in the first place.
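The two steps above can be sketched as a small post-processing function. This is not a Rasa API: `prefer_regex` is a hypothetical helper, and the entity dictionaries only assume the `entity`, `start`, `end`, `value` and `extractor` keys that appear in parsed messages. The sample values are invented.

```python
def prefer_regex(entities):
    """Keep regex entities; drop other entities that overlap one of them."""
    regex = [e for e in entities if e["extractor"] == "RegexEntityExtractor"]

    def overlaps(a, b):
        # Two character spans overlap if each starts before the other ends.
        return a["start"] < b["end"] and b["start"] < a["end"]

    others = [
        e for e in entities
        if e["extractor"] != "RegexEntityExtractor"
        and not any(overlaps(e, r) for r in regex)
    ]
    return sorted(regex + others, key=lambda e: e["start"])


# Hypothetical extractor output for "my account number is 12345678 in USD".
sample = [
    {"entity": "account_number", "start": 21, "end": 29,
     "value": "12345678", "extractor": "DIETClassifier"},
    {"entity": "account_number", "start": 21, "end": 29,
     "value": "12345678", "extractor": "RegexEntityExtractor"},
    {"entity": "currency", "start": 33, "end": 36,
     "value": "USD", "extractor": "DIETClassifier"},
]

for e in prefer_regex(sample):
    print(e["entity"], e["extractor"])
```

Here the regex result wins for the overlapping span, and the DIET result is kept only where the regex found nothing.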

Update for anyone else struggling with this: keeping exactly one annotated example for each regex entity and removing all other annotations worked for me.