Regex matching for entities vs. featurizer

I trained a DIET model with these regexes in my NLU.md file and with the RegexFeaturizer enabled in my pipeline. However, my bot didn’t recognize a certain year that matches the regex (four digits).

Is the regex information in the nlu.md file only used by the RegexFeaturizer, or is there a way to turn on exact entity matching based on the regex, with higher priority than the DIET / ML model?

Add word boundaries around your regexes to improve them, e.g. \b[0-9]{4}\b. Without them, the year pattern also matches inside any zip code: a five-digit string contains two different four-digit substrings.
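To see the effect of the word boundaries outside of Rasa, here is a small plain-Python check. It is only a sketch: the unbounded `[0-9]{4}` pattern is my assumption about what the original regex looked like, and the sample sentence is made up.

```python
import re

# Assumed "before" pattern: four digits, no word boundaries.
YEAR_LOOSE = re.compile(r"[0-9]{4}")

# Patterns with word boundaries, as suggested above.
YEAR_BOUNDED = re.compile(r"\b[0-9]{4}\b")
ZIP_BOUNDED = re.compile(r"\b[0-9]{5}\b")


def matches(pattern, text):
    """Return all non-overlapping matches of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


# Made-up example sentence containing a year and a zip code.
text = "I was born in 2334 and my zip code is 98105"

loose_year = matches(YEAR_LOOSE, text)      # also picks up part of the zip code
bounded_year = matches(YEAR_BOUNDED, text)  # only the standalone four-digit token
bounded_zip = matches(ZIP_BOUNDED, text)    # only the standalone five-digit token

print(loose_year, bounded_year, bounded_zip)
```

With the boundaries in place the two patterns no longer fire on the same token, so the features the RegexFeaturizer produces for a year and for a zip code should at least be distinct.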

Do you have training examples with years in your training data?

I do have many examples. DIET found 1980 but not 2334.

An update: This is still an issue for me:

Here are my regex patterns:

## regex:yearBorn
- \b[0-9]{4}\b

## regex:zipCode
- \b[0-9]{5}\b

Rasa still confuses these two entities. Here is my pipeline config:

language: en
pipeline:
- name: ConveRTTokenizer
  intent_tokenization_flag: true
  intent_split_symbol: +
- name: ConveRTFeaturizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 30
  num_transformer_layers: 4
  transformer_size: 256
  use_masked_language_model: false
  drop_rate: 0.25
  weight_sparsity: 0.7
  batch_size:
  - 32
  - 128
  embedding_dimension: 30
  hidden_layers_sizes:
    text:
    - 512
    - 128
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - time
  - number
  locale: en_US
  timezone: US/Pacific
  timeout: 3
- name: EntitySynonymMapper
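To make clear what I mean by exact matching with higher priority than the model, here is a rough plain-Python sketch. It does not use any Rasa API. The function names `regex_entities` and `merge_with_priority` are hypothetical, and the entity dict shape (`entity`, `value`, `start`, `end`) is only an assumption loosely modeled on what the parser returns.

```python
import re

# The two patterns from my nlu.md file.
PATTERNS = {
    "yearBorn": re.compile(r"\b[0-9]{4}\b"),
    "zipCode": re.compile(r"\b[0-9]{5}\b"),
}


def regex_entities(text):
    """Find exact regex matches and return them as entity dicts (hypothetical helper)."""
    found = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(
                {"entity": name, "value": m.group(0), "start": m.start(), "end": m.end()}
            )
    return sorted(found, key=lambda e: e["start"])


def merge_with_priority(model_entities, text):
    """Keep every regex match; drop model entities that overlap one (hypothetical helper)."""
    exact = regex_entities(text)

    def overlaps(a, b):
        return a["start"] < b["end"] and b["start"] < a["end"]

    kept = [e for e in model_entities if not any(overlaps(e, r) for r in exact)]
    return sorted(exact + kept, key=lambda e: e["start"])


# Made-up example: the model mislabels the year as a zip code.
text = "born in 2334, zip 98105"
start = text.find("2334")
model_entities = [
    {"entity": "zipCode", "value": "2334", "start": start, "end": start + 4}
]
merged = merge_with_priority(model_entities, text)
print([(e["entity"], e["value"]) for e in merged])
```

Something like this could run on the parse result after the model, but I would prefer a built-in option if one exists.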

Hi @argideritzalpea. Did you solve it?

If not, could you try removing the LexicalSyntacticFeaturizer from your pipeline and see whether that helps?