Regex matching for entities vs. featurizer

argideritzalpea · April 21, 2020, 12:10am

I trained a DIET model with regex patterns defined in my nlu.md file and the RegexFeaturizer enabled in my pipeline. However, my bot didn't recognize a year that matches the regex (four digits).

Is the regex information in the nlu.md file used only by the RegexFeaturizer, or is there a way to enable exact entity matching by regex that takes priority over the DIET / ML model?
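To show what "used only by a featurizer" means in practice, here is a simplified sketch of the idea behind regex featurization. It is written in plain Python and is not taken from Rasa's source. Each token gets one 0/1 value per pattern, and those values are only extra inputs for a model such as DIET, which can still predict a different label. The whitespace tokenizer and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Simplified illustration of regex featurization; NOT Rasa's actual RegexFeaturizer code.


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Hypothetical whitespace tokenizer that keeps each token's character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def regex_token_features(text: str, patterns: Dict[str, str]) -> List[List[int]]:
    """Return one 0/1 vector per token, with one slot per pattern.

    A slot is 1 when the token overlaps a match of that pattern. The values are
    only inputs to a classifier; they do not force any entity label.
    """
    tokens = tokenize(text)
    names = list(patterns)
    features = [[0] * len(names) for _ in tokens]
    for j, name in enumerate(names):
        for match in re.finditer(patterns[name], text):
            for i, (_, start, end) in enumerate(tokens):
                # Two character spans overlap if each starts before the other ends.
                if start < match.end() and match.start() < end:
                    features[i][j] = 1
    return features


patterns = {"yearBorn": r"\b[0-9]{4}\b", "zipCode": r"\b[0-9]{5}\b"}
text = "born in 1980 zip 94110"
# 1980 -> [1, 0], 94110 -> [0, 1], every other token -> [0, 0]
for (token, _, _), vector in zip(tokenize(text), regex_token_features(text, patterns)):
    print(token, vector)
```

Because the regex result is just one more feature, a model that has learned to rely on other cues (token shape, surrounding words) can still assign the "wrong" entity.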

IgNoRaNt23 · April 21, 2020, 5:06am

Add word boundaries around your regexes to make them stricter, e.g. \b[0-9]{4}\b. Without them, the year pattern also matches inside any zip code, so a zip code matches two of your regexes.
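A quick check with Python's standard re module shows the difference the word boundaries make (the sample sentences are made up for illustration):

```python
import re

YEAR_LOOSE = r"[0-9]{4}"        # no word boundaries
YEAR_STRICT = r"\b[0-9]{4}\b"   # four digits standing alone as a word

zip_sentence = "my zip code is 94110"
year_sentence = "I was born in 1980"

print(re.findall(YEAR_LOOSE, zip_sentence))    # ['9411']: a false "year" inside the zip code
print(re.findall(YEAR_STRICT, zip_sentence))   # []
print(re.findall(YEAR_STRICT, year_sentence))  # ['1980']
```

Note the raw strings (r"..."): in an ordinary Python string literal, "\b" is a backspace character, not a word boundary.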

Do you have examples containing years in your training data?

argideritzalpea · April 21, 2020, 4:37pm

I do have many examples. DIET found 1980 but not 2334.

argideritzalpea · August 12, 2020, 2:57pm

Update: this is still an issue for me.

Here are my regex patterns:

## regex:yearBorn
- \b[0-9]{4}\b

## regex:zipCode
- \b[0-9]{5}\b

Rasa still confuses these two entities. Here is my pipeline configuration:

language: en
pipeline:
- name: ConveRTTokenizer
  intent_tokenization_flag: true
  intent_split_symbol: +
- name: ConveRTFeaturizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 30
  num_transformer_layers: 4
  transformer_size: 256
  use_masked_language_model: false
  drop_rate: 0.25
  weight_sparsity: 0.7
  batch_size:
  - 32
  - 128
  embedding_dimension: 30
  hidden_layers_sizes:
    text:
    - 512
    - 128
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - time
  - number
  locale: en_US
  timezone: US/Pacific
  timeout: 3
- name: EntitySynonymMapper
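If regex matches should override the model, one option is a post-processing step outside the model. The sketch below is a hypothetical example, not a built-in Rasa component: it assumes entities are plain dicts with entity, value, start and end keys, reuses the two patterns from this post, keeps every regex match, and drops any model prediction that overlaps one. The model output is made up to mimic the 2334 case.

```python
import re
from typing import Dict, List

# Hypothetical post-processing step, not a built-in Rasa component.
# Entities are assumed to be dicts with "entity", "value", "start", "end" keys.
PATTERNS: Dict[str, str] = {
    "yearBorn": r"\b[0-9]{4}\b",
    "zipCode": r"\b[0-9]{5}\b",
}


def regex_entities(text: str, patterns: Dict[str, str]) -> List[dict]:
    """Turn every regex match into an entity dict."""
    found = []
    for name, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            found.append({"entity": name, "value": m.group(), "start": m.start(), "end": m.end()})
    return found


def overlaps(a: dict, b: dict) -> bool:
    """True if the two character spans share at least one character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def merge_with_regex_priority(text: str, model_entities: List[dict], patterns: Dict[str, str]) -> List[dict]:
    """Keep all regex matches and drop model predictions that overlap any of them."""
    rule_entities = regex_entities(text, patterns)
    kept = [e for e in model_entities if not any(overlaps(e, r) for r in rule_entities)]
    return sorted(rule_entities + kept, key=lambda e: e["start"])


text = "I was born in 2334 and my zip is 94110"
# Made-up model output that mislabels the year as a zip code.
model_output = [{"entity": "zipCode", "value": "2334", "start": 14, "end": 18}]
for entity in merge_with_regex_priority(text, model_output, PATTERNS):
    print(entity["entity"], entity["value"])  # yearBorn 2334, then zipCode 94110
```

The trade-off is that every standalone four-digit number becomes yearBorn regardless of context, which is exactly the judgment the ML model is meant to make.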

Akhil · September 4, 2020, 3:08pm

Hi @argideritzalpea, did you solve it?

If not, could you try removing the LexicalSyntacticFeaturizer from your pipeline and see whether that helps?
