Has anyone successfully implemented strict regex patterns for entity extraction?

aaronlikesrasa · June 27, 2023, 3:47pm

Hi everyone, I see a lot of posts from people struggling to use regular expressions for entity extraction. I am trying to use a regex to recognize customer ID numbers. However, it only recognizes patterns that are very similar to my NLU training data. This is my pipeline:

- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: CRFEntityExtractor
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  entity_recognition: False
- name: RegexEntityExtractor
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true

And an example of my training data (over 50 entries):

- intent: id_search
  examples: |
    - search with id [LA5678123](id_number)
    - [AN2915998](id_number)
    - Retrieve records based on the id [823RQ042345](id_number)
- regex: id_number
  examples: |
    - ^[A-Z\d]{9,12}$

If anyone has input on how to get the model to extract the ID number whenever it matches the regex, that would be extremely helpful.

stephens · July 3, 2023, 4:19pm

Try: \b\w{9,12}\b
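
For anyone comparing the two patterns, here is a minimal sketch in plain Python `re` (not Rasa itself) of why the anchored pattern from the question would only match when the ID is the entire message, while a word-boundary pattern can find it inside a sentence. The `find_ids` helper and the `strict` pattern are hypothetical additions for illustration, and how `RegexEntityExtractor` applies patterns internally (flags, boundary handling) is not verified here.

```python
import re

# Pattern from the question: ^ and $ require the whole string to be the ID.
anchored = re.compile(r"^[A-Z\d]{9,12}$")
# Pattern suggested in the reply: any 9-12 character "word".
bounded = re.compile(r"\b\w{9,12}\b")
# Hypothetical tighter variant: keeps the original character class,
# but swaps the anchors for word boundaries.
strict = re.compile(r"\b[A-Z\d]{9,12}\b")


def find_ids(pattern, text):
    """Return every substring of `text` that the compiled pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]


# Messages based on the training examples in the question,
# plus one extra sentence containing an ordinary 9-letter word.
messages = [
    "search with id LA5678123",
    "AN2915998",
    "Retrieve records based on the id 823RQ042345",
    "searching for LA5678123",
]

for text in messages:
    print(repr(text))
    print("  anchored:", find_ids(anchored, text))
    print("  bounded: ", find_ids(bounded, text))
    print("  strict:  ", find_ids(strict, text))
```

In this sketch `\w{9,12}` also picks up ordinary words of 9 to 12 characters (such as "searching"), so if the IDs are always uppercase letters and digits, keeping the original character class between `\b` boundaries may be the safer choice.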
