Hi everyone, I see a lot of posts from people struggling to use regular expressions for entity extraction. I am trying to use a regex to recognize customer ID numbers. However, the model only recognizes values that closely resemble the examples in my NLU training data. This is my pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: CRFEntityExtractor
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  entity_recognition: False
- name: RegexEntityExtractor
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true
And here is a sample of my training data (over 50 entries in total):
- intent: id_search
  examples: |
    - search with id [LA5678123](id_number)
    - [AN2915998](id_number)
    - Retrieve records based on the id [823RQ042345](id_number)
- regex: id_number
  examples: |
    - ^[A-Z\d]{9,12}$
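To check the pattern independently of Rasa, here is a small Python sketch using the standard `re` module. It assumes the extractor searches for the pattern inside the full message text, which I have not confirmed against the Rasa source. The `BOUNDED` pattern and the `find_ids` helper are hypothetical, added only for comparison, and are not part of my config.

```python
import re

# Pattern exactly as written in the training data above.
ANCHORED = re.compile(r"^[A-Z\d]{9,12}$")

# Hypothetical alternative for comparison: word boundaries instead of
# start-of-string and end-of-string anchors.
BOUNDED = re.compile(r"\b[A-Z\d]{9,12}\b")

# The three example messages from the training data, without annotations.
MESSAGES = [
    "search with id LA5678123",
    "AN2915998",
    "Retrieve records based on the id 823RQ042345",
]


def find_ids(pattern, text):
    """Return every substring of `text` that matches `pattern`."""
    return [match.group(0) for match in pattern.finditer(text)]


for message in MESSAGES:
    print(repr(message))
    print("  anchored:", find_ids(ANCHORED, message))
    print("  bounded: ", find_ids(BOUNDED, message))
```

In this check the anchored pattern only matches when the whole message is the ID, while the word-boundary variant also finds an ID in the middle of a sentence. The looser variant would also match any other run of 9 to 12 capital letters or digits, so it may be too broad for some data.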
If anyone has any input on how to get the model to extract the ID number whenever it matches the regex, that would be extremely helpful.