Hello, I’m trying to extract phone numbers with RegexEntityExtractor. I added it to my pipeline with this configuration:
- name: RegexEntityExtractor
  case_sensitive: False
  use_lookup_tables: False
  use_regexes: True
  use_word_boundaries: True
- name: DIETClassifier
  epochs: 55
This is my NLU training data (the regex and the intents):
- regex: regex_phone_number
  examples: |
    - \(?[1-9]{2}\)? ?(?:[2-8]|[9]{0,1}[5-9]{1})[0-9]{3}\-?[0-9]{4}
- intent: phone_number
  examples: |
    - meu número é [8973542665](regex_phone_number)
    - [61992852776](regex_phone_number)
- intent: invalid_phone_number
  examples: |
    - 123
    - 00000
    - 111111
    - asdhja
    - aaaaaa
    - telefone123
    - 111111111111111111
    - ldhuahsduashd
What I’m trying to do is extract phone numbers that match a regex pattern, which is defined in the NLU data together with annotated examples. If a number matches this pattern, the conversation should follow one path. Otherwise, the message should be classified with the “invalid_phone_number” intent. But when I train and run my project, numbers that do not match this pattern are still extracted by both extractors:
rasa.core.processor - Received user message '**00000**' with intent '{'id': 5831918261946756680, 'name': 'invalid_phone_number', 'confidence': 0.27582165598869324}' and entities '[{'entity': 'regex_phone_number', 'start': 0, 'end': 5, 'confidence_entity': 0.45642849802970886, 'value': '00000', 'extractor': 'DIETClassifier'}]'
How can I extract only numbers that follow this pattern, so that numbers like “0000” are not accepted? I have already tested this regex on its own and it looks fine.
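For anyone who wants to reproduce the standalone regex check, here is a minimal sketch using plain Python `re`. It only tests the pattern itself and is not necessarily how Rasa applies it internally (for example, it ignores `use_word_boundaries`). The helper name `find_phone_number` is hypothetical, and the sample inputs are taken from the training data above.

```python
import re

# Pattern copied verbatim from the regex_phone_number definition above.
PHONE_PATTERN = re.compile(
    r"\(?[1-9]{2}\)? ?(?:[2-8]|[9]{0,1}[5-9]{1})[0-9]{3}\-?[0-9]{4}"
)


def find_phone_number(text):
    """Return the first substring matching the pattern, or None.

    Hypothetical helper for illustration; not part of Rasa.
    """
    match = PHONE_PATTERN.search(text)
    return match.group(0) if match else None


# Inputs taken from the phone_number and invalid_phone_number examples.
samples = [
    "meu número é 8973542665",
    "61992852776",
    "123",
    "00000",
    "111111",
    "telefone123",
    "111111111111111111",
]

for sample in samples:
    print(repr(sample), "->", find_phone_number(sample))
```

In this sketch the two valid numbers match and “00000” does not, so the pattern itself seems to behave as intended. That is consistent with the log above, where the entity for “00000” is reported with 'extractor': 'DIETClassifier' rather than RegexEntityExtractor.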