(Aayush Sahanan)
September 1, 2020, 4:11pm
I am using a regex to extract emails, but it is not working.
The snippet of the regex is -
email is the entity I want to use the regex for, e.g.,
[abc@yahoo.com](email)
[xyz@abc.co.in](email)
The config file is:
language: "en"
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DucklingHTTPExtractor
    url: http://localhost:8000
    timezone: Asia/Kolkata
    timeout: 3
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 6
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.3
    core_threshold: 0.2
    ambiguity_threshold: 0
    fallback_action_name: utter_default_fallback
  - name: FormPolicy
Even the lookup tables don’t seem to work.
Please help me with the issue.
Use this regex for email address validation.
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
If you are using Duckling, it will extract email addresses for you.
Simply add the email entity to your domain and list email under the dimensions of DucklingHTTPExtractor in the config file.
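As a rough illustration of what restricting Duckling to the email dimension amounts to, the sketch below builds the form body for a request to Duckling's /parse endpoint, reusing the URL and timezone from the config in the question. The helper name build_duckling_payload and the en_US locale are made up for this example, and the field names (text, locale, tz, dims) reflect my understanding of Duckling's HTTP interface, so check them against your Duckling version.

```python
import json
from urllib.parse import parse_qs, urlencode

# URL from the config in the question, plus Duckling's /parse path (assumed).
DUCKLING_URL = "http://localhost:8000/parse"


def build_duckling_payload(text, dimensions=("email",),
                           locale="en_US", timezone="Asia/Kolkata"):
    """Hypothetical helper: URL-encoded form body for a Duckling /parse request."""
    fields = {
        "text": text,
        "locale": locale,  # assumed locale, adjust to your setup
        "tz": timezone,
        # Duckling expects the dimensions as a JSON-encoded list.
        "dims": json.dumps(list(dimensions)),
    }
    return urlencode(fields)


body = build_duckling_payload("my email is abc@yahoo.com")
# Decode the body again to show which dimensions would be requested.
print(parse_qs(body)["dims"])  # ['["email"]']
```

With only email in the dimensions list, Duckling should not return other entity types such as numbers or times for the same message.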
(Aayush Sahanan)
September 2, 2020, 11:21am
If I use Duckling, the name slot also gets overridden with the email, maybe because of the DIET classifier as well. Is there a way to do it with regex?
Add this to your nlu file.
## regex:email_entity_name
- (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
(Aayush Sahanan)
September 15, 2020, 9:31am
It is still not working; it accepts things like “google.com”, which is not an email ID.
Please see the regex I’ve tested. Might have made some mistake while copying.
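One way to check the pattern on its own, outside Rasa, is Python's re module. The sketch below assumes the pattern from the NLU snippet above, copied verbatim and split into a local part and a domain part only for readability; the names LOCAL, DOMAIN, EMAIL_PATTERN and is_email are made up for this example. If the pattern rejects google.com here, a copy error in the NLU file is one thing to look for. Another, as far as I know, is that RegexFeaturizer only passes regex matches to the entity extractor as features rather than enforcing them, so the model can still tag text the pattern does not match.

```python
import re

# Local part (before the @), taken from the NLU snippet in this thread.
LOCAL = r'''(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")'''

# Domain part (after the @), taken from the same snippet.
DOMAIN = r'''(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])'''

# The pattern is written in lowercase, so match case-insensitively.
EMAIL_PATTERN = re.compile(LOCAL + "@" + DOMAIN, re.IGNORECASE)


def is_email(text):
    """Return True only if the whole string matches the email pattern."""
    return EMAIL_PATTERN.fullmatch(text) is not None


for sample in ["abc@yahoo.com", "xyz@abc.co.in", "google.com"]:
    print(sample, is_email(sample))
# abc@yahoo.com True
# xyz@abc.co.in True
# google.com False
```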
