Regex for emails

I am using a regex to extract emails, but it is not working. The regex is:

regex:email

  • [a-zA-Z0-9+_.-]@[a-zA-Z0-9.-].[a-zA-Z]{2,5}
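A quick way to check a pattern like this outside of Rasa is to run it against the sample addresses in plain Python. The sketch below tests the pattern exactly as it appears in the post (which may have lost characters when it was pasted) next to a hypothetical variant with `+` quantifiers and an escaped dot. The variant is an assumption for illustration, not necessarily what was in the original NLU file.

```python
import re

# Pattern exactly as it appears in the post: no quantifiers, unescaped dot.
posted = r"[a-zA-Z0-9+_.-]@[a-zA-Z0-9.-].[a-zA-Z]{2,5}"

# Hypothetical variant (an assumption): "+" quantifiers and an escaped dot.
candidate = r"[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,5}"

# The two example addresses from the training data in the post.
samples = ["abc@yahoo.com", "xyz@abc.co.in"]


def full_matches(pattern, texts):
    """Return the texts that the pattern matches from start to end."""
    return [t for t in texts if re.fullmatch(pattern, t)]


posted_hits = full_matches(posted, samples)
candidate_hits = full_matches(candidate, samples)
print("posted:   ", posted_hits)
print("candidate:", candidate_hits)
```

As posted, each character class matches exactly one character, so neither sample is matched as a whole; the variant with quantifiers matches both.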

email is the entity I want to use the regex for, e.g.:

intent:email1

  • [abc@yahoo.com](email)
  • [xyz@abc.co.in](email)

The config file is:

language: "en"
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DucklingHTTPExtractor
    url: http://localhost:8000
    dimensions:
      - time
    timezone: Asia/Kolkata
    timeout: 3
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 6
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.3
    core_threshold: 0.2
    ambiguity_threshold: 0
    fallback_action_name: utter_default_fallback
  - name: FormPolicy

Even the lookup tables don’t seem to work.

Please help me with the issue.

Use this regex for email address validation:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

If you are using Duckling, it will extract email addresses for you.

Simply add the email entity to your domain and add email to the Duckling dimensions in your config file.

If I use Duckling, the name slot also gets overwritten with the email, possibly because of the DIETClassifier as well. Is there a way to do it with regex?

Add this to your NLU file:

## regex:email_entity_name
- (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

It is still not working; it accepts things like “google.com”, which is not an email ID.
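One way to narrow this down is to run the pattern on its own, outside of Rasa. The sketch below copies the pattern from the regex:email_entity_name snippet above and checks it against the two training examples and “google.com”. The helper name `looks_like_email` is made up for illustration. If the pattern rejects “google.com” here, the false positive is more likely coming from the trained model than from the regex: as far as I understand, RegexFeaturizer only passes regex matches to the entity extractor as features and does not act as a hard filter.

```python
import re

# Pattern copied from the regex:email_entity_name snippet above.
EMAIL_PATTERN = r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""

email_re = re.compile(EMAIL_PATTERN)


def looks_like_email(text):
    """Hypothetical helper: True only if the whole string matches the pattern."""
    return email_re.fullmatch(text) is not None


samples = ["abc@yahoo.com", "xyz@abc.co.in", "google.com"]
results = {s: looks_like_email(s) for s in samples}
print(results)
```

The pattern requires an “@”, so “google.com” is rejected while both sample addresses pass. A check like this could also be applied to extracted values before filling the slot, though where to hook it in depends on the rest of the bot.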

Please see the regex I’ve tested. I might have made a mistake while copying it.
