Regex pattern for mobile number not working properly

rasa-nlu

(Nikhil Bansal) #1

I have added this regex pattern to my training data file:

^[789][0-9]{9}$

This regex is meant to match Indian mobile numbers, but it is not limiting the input to 10 digits: if I enter 12345 or even 123, the bot still puts it into the slot. I have also tested this regex here:

https://regex101.com/

The pattern works fine there, but I am facing the issue with my bot.
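
Applied directly, the pattern does reject short inputs, which suggests the problem lies in how the bot uses it rather than in the regex itself. A minimal sketch with Python's `re` module (the helper name `is_mobile` is made up for illustration and is not part of Rasa):

```python
import re

# Pattern from the post: exactly 10 digits, the first one being 7, 8 or 9.
MOBILE = re.compile(r"^[789][0-9]{9}$")

def is_mobile(text: str) -> bool:
    """Return True only if the whole string is a 10-digit mobile number."""
    return MOBILE.fullmatch(text.strip()) is not None

for sample in ["9999852419", "12345", "123", "99998524190"]:
    print(sample, is_mobile(sample))
```

Only the first sample passes; the short and the 11-digit inputs are rejected.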


(Akela Drissner) #2

Can you post the regex section of your training data file?


(Nikhil Bansal) #3
"regex_features": [
  {
    "name": "email",
    "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$"
  },
  {
    "name": "IncidentID",
    "pattern": "(?i)(SR)[0-9]*"
  },
  {
    "name": "empCode",
    "pattern": "\\b\\d{8}\\b"
  },
  {
    "name": "otp",
    "pattern": "\\b\\d{6}\\b"
  },
  {
    "name": "MobNumberInt",
    "pattern": "[(+]*[0-9]{10}[()+. -]*"
  }
],
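
Note that the `^[789][0-9]{9}$` pattern from the first post does not appear in this section, and `MobNumberInt` is not anchored, so it fires on any run of 10 digits, even inside a longer number. A rough way to see which patterns fire for a given input, assuming plain `re.search` semantics (how Rasa applies the patterns internally may differ; `matching_features` is a hypothetical helper):

```python
import re

# Patterns copied from the regex_features section above (JSON escaping removed).
# The email pattern is left out because only the numeric ones matter here.
PATTERNS = {
    "IncidentID": r"(?i)(SR)[0-9]*",
    "empCode": r"\b\d{8}\b",
    "otp": r"\b\d{6}\b",
    "MobNumberInt": r"[(+]*[0-9]{10}[()+. -]*",
}

def matching_features(text: str) -> list:
    """Names of the patterns that match anywhere in the text."""
    return [name for name, pat in PATTERNS.items() if re.search(pat, text)]

for sample in ["12345", "9999852419", "123456789012", "my otp is 123456"]:
    print(repr(sample), matching_features(sample))
```

Here `12345` matches none of the patterns, so if it still ends up in a slot, the regex features are not what put it there.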

(Nikhil Bansal) #4

I am facing the issue with the mobile number regex patterns. There are two mentioned, and both have this issue.


(Akela Drissner) #5

I’m confused, I thought you said there was a problem with the number regex?


(Nikhil Bansal) #6

I have mentioned mobile numbers above. I want my bot to accept mobile numbers, not random numbers like 123456 or anything of that sort! Right now the bot accepts any numeric value into the slot.


(Akela Drissner) #7

Can you also post what your training data looks like? Do you have any other shorter numbers that are present in it as well (doesn’t have to be the same entity)?


(Nikhil Bansal) #8

No, in my training data for fetching the mobile number I don't have any input with fewer than 10 digits.


(Nikhil Bansal) #9

training data - https://gist.github.com/NikhilBansal21/98028418102128453593a30885d38e1f


(Akela Drissner) #10

OK, so the thing is that the model doesn't just pay attention to the actual word, but also to the context. The regex is only used as a feature for the model, not as a hard rule, which is why it might accept shorter numbers too. Also, you do have a short number labelled as a mobile number entity towards the bottom of your training data.

I saw that you have a lot of different numbers that you extract as different entities. My suggestion would be to extract these with Duckling and then store them in slots with a custom action. That way you can also validate them to check that they're in the correct format.
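
The validation step inside such a custom action could look roughly like this. It is only a sketch: the slot names, the format rules and the `validate` helper are assumptions based on the patterns posted earlier in the thread, and the wiring into a Rasa action (reading the Duckling entity, setting the slot) is left out.

```python
import re

# Hypothetical format rules, derived from the patterns posted in this thread.
FORMATS = {
    "mobile_number": re.compile(r"[789][0-9]{9}"),
    "empCode": re.compile(r"[0-9]{8}"),
    "otp": re.compile(r"[0-9]{6}"),
}

def validate(slot_name: str, raw_value) -> bool:
    """True if the value, read as a digit string, fully matches the slot's format."""
    pattern = FORMATS.get(slot_name)
    if pattern is None:
        return False  # unknown slot: refuse rather than guess
    return pattern.fullmatch(str(raw_value).strip()) is not None

print(validate("mobile_number", "9999852419"))
print(validate("mobile_number", "12345"))
```

Passing the matched text rather than a parsed numeric value keeps leading zeros and avoids float formatting such as `123456.0`.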


(Nikhil Bansal) #11

Yes, I added shorter numbers because every time I enter something like this:

U: my employee id is 9999852419 (a mobile number string)

my bot picks the mobile number intent, which I don't want. Even if the user enters a mobile number when asked for an employee ID or OTP, I want my bot to process it and tell the user that the input is wrong. Yes, you are right, I want it to be handled in that context itself!
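
One way to get that behaviour is to let the dialogue decide which field is being asked for, and then check whatever number the user sends against that field only. A sketch under assumptions: the field names, the expected lengths and the `guidance` helper are illustrative, not taken from the training data.

```python
import re

# Hypothetical expected digit counts per requested field.
EXPECTED_DIGITS = {"employee id": 8, "otp": 6, "mobile number": 10}

def guidance(requested: str, message: str) -> str:
    """Check the first run of digits in the message against the requested field."""
    expected = EXPECTED_DIGITS[requested]
    found = re.search(r"\d+", message)
    if found is None:
        return f"I could not find a number. Please enter your {requested}."
    digits = found.group()
    if len(digits) == expected:
        return "ok"
    return (f"That has {len(digits)} digits, but your {requested} "
            f"should have {expected}. Please try again.")

print(guidance("employee id", "my employee id is 9999852419"))
```

With this approach a 10-digit number sent in response to an employee ID prompt is never reinterpreted as a mobile number; the user gets a correction instead.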


(Nikhil Bansal) #12

I have implemented Duckling:

pipeline:
- name: "tokenizer_whitespace"
- name: "intent_entity_featurizer_regex"
- name: "ner_duckling_http"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  intent_tokenization_flag: true
  intent_split_symbol: "+"

After using this, the bot is not extracting entities at all.


(Souvik Ghosh) #13

You need to provide the server for Duckling.
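
The `ner_duckling_http` component sends each message to a running Duckling server over HTTP, so its pipeline entry also needs a `url` parameter pointing at that server. A quick way to check that a server is reachable and returns numbers, assuming it listens on `http://localhost:8000` (an assumed address; adjust it to your setup) and exposes Duckling's `/parse` endpoint. The helper names are made up for this sketch.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of a locally running Duckling server.
DUCKLING_URL = "http://localhost:8000/parse"

def build_request(text: str, locale: str = "en_GB") -> bytes:
    """Form-encode the body that the /parse endpoint expects."""
    return urllib.parse.urlencode({"text": text, "locale": locale}).encode("utf-8")

def numbers_from_response(payload: str) -> list:
    """Collect the matched text of every 'number' entity in a /parse response."""
    return [item["body"] for item in json.loads(payload) if item.get("dim") == "number"]

def query_duckling(text: str) -> list:
    """Send the text to the server; raises URLError if nothing is listening."""
    request = urllib.request.Request(DUCKLING_URL, data=build_request(text))
    with urllib.request.urlopen(request, timeout=5) as response:
        return numbers_from_response(response.read().decode("utf-8"))
```

If `query_duckling("my otp is 123456")` raises a connection error, the server is not running at that address, which would explain why no entities are extracted.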


(Nikhil Bansal) #14

What server? The port where my Core runs?


(Souvik Ghosh) #15

The Duckling server.


(Nikhil Bansal) #16

Here it says:

To use this component you need to run a duckling server. The easiest option is to spin up a docker container using docker run -p 8000:8000 rasa/duckling.

How can I run it without Docker?


(Souvik Ghosh) #17

The installation instructions are here.


(Nikhil Bansal) #18

Thanks :)