RegexEntityExtractor doubles the entity

Hi everyone!

I am having trouble with the RegexEntityExtractor component. I am using it to extract Italian telephone numbers. It does manage to extract them, but for reasons I don’t understand it extracts each one twice, doubling the entity.

This is my pipeline image

This is my training data

image (the last two examples, labeled “(telefono)”, refer to the entity I am talking about)

This is what I can see using the command “rasa shell --debug”

Here’s what I have tried:

1. Removing the flag “use_regex: true” from RegexEntityExtractor
2. Removing the training examples referring to the regex in my NLU file

By the way: Italian telephone numbers are made of 9 to 10 digits (the prefix +39 or 0039 can be added). This is the regex I am using:

- regex: telefono
  examples: |
    - \b(+39|0039)?\d{9,10}\b

Same problem with (+39|0039)?\d{9,10}

Thanks a lot, Andrea

I was not able to reproduce the problem. The regex you provided did not work for me, so I used the following regex instead:

- regex: telefono
  examples: |
    - [+39|0039]?\d{9,10}

With that, I got only one entity extracted by the RegexEntityExtractor when processing a string like 0123456789. Can you try with the regex above? Does the problem still persist?
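For anyone comparing the patterns in this thread, here is a minimal sketch in plain Python, assuming the patterns are interpreted like standard `re` expressions. It checks why the original pattern may not have worked (the `+` right after the opening parenthesis is unescaped) and shows that square brackets form a character class rather than an alternation, so a prefixed number is only partly matched. The `ESCAPED` pattern and the helper names `compiles` and `find_all` are hypothetical, introduced only for illustration; they were not posted in the thread.

```python
import re

# Pattern from the question: "+" directly after "(" has nothing to repeat.
ORIGINAL = r"\b(+39|0039)?\d{9,10}\b"
# Pattern from the reply: [...] is a character class (one optional character
# out of "+", "3", "9", "|", "0"), not a choice between "+39" and "0039".
CHAR_CLASS = r"[+39|0039]?\d{9,10}"
# Hypothetical variant (assumption): escape "+" and use a non-capturing group.
ESCAPED = r"(?:\+39|0039)?\d{9,10}"


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def find_all(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]


print(compiles(ORIGINAL))                      # False
print(find_all(CHAR_CLASS, "0123456789"))      # ['0123456789']
print(find_all(CHAR_CLASS, "+390123456789"))   # ['+3901234567']
print(find_all(ESCAPED, "+390123456789"))      # ['+390123456789']
print(find_all(ESCAPED, "00390123456789"))     # ['00390123456789']
```

The character-class pattern works for an unprefixed number such as 0123456789, which matches the test described above, but it cuts off the last digits once a +39 prefix is present. This sketch only looks at the regex itself; it does not reproduce or explain the doubled entity.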

Hi Tanja!

Thanks for your suggestion… we actually found that the Duckling component is better at extracting telephone numbers, so we decided to use it.

Andrea