RegexEntityExtractor doubles the entity

andreacirillo · October 28, 2020, 8:28am

Hi everyone!

I am having trouble with the RegexEntityExtractor component. I am using it to extract Italian telephone numbers. It does manage to extract them, but for some reason I don’t understand, it extracts each one twice, doubling the entity.

This is my pipeline

This is my training data

(The last two examples, labelled “(telefono)”, refer to the entity I am talking about.)

This is what I can see using the command “rasa shell --debug”

Here’s what I have tried:

1. Removing the flag “use_regex: true” from RegexEntityExtractor
2. Removing the training examples referring to the regex in my NLU file

By the way: Italian telephone numbers consist of 9 to 10 digits (the prefix +39 or 0039 can be added). This is the regex I am using:

- regex: telefono
  examples: |
    - \b(+39|0039)?\d{9,10}\b

Same problem with (+39|0039)?\d{9,10}

Thanks a lot, Andrea

Tanja · October 29, 2020, 1:46pm

I was not able to reproduce the problem. The regex you provided did not work for me, so I used the following regex:

- regex: telefono
  examples: |
    - [+39|0039]?\d{9,10}

With that, I got only one entity extracted by the RegexEntityExtractor when processing a string like 0123456789. Can you try the regex above? Does the problem still persist?
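For anyone comparing the patterns in this thread, here is a minimal sketch of how they behave, assuming they are passed unchanged to Python's `re` module (an assumption about how the extractor compiles them). `ESCAPED` is a hypothetical variant that does not appear in the thread, and the helper names are only for illustration. The sketch does not explain the duplicated entity; it only shows how the patterns differ.

```python
import re
from typing import List

# Patterns from the thread, without the surrounding \b anchors.
ORIGINAL = r"(+39|0039)?\d{9,10}"    # as posted: the "+" is unescaped
CHAR_CLASS = r"[+39|0039]?\d{9,10}"  # as suggested in the reply: a character class
# Hypothetical variant (not from the thread): escaped "+" inside a group.
ESCAPED = r"(?:\+39|0039)?\d{9,10}"


def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def find_matches(pattern: str, text: str) -> List[str]:
    """Return every non-overlapping match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    print("original compiles:", compiles(ORIGINAL))
    for text in ("0123456789", "+390123456789", "00390123456789"):
        print(text, find_matches(CHAR_CLASS, text), find_matches(ESCAPED, text))
```

In Python, the unescaped `+` right after `(` has nothing to repeat, so the posted pattern fails to compile. The square-bracket version compiles, but `[+39|0039]` matches a single character from the set rather than a whole prefix: it matches 0123456789 in full, yet for +390123456789 it stops at +3901234567.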

andreacirillo · November 10, 2020, 12:28pm

Hi Tanja!

Thanks for your suggestion. We actually found that the Duckling component is better at extracting telephone numbers, so we decided to use it.

Andrea
