RegexEntityExtractor

john.christian · September 17, 2021, 9:14am

I needed to add the RegexEntityExtractor to my pipeline to make use of lookup lists. That part seems to work fine, which is great… but there have been some strange regressions in other areas. For example, a cellphone number such as +61414487283 is extracted correctly as a whole, but the same number is then also broken down into its individual digits:

6 1 4 1 4

… and so on until the end of the number

Here’s my pipeline:

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: RegexEntityExtractor
    case_sensitive: False
    use_lookup_tables: True
    use_regexes: True
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: faq
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: out_of_scope
  - name: FallbackClassifier
    threshold: 0.2
    ambiguity_threshold: 0.001

john.christian · September 17, 2021, 9:24am

Sorry, the first post didn't really show the markup.

It's breaking the cell number into

[6](buyer_min_num_bedrooms)
[1](buyer_min_num_bedrooms)
[4](buyer_min_num_bedrooms)
[1](buyer_min_num_bedrooms)
[4](buyer_min_num_bedrooms)

etc

nik202 · September 17, 2021, 9:28am

@john.christian Please format the code above by wrapping it in triple backticks.

@john.christian So you want the output to come back intact, like 61414…, rather than broken up? Yes or no?

john.christian · September 17, 2021, 9:33am

It’s basically detecting two entities from one string.

It’s detecting “+61414487222” as a cell number slot, which is correct. But it’s also detecting each digit as another slot called “buyer_min_num_bedrooms”, which is generally a one-digit number, and it’s doing this for every digit in the cell phone number.

john.christian · September 17, 2021, 9:34am

I’ve just noticed I have this warning too

UserWarning: Parsing of message: ‘+61414487222 start new openhome session with 413 for the property 52’ lead to overlapping entities: +61414487222 of type user_mobile_number extracted by DIETClassifier overlaps with 6 of type buyer_min_num_bedrooms extracted by RegexEntityExtractor. This can lead to unintended filling of slots. Please refer to the documentation section on entity extractors and entities getting extracted multiple times: Components
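
To illustrate how a warning like this can arise, here is a minimal Python sketch. The thread never shows the regex defined for buyer_min_num_bedrooms, so both patterns below are assumptions, and the sketch only uses Python's `re` module under the further assumption that the extractor searches the whole message text for matches. It is not Rasa's own code.

```python
import re

message = "+61414487222 start new openhome session with 413 for the property 52"

# Hypothetical pattern: the thread does not show the real one.
# A bare single-digit pattern matches every digit anywhere in the text.
loose = re.compile(r"\d")

# Same idea, but the digit must stand alone as a whole token.
strict = re.compile(r"\b\d\b")

loose_matches = [m.group() for m in loose.finditer(message)]
strict_matches = [m.group() for m in strict.finditer(message)]

# A message where a standalone digit is actually present.
bedroom_matches = [m.group() for m in strict.finditer("a house with 3 bedrooms")]

print(loose_matches)    # every digit of the phone number, of 413 and of 52
print(strict_matches)   # nothing: no digit stands alone in this message
print(bedroom_matches)  # only the standalone digit
```

If the real pattern resembles the loose one, anchoring it with word boundaries would be one way to stop it from firing inside longer numbers.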

nik202 · September 17, 2021, 9:42am

@john.christian I guess this happens because RegexEntityExtractor and DIETClassifier both extract entities from the same text.

@john.christian Are you using the regex component for the lookup table only, or did you also annotate some examples?

Check this:

Description

This component extracts entities using the lookup tables and regexes defined in the training data. The component checks if the user message contains an entry of one of the lookup tables or matches one of the regexes. If a match is found, the value is extracted as an entity.

This component only uses those regex features that have a name equal to one of the entities defined in the training data. Make sure to annotate at least one example per entity. Ref: Components

OR

Maybe create a custom NLU component that filters the extracted entities.

OR

You could use the Duckling component to extract the telephone numbers.

OR

Try giving more training examples, then delete the older models and train again.
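
For the custom-component option, the filtering step itself could look roughly like the sketch below. It is plain Python rather than a Rasa component: the function name `drop_nested_entities` is hypothetical, the "longest span wins" rule is just one possible policy, and the character offsets in the example are made up to mirror the message in the warning. The only thing taken from Rasa is the shape of an entity dictionary (`start`, `end`, `value`, `entity`, `extractor`).

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def drop_nested_entities(entities: List[Entity]) -> List[Entity]:
    """Keep the longest spans and drop any entity that overlaps a kept one."""
    kept: List[Entity] = []
    # Longest first, so a full phone number beats a single digit inside it.
    by_length = sorted(entities, key=lambda e: e["end"] - e["start"], reverse=True)
    for ent in by_length:
        overlaps = any(
            ent["start"] < k["end"] and k["start"] < ent["end"] for k in kept
        )
        if not overlaps:
            kept.append(ent)
    # Return the survivors in reading order.
    return sorted(kept, key=lambda e: e["start"])


# Illustrative input: the phone number plus two of the stray single digits.
entities = [
    {"start": 0, "end": 12, "value": "+61414487222",
     "entity": "user_mobile_number", "extractor": "DIETClassifier"},
    {"start": 1, "end": 2, "value": "6",
     "entity": "buyer_min_num_bedrooms", "extractor": "RegexEntityExtractor"},
    {"start": 2, "end": 3, "value": "1",
     "entity": "buyer_min_num_bedrooms", "extractor": "RegexEntityExtractor"},
]

filtered = drop_nested_entities(entities)
print([e["entity"] for e in filtered])
```

Wiring this into a pipeline would still require wrapping it in a custom component placed after both extractors, which is not shown here.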

john.christian · September 17, 2021, 9:44am

Yes… so I am wondering how I can possibly use a lookup list? I only need the lookup list for one slot, but DIETClassifier can’t read lookup lists, correct? That’s why I added RegexEntityExtractor. How do others handle lists? I can’t remove the DIETClassifier…
