I needed to add the RegexEntityExtractor to my pipeline to make use of lookup lists. That part seems to work fine, which is great, but there have been some strange regressions in other areas. For example, a cellphone number such as +61414487283 is extracted correctly as +61414487283, but then the number is also broken down into its individual digits.
It’s basically detecting two entities from one string.
It’s detecting “+61414487222” as a cell number slot. That’s correct.
But it’s also detecting each digit as another slot called “buyer_min_num_bedrooms”, which is generally a one-digit number, and it’s doing this for every digit in the cell phone number.
UserWarning: Parsing of message: ‘+61414487222 start new openhome session with 413 for the property 52’ lead to overlapping entities: +61414487222 of type user_mobile_number extracted by DIETClassifier overlaps with 6 of type buyer_min_num_bedrooms extracted by RegexEntityExtractor. This can lead to unintended filling of slots. Please refer to the documentation section on entity extractors and entities getting extracted multiple times:Components
@john.christian I guess this happens because both the RegexEntityExtractor and the DIETClassifier extract an entity from the same text.
@john.christian Are you using the regex extractor for the lookup table only, or did you also give some examples?
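To illustrate how a regex could produce the digit-by-digit matches in the warning, here is a minimal sketch in plain Python. It assumes the training data contains a regex named buyer_min_num_bedrooms that matches a single digit; the pattern `\d` is a guess and is not taken from the actual training data. The second pattern is a hypothetical stricter variant that only accepts a digit with no digit on either side.

```python
import re

message = "+61414487222 start new openhome session with 413 for the property 52"

# Assumed pattern for buyer_min_num_bedrooms: any single digit.
loose = re.compile(r"\d")

# Hypothetical stricter variant: a digit that is not next to another digit.
strict = re.compile(r"(?<!\d)\d(?!\d)")

# The loose pattern fires once per digit, including inside the phone number.
loose_matches = [m.group() for m in loose.finditer(message)]

# The strict pattern ignores digits that are part of a longer number.
strict_matches = [m.group() for m in strict.finditer(message)]

# A standalone digit is still found by the strict pattern.
standalone = [m.group() for m in strict.finditer("looking for 3 bedrooms")]

print(len(loose_matches), strict_matches, standalone)
```

If the real regex looks anything like the loose pattern, tightening it along these lines may already remove the overlap, but that depends on how the regex is actually written.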
Check this:
Description
This component extracts entities using the lookup tables and regexes defined in the training data. The component checks if the user message contains an entry of one of the lookup tables or matches one of the regexes. If a match is found, the value is extracted as an entity.
This component only uses those regex features that have a name equal to one of the entities defined in the training data. Make sure to annotate at least one example per entity. Ref: Components
OR
Maybe create a custom NLU component that filters the extracted entities.
OR
You could use the Duckling component to extract the telephone numbers.
OR
Try giving more training examples, and delete the older models and train again.
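For the custom component option, the filtering step itself could look roughly like the sketch below. This is plain Python and not Rasa's component API; it only assumes that each extracted entity is a dict with "entity", "value", "start", "end" and "extractor" keys. The function names `overlaps` and `drop_overlapping` are made up for illustration, and the rule "keep the longest span" is just one possible policy.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def overlaps(a: Entity, b: Entity) -> bool:
    """Return True if the character spans of two entities intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def drop_overlapping(entities: List[Entity]) -> List[Entity]:
    """Keep the longest entity of each overlapping group (hypothetical policy)."""
    kept: List[Entity] = []
    # Visit longer spans first so they win over the shorter spans they contain.
    for candidate in sorted(entities, key=lambda e: e["end"] - e["start"], reverse=True):
        if not any(overlaps(candidate, k) for k in kept):
            kept.append(candidate)
    # Return the survivors in message order.
    return sorted(kept, key=lambda e: e["start"])


# Entities shaped like the ones in the warning above (offsets for the phone number).
entities = [
    {"entity": "user_mobile_number", "value": "+61414487222",
     "start": 0, "end": 12, "extractor": "DIETClassifier"},
    {"entity": "buyer_min_num_bedrooms", "value": "6",
     "start": 1, "end": 2, "extractor": "RegexEntityExtractor"},
    {"entity": "buyer_min_num_bedrooms", "value": "1",
     "start": 2, "end": 3, "extractor": "RegexEntityExtractor"},
]

result = drop_overlapping(entities)
print([e["entity"] for e in result])
```

How this gets wired into the pipeline depends on the Rasa version, so that part is left out here.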
Yes… so I am wondering how I can possibly use a lookup list. I only need the lookup list for one slot, but the DIETClassifier can’t look at lookup lists, correct? That’s why I added the RegexEntityExtractor. How do others handle lists? I can’t remove the DIETClassifier…