I needed to add the RegexEntityExtractor to my pipeline to make use of lookup lists. That part seems to work fine, which is great… but there have been some strange regressions in other areas. For example, a cellphone number such as +61414487283 is extracted correctly as +61414487283, but then the entire number is also broken down into its individual digits:

6 1 4 1 4

… and so on until the end of the number

Here’s my pipeline:


  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: RegexEntityExtractor
    case_sensitive: False
    use_lookup_tables: True
    use_regexes: True
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: faq
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: out_of_scope
  - name: FallbackClassifier
    threshold: 0.2
    ambiguity_threshold: 0.001

Sorry, the first post didn’t really show the markup.

it’s breaking the cell into



@john.christian Please format the above code by wrapping it in triple backticks.

@john.christian Do you mean you want the output to stay intact, like 61414…, and not be broken up? Yes or no?

It’s basically detecting two entities from one string.

It’s detecting “+61414487222” as a cell number slot. That’s correct. But it’s also detecting each digit as another slot called “buyer_min_num_bedrooms”, which is generally a one-digit number, and it’s doing this for every digit in the cell phone number.

I’ve just noticed I have this warning too:

UserWarning: Parsing of message: ‘+61414487222 start new openhome session with 413 for the property 52’ lead to overlapping entities: +61414487222 of type user_mobile_number extracted by DIETClassifier overlaps with 6 of type buyer_min_num_bedrooms extracted by RegexEntityExtractor. This can lead to unintended filling of slots. Please refer to the documentation section on entity extractors and entities getting extracted multiple times:Components
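
The regex behind buyer_min_num_bedrooms is not shown in this thread, so the following is only a sketch under an assumption: if that pattern is something like a bare single digit, it will match every digit inside a longer number. The patterns below are hypothetical and use plain Python `re`, not Rasa itself; they just illustrate how an unanchored digit pattern behaves on the message from the warning, compared with the same pattern wrapped in word boundaries.

```python
import re

message = "+61414487222 start new openhome session with 413 for the property 52"

# Hypothetical pattern: a bare single digit with nothing anchoring it.
# The real pattern in the training data is not shown in the thread.
loose = re.compile(r"\d")

# Same idea, but the digit must stand alone as its own token.
strict = re.compile(r"\b\d\b")


def find_matches(pattern, text):
    """Return every substring of `text` matched by `pattern`, in order."""
    return [m.group() for m in pattern.finditer(text)]


loose_hits = find_matches(loose, message)
strict_hits = find_matches(strict, message)
standalone_hits = find_matches(strict, "looking for 3 bedrooms")

print("loose: ", loose_hits)
print("strict:", strict_hits)
print("strict on a standalone digit:", standalone_hits)
```

If the real pattern is already anchored, the cause lies elsewhere, so it is worth checking the regex definition in the training data first.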

@john.christian I guess this is because both RegexEntityExtractor and DIETClassifier extract the entity.

@john.christian Are you using the regex extractor for the lookup table only, or did you also provide some annotated examples?

Check this:


This component extracts entities using the lookup tables and regexes defined in the training data. The component checks if the user message contains an entry of one of the lookup tables or matches one of the regexes. If a match is found, the value is extracted as an entity.

This component only uses those regex features that have a name equal to one of the entities defined in the training data. Make sure to annotate at least one example per entity. Ref: Components


Maybe create a custom NLU component that filters the extracted entities.
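
As a rough illustration of that filtering idea, here is a minimal sketch of the logic only. It is a plain Python function, not a Rasa component, and the name `drop_overlapping_entities` is hypothetical. It assumes each entity is a dict with `start`, `end`, `entity`, `value` and `extractor` keys, and it keeps the longest span whenever two entities overlap.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def drop_overlapping_entities(entities: List[Entity]) -> List[Entity]:
    """Keep the longest entity in each group of overlapping spans.

    Hypothetical helper: assumes every entity dict carries character
    offsets under "start" (inclusive) and "end" (exclusive).
    """
    kept: List[Entity] = []
    # Visit longer spans first so they win over the shorter spans they cover.
    by_length = sorted(entities, key=lambda e: e["end"] - e["start"], reverse=True)
    for ent in by_length:
        overlaps = any(
            ent["start"] < k["end"] and k["start"] < ent["end"] for k in kept
        )
        if not overlaps:
            kept.append(ent)
    # Restore the original left-to-right order of the message.
    return sorted(kept, key=lambda e: e["start"])


# Example input modelled on the warning in this thread (offsets are illustrative).
extracted = [
    {"start": 0, "end": 12, "entity": "user_mobile_number",
     "value": "+61414487222", "extractor": "DIETClassifier"},
    {"start": 1, "end": 2, "entity": "buyer_min_num_bedrooms",
     "value": "6", "extractor": "RegexEntityExtractor"},
    {"start": 2, "end": 3, "entity": "buyer_min_num_bedrooms",
     "value": "1", "extractor": "RegexEntityExtractor"},
]

filtered = drop_overlapping_entities(extracted)
print([e["entity"] for e in filtered])
```

Preferring the longest span is a design choice for this sketch; a real component might instead prefer a particular extractor or entity type.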


You could also use the Duckling component to extract the telephone numbers.


Try giving more training examples, and also delete the older models and train again.

Yes… so I am wondering how I can possibly use a lookup list. I only need the lookup list for one slot, but DIETClassifier can’t look at lookup lists, correct? That’s why I added the RegexEntityExtractor. How do others handle lists? I can’t remove the DIETClassifier…