Regex: Unable to extract correct entity according to Regex

In my application, I have two types of values:

  1. User Account Value (e.g. a6hhy78hy28f)
  2. User Activity Log Value (e.g. a6982jdha8sja98sj29d8j)

I noticed that I can identify them using regex:

  1. /^a6[a-fA-F0-9]{10}$/
  2. /^a6[a-fA-F0-9]{20}$/
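
A quick way to sanity-check these patterns outside Rasa is to run them against the sample values with plain Python. Note that the samples above contain letters such as h, y and j, which fall outside the a-f range, so the patterns as written may not match them. The relaxed `[a-zA-Z0-9]` variants below are only an assumption about what the real IDs look like, and `check` is a hypothetical helper name:

```python
import re

# Patterns exactly as written in the post (hex characters only after "a6").
ACCOUNT_HEX = re.compile(r"^a6[a-fA-F0-9]{10}$")
LOG_HEX = re.compile(r"^a6[a-fA-F0-9]{20}$")

# Assumed alternative: any ASCII letter or digit after "a6".
ACCOUNT_ALNUM = re.compile(r"^a6[a-zA-Z0-9]{10}$")
LOG_ALNUM = re.compile(r"^a6[a-zA-Z0-9]{20}$")


def check(value: str) -> dict:
    """Report which of the four patterns match the whole value."""
    return {
        "hex_account": bool(ACCOUNT_HEX.match(value)),
        "hex_log": bool(LOG_HEX.match(value)),
        "alnum_account": bool(ACCOUNT_ALNUM.match(value)),
        "alnum_log": bool(LOG_ALNUM.match(value)),
    }


# Sample values taken from the post.
for sample in ["a6hhy78hy28f", "a6982jdha8sja98sj29d8j"]:
    print(sample, check(sample))
```

If the hex-only patterns report no match for real IDs, the character class is worth fixing before tuning anything else in the pipeline.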

My current config.yml:

- name: WhitespaceTokenizer
- name: CRFEntityExtractor
  number_additional_patterns: 10
- name: RegexEntityExtractor
  use_lookup_tables: True
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true

In rasa interactive:

Issue #1: Occasionally, when the input is only an <activity_log> or <provide_user_account> value, the intent confidence is low:

  0.44 provide_user_account
  0.41 provide_activity_log

Issue #2: When the input is only an <activity_log> or <provide_user_account> value, the bot does not classify the intent correctly, even when the pipeline recognizes the value as user_account and not activity_log:

Is the intent 'provide_activity_log' correct for '[a6982jdha8sja98sj29d8j](user_account)' and
are all entities labeled correctly?

It's very unlikely that a user will type a full sentence when giving a value.

With such low confidence, it is very likely to get the wrong entity. Sometimes overlapping extractions also happen because of RegexEntityExtractor and DIETClassifier.

Is there any way I can force the bot to fix the next intent, or improve the ML model to get better intent classification and entity extraction?

Please kindly share if you know a solution to this problem. Thanks in advance.

Are you using the DIET entity extractors & the regex? Are the entities labelled in the training data? If so that can cause issues & the quickest fix would be to remove the entity labels in the training data. (Hard to say w/out seeing the training data though.)

Hi Rachael, thanks for your reply.

I tried removing the entity labels from the training data, but it still detects the wrong intent. The training data is very similar across the two intents because that is the simplest answer we expect from the user. Let me share it with you:


- intent: provide_user_account
  examples: |
    - My account id is a6ja9daa98u9
    - a6ja9d82jd9a82jjaa98u9
    - here is my account id is a6ja9daa98u9
    - a6ja9daa98u9
    - here is my account id a6ja9daa98u9
    - my account id is a6ja9daa98u9
- intent: provide_activity_log
  examples: |
    - My activity log is a6uy7djal34jha7fhaytrn
    - a6uy7djal34jha7fhaytrn is my activity log
    - here is my activity log a6uy7djal34jha7fhaytrn
    - a6uy7djal34jha7fhaytrn
    - My logid is a6uy7djal34jha7fhaytrn
    - here is my LogID a6uy7djal34jha7fhaytrn

Another problem is that if I remove the entity label, my custom action server does not receive any value.

Ah, gotcha! I would combine those intents to something like provide_information. (This video has more information about why: Conversational AI with Rasa: Training Data and Rules - YouTube)

You can still provide an entity label w/ the regex. Are you maybe using the featurizer instead? I don’t see your regex entity in this training data: NLU Training Data

(As a note: I’m about to go OOO for a couple weeks so if I don’t reply that’s why. :))

I combined both into a single intent, but the regex still doesn't seem to differentiate between the two types of data. That's okay for now, because I added validation in my custom action.
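
For anyone landing here later, this is roughly the kind of check that validation could do, written as plain Python so it runs without the action server. It is only a sketch: `classify_value`, the entity names, and the `[a-zA-Z0-9]` character class are assumptions on my side, not something taken from the actual action code.

```python
import re
from typing import Optional, Tuple

# Assumed ID shapes: "a6" followed by 10 or 20 letters/digits.
# \b keeps a 12-character match from firing inside a 22-character ID.
ACCOUNT = re.compile(r"\ba6[a-zA-Z0-9]{10}\b")
ACTIVITY_LOG = re.compile(r"\ba6[a-zA-Z0-9]{20}\b")


def classify_value(text: str) -> Optional[Tuple[str, str]]:
    """Return (entity_name, value) for the first ID found in text, else None."""
    candidates = (("activity_log", ACTIVITY_LOG), ("user_account", ACCOUNT))
    for name, pattern in candidates:
        found = pattern.search(text)
        if found:
            return name, found.group(0)
    return None


print(classify_value("a6ja9daa98u9"))
print(classify_value("here is my LogID a6uy7djal34jha7fhaytrn"))
print(classify_value("hello there"))
```

Inside a custom action the same function could be called on the latest user message text, so the value type is decided by its length rather than by the predicted intent.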