Case insensitive whitespace tokenizer

Hi, I found several forum posts advising people to use the following to implement a case-insensitive pipeline:

  - name: WhitespaceTokenizer 
    case_sentitive: False

but when I try to use this, I get the following warning while training my model:

UserWarning: You have provided an invalid key `case_sentitive` for component `WhitespaceTokenizer` in your pipeline. Valid options for `WhitespaceTokenizer` are:
- intent_tokenization_flag
- token_pattern
- intent_split_symbol

Python version: 3.8.0

Rasa version: 2.2.0

My pipeline:

  - name: WhitespaceTokenizer 
    case_sentitive: False        
  - name: RegexFeaturizer      
    case_sensitive: False
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4    
  - name: DIETClassifier
  #  entity_recognition: False
    epochs: 150 
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper        
  # - name: ResponseSelector
  #   epochs: 100
  - name: FallbackClassifier
    threshold: 0.5

@Bitzmann Have you checked the Components documentation for Rasa 2.0? That option may be deprecated, as I can't see it there. Can you share the forum posts for my reference? I will still look into the error.

@Bitzmann It is definitely available for RegexFeaturizer:

- name: "RegexFeaturizer"
  # Text will be processed with case sensitive as default
  "case_sensitive": True
  # use match word boundaries for lookup table
  "use_word_boundaries": True

Yes, I'm sure now it's only used for RegexFeaturizer :slight_smile: you scared me :stuck_out_tongue: That's why you're seeing the warning: WhitespaceTokenizer only accepts these options:

- name: "WhitespaceTokenizer"
  # Flag to check whether to split intents
  "intent_tokenization_flag": False
  # Symbol on which intent should be split
  "intent_split_symbol": "_"
  # Regular expression to detect tokens
  "token_pattern": None

I hope this solves it for you :slight_smile:

@Bitzmann If this suggestion solved your error, please mark it as the solution and close this topic so it can help others.