Case insensitive whitespace tokenizer

Hi, I found several forum posts advising people to use the following to implement a case-insensitive pipeline:

  - name: WhitespaceTokenizer 
    case_sentitive: False

but when I try to use this, I get the following warning while training my model:

UserWarning: You have provided an invalid key `case_sentitive` for component `WhitespaceTokenizer` in your pipeline. Valid options for `WhitespaceTokenizer` are:
- intent_tokenization_flag
- token_pattern
- intent_split_symbol

Python version: 3.8.0

Rasa version: 2.2.0

My pipeline:

  - name: WhitespaceTokenizer 
    case_sentitive: False        
  - name: RegexFeaturizer      
    case_sensitive: False
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4    
  - name: DIETClassifier
  #  entity_recognition: False
    epochs: 150 
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper        
  # - name: ResponseSelector
  #   epochs: 100
  - name: FallbackClassifier
    threshold: 0.5

@Bitzmann Have you checked the Components documentation for Rasa 2.0? That option may be deprecated, as I can't see it there. Can you share the forum posts for my reference? I will still look into the error.

@Bitzmann It is definitely available for RegexFeaturizer:

- name: "RegexFeaturizer"
  # Text will be processed with case sensitive as default
  "case_sensitive": True
  # use match word boundaries for lookup table
  "use_word_boundaries": True

Yes, I'm sure now it's only used for RegexFeaturizer :slight_smile: you scared me :stuck_out_tongue: That's why you're seeing the warning: WhitespaceTokenizer only accepts these options:

- name: "WhitespaceTokenizer"
  # Flag to check whether to split intents
  "intent_tokenization_flag": False
  # Symbol on which intent should be split
  "intent_split_symbol": "_"
  # Regular expression to detect tokens
  "token_pattern": None

I hope this solves it for you :slight_smile:

@Bitzmann If this suggestion solved your error, please mark it as the solution and close this topic so it can help others.