Case Sensitivity

I’m testing my chatbot and have inserted several entity values, all of them in uppercase. I would like my NLU model to ignore case, but even after removing the “title” and “upper” features it does not identify them. Also, to extract any entities at all, my CRF component seems to need the “upper” feature, because all of the values are uppercase.

How do I remove case sensitivity from my CRF component?

Did you manage to fix this? I have the same problem.

Which pipeline do you use?

This is the pipeline I use:

language: "es"

pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
  features: [
              ["low", "upper"],
              ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3",
               "suffix2", "upper", "digit", "pattern"],
              ["low", "upper"]
            ]
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
# nn architecture
  "num_hidden_layers_a": 2
  "hidden_layer_size_a": [256, 128]
  "num_hidden_layers_b": 0
  "hidden_layer_size_b": []
  "batch_size": 32
  "epochs": 50
  # embedding parameters
  "embed_dim": 20
  "mu_pos": 0.8  # should be 0.0 < ... < 1.0 for 'cosine'
  "mu_neg": -0.4  # should be -1.0 < ... < 1.0 for 'cosine'
  "similarity_type": "cosine"  # string 'cosine' or 'inner'
  "num_neg": 10
  "use_max_sim_neg": true  # flag which loss function to use
  # regularization
  "C2": 0.002
  "C_emb": 0.8
  "droprate": 0.2
  # flag if to tokenize intents
  "intent_tokenization_flag": false
  "intent_split_symbol": "_"

You would need to add an option there, such as case_sensitive, and use text.lower() when it is False.

Do you mind creating a PR?
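A minimal sketch of that suggestion, assuming a plain whitespace split. The function name `tokenize_whitespace` and its signature are hypothetical and are not Rasa's actual tokenizer API; only the `case_sensitive` flag and the `text.lower()` call come from the suggestion itself.

```python
def tokenize_whitespace(text, case_sensitive=True):
    """Split text on whitespace, optionally lowercasing it first.

    Hypothetical helper for illustration only; not Rasa's tokenizer.
    """
    if not case_sensitive:
        # Lowercase before splitting so every later component sees
        # the same tokens regardless of how the user typed them.
        text = text.lower()
    return text.split()


# Default behaviour keeps the original casing.
print(tokenize_whitespace("Find a BAR nearby"))
# With case_sensitive=False, "BAR" and "bar" become the same token.
print(tokenize_whitespace("Find a BAR nearby", case_sensitive=False))
```

Lowercasing at the tokenizer means the entity extractor and the intent featurizer both receive identical tokens, so the flag would not have to be repeated in every component.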

I am having a similar issue. My goal is for our bot to be case insensitive. I am using a very similar pipeline setup to @Matotias, and I also added case_sensitive: false to tokenizer_whitespace (now called WhitespaceTokenizer in newer versions of Rasa). Our model still seems to be case sensitive after these changes. For example, when I type the word “bar”, our entity extractor identifies it properly as a place_type. When I type “BAR”, the entity extractor does not recognize the word.

Any thoughts? Am I missing something?

There are various config options for the CRF; see Components.

Hi @Ghostvv. Thanks! I did some research on those and followed the advice from other posts about removing “title” like OP. This is what my pipeline looks like. Based on what I’ve read here, my model should be case insensitive, but it is not.

language: "en"

pipeline: #"supervised_embeddings"
  
  - name: "WhitespaceTokenizer"
    case_sensitive: false
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
    features: [
              ["low", "upper"],
              ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3",
               "suffix2", "upper", "digit", "pattern"],
              ["low", "upper"]
            ]
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    oov_token: oov
  - name: "EmbeddingIntentClassifier"
    epochs: 50
    intent_tokenization_flag: true
    intent_split_symbol: "+"

The problem is that WhitespaceTokenizer doesn’t have a case_sensitive option. Do you mind creating a PR to fix that?
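To illustrate why this matters for the CRF: if the tokenizer ignores case_sensitive, the extractor still receives “BAR” as typed. The sketch below assumes the feature names in the config mean roughly what they say (“low” is the lowercased token, “upper” is an all-uppercase check, “prefixN”/“suffixN” are raw slices of the token). The helpers `word_features` and `differing_features` are hypothetical and are not copied from Rasa.

```python
def word_features(token):
    """Approximate per-token features, named after the CRF config entries.

    Assumption: these definitions are inferred from the feature names.
    """
    return {
        "low": token.lower(),
        "upper": token.isupper(),
        "digit": token.isdigit(),
        "prefix2": token[:2],
        "prefix5": token[:5],
        "suffix2": token[-2:],
        "suffix3": token[-3:],
        "suffix5": token[-5:],
    }


def differing_features(a, b):
    """Return the names of the features whose values differ between a and b."""
    fa, fb = word_features(a), word_features(b)
    return sorted(name for name in fa if fa[name] != fb[name])


# Raw tokens: only "low" and "digit" agree between "bar" and "BAR".
print(differing_features("bar", "BAR"))
# Tokens lowercased before feature extraction: nothing differs.
print(differing_features("bar", "BAR".lower()))
```

Under these assumptions, removing only “title” and “upper” from the feature list would still leave the prefix and suffix features case-dependent, which is consistent with the behaviour described above.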

Raised the PR.

Thanks, reviewed.

This is closed, PR merged.

Great work, @sibbsnb!

Hello,

Is it working for you?