Case Sensitivity

JoeTorino · November 9, 2018, 10:20am

I’m testing my chatbot. I inserted several entity values, all of which are in uppercase. I would like my NLU model to ignore case, but even after removing the “title” and “upper” features it does not identify them. Also, for any entity extraction to work at all, my CRF component needs the “upper” feature, because all of the values are uppercase.

How do I remove case sensitivity from my CRF component?

Matotias · January 14, 2019, 5:01pm

Did you manage to fix this? I have the same problem.

Ghostvv · January 14, 2019, 5:21pm

Which pipeline do you use?

Matotias · January 14, 2019, 5:37pm

This is the pipeline I use

language: "es"

pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
  features: [
              ["low", "upper"],
              ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3",
               "suffix2", "upper", "digit", "pattern"],
              ["low", "upper"]
            ]
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
# nn architecture
  "num_hidden_layers_a": 2
  "hidden_layer_size_a": [256, 128]
  "num_hidden_layers_b": 0
  "hidden_layer_size_b": []
  "batch_size": 32
  "epochs": 50
  # embedding parameters
  "embed_dim": 20
  "mu_pos": 0.8  # should be 0.0 < ... < 1.0 for 'cosine'
  "mu_neg": -0.4  # should be -1.0 < ... < 1.0 for 'cosine'
  "similarity_type": "cosine"  # string 'cosine' or 'inner'
  "num_neg": 10
  "use_max_sim_neg": true  # flag which loss function to use
  # regularization
  "C2": 0.002
  "C_emb": 0.8
  "droprate": 0.2
  # flag if to tokenize intents
  "intent_tokenization_flag": false
  "intent_split_symbol": "_"

Ghostvv · January 14, 2019, 5:41pm

You would need to add an option there, such as case_sensitive, and if it is False then use text.lower().
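A minimal sketch of that idea in plain Python, not Rasa's actual tokenizer class. The function name `whitespace_tokenize` and its `case_sensitive` parameter are hypothetical and only illustrate lowercasing the text before splitting:

```python
from typing import List


def whitespace_tokenize(text: str, case_sensitive: bool = True) -> List[str]:
    """Split text on whitespace, lowercasing it first when case_sensitive is False.

    Hypothetical helper for illustration only; this is not Rasa's tokenizer API.
    """
    if not case_sensitive:
        text = text.lower()
    return text.split()


print(whitespace_tokenize("Find a BAR nearby", case_sensitive=False))
```

With `case_sensitive=False`, "BAR" and "bar" produce the same token, so every later component sees identical input for both spellings.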

Do you mind creating a PR?

ccelotto · June 17, 2019, 7:32am

I am having a similar issue. My goal is for our bot to be case insensitive. I am using a pipeline very similar to @Matotias's, and I also added case_sensitive: false to tokenizer_whitespace (now called WhitespaceTokenizer in newer versions of Rasa). Our model still seems to be case sensitive after these changes. For example, when I type “bar”, our entity extractor correctly identifies it as a place_type. When I type “BAR”, the entity extractor does not recognize the word.

Any thoughts? Am I missing something?

Ghostvv · June 17, 2019, 10:52am

There are various config options for the CRF; see Components.

ccelotto · June 17, 2019, 8:40pm

Hi @Ghostvv. Thanks! I did some research on those and followed the advice from other posts about removing “title” like OP. This is what my pipeline looks like. Based on what I’ve read here, my model should be case insensitive, but it is not.

language: "en"

pipeline: #"supervised_embeddings"
  
  - name: "WhitespaceTokenizer"
    case_sensitive: false
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
    features: [
              ["low", "upper"],
              ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3",
               "suffix2", "upper", "digit", "pattern"],
              ["low", "upper"]
            ]
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    oov_token: oov
  - name: "EmbeddingIntentClassifier"
    epochs: 50
    intent_tokenization_flag: true
    intent_split_symbol: "+"
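
One possible reason a feature list like the one above still behaves case-sensitively is that several of the listed features look at the raw token text. The sketch below assumes simplified definitions (`low` as the lowercased token, `upper` as an all-caps flag, `prefix2`/`suffix2` as raw slices). They reuse the feature names from the config but are written here for illustration and are not copied from Rasa's source:

```python
from typing import Dict, List, Union


def token_features(token: str) -> Dict[str, Union[str, bool]]:
    """Assumed, simplified versions of a few CRF token features."""
    return {
        "low": token.lower(),       # case-insensitive by construction
        "upper": token.isupper(),   # depends on case
        "prefix2": token[:2],       # raw slice, keeps original case
        "suffix2": token[-2:],      # raw slice, keeps original case
    }


def differing_features(a: str, b: str) -> List[str]:
    """Return the names of features whose values differ between two tokens."""
    fa, fb = token_features(a), token_features(b)
    return sorted(name for name in fa if fa[name] != fb[name])


print(differing_features("bar", "BAR"))  # ['prefix2', 'suffix2', 'upper']
```

Under these assumptions only `low` matches for the two spellings, so lowercasing the tokens before feature extraction would be needed for all of them to agree.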

Ghostvv · June 18, 2019, 12:10pm

The problem is that WhitespaceTokenizer doesn’t have a case_sensitive option. Do you mind creating a PR to fix that?

sibbsnb · July 16, 2019, 4:28am

raised the PR.

Ghostvv · July 16, 2019, 7:46am

thanks, reviewed

sibbsnb · July 22, 2019, 7:24pm

This is closed, PR merged.

kiranbeethoju · February 6, 2020, 6:27am

great work @sibbsnb

aniket · March 13, 2020, 9:52am

Hello,

Is it working for you?
