NER_CRF lower case

Hey, I found out that NER_CRF returns entities exactly as they were typed, in upper or lower case. In an earlier version the entity value was cast to lower case. How can I change this? I want all values in lower case. I'm on v13.1.

Facing the same issue, but I don't find the forum very active on answers … For now I worked around it by adding more examples. You can also tinker with the ner_crf feature configuration.

Hm I don’t think we changed anything on that. Can you post your config file and also the output of the extracted entities?


pipeline:
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "tokenizer_whitespace"
- name: "ner_crf"

{'intent': {'name': 'Cost', 'confidence': 0.9375881552696228}, 'entities': [{'start': 11, 'end': 28, 'value': 'Product', 'entity': 'Leistung', 'confidence': 0.9962287279461889, 'extractor': 'ner_crf'}], 'intent_ranking': [], 'text': 'costs for Product'}

I changed the output to show the meaning. The entity value keeps the casing used in the input text.

You’re using the default ner_crf features which are:

"features": [["low", "title", "upper"], ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit", "pattern"], ["low", "title", "upper"]],

'title' should return true if a word starts with an upper-case letter. You may want to remove this feature (although I would advise against it, as entities tend to start with an upper-case letter, but this will depend on your data/application). Another thing I would suggest trying is to provide synonyms in lower case or to add training data with the entities written in lower case.
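To make the casing features a bit more concrete, here is a minimal sketch of what 'low', 'title' and 'upper' would compute for a single token. It assumes the features correspond to Python's `str.lower`, `str.istitle` and `str.isupper`; the helper name `casing_features` is made up for illustration and is not part of Rasa.

```python
def casing_features(token: str) -> dict:
    """Approximate the casing-related CRF features for one token.

    Assumption: "low", "title" and "upper" behave like the Python
    string methods used below. This helper is hypothetical.
    """
    return {
        "low": token.lower(),      # lower-cased form of the token
        "title": token.istitle(),  # True for "Product", False for "product"
        "upper": token.isupper(),  # True only for all-caps tokens
    }


for word in ("Product", "product", "PRODUCT"):
    print(word, casing_features(word))
```

With 'title' in the feature list, "Product" and "product" look different to the model, which is why lower-case training examples matter if users type entities in lower case.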

I think this has nothing to do with my issue. Those are just the features used to extract the entities. My question is about the casing of the entity value in the output…

Just reread your question. So all you want is for the value of the entity to be cast to lower case? From your example, you want the output to be 'value': 'product' and not 'value': 'Product'? :confused:

@tmbo did we change anything there?

No, I don't think that was ever the case. The casing of the entities should always have matched the text that was sent to the NLU.
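If lower-case values are needed downstream, one option that does not touch the pipeline at all is to post-process the parse result. The sketch below assumes the result is a plain dict shaped like the output posted earlier in this thread; `lowercase_entity_values` is a hypothetical helper, not a Rasa function.

```python
import copy


def lowercase_entity_values(parse_result: dict) -> dict:
    """Return a copy of a parse result with every entity value lower-cased.

    Assumption: entities live under the "entities" key and each one
    carries a "value" field, as in the output shown in this thread.
    """
    result = copy.deepcopy(parse_result)  # leave the caller's dict untouched
    for entity in result.get("entities", []):
        value = entity.get("value")
        if isinstance(value, str):  # skip non-string values
            entity["value"] = value.lower()
    return result


parsed = {
    "intent": {"name": "Cost", "confidence": 0.9375881552696228},
    "entities": [{"value": "Product", "entity": "Leistung", "extractor": "ner_crf"}],
    "text": "costs for Product",
}
print(lowercase_entity_values(parsed)["entities"][0]["value"])  # product
```

The original text and the start/end offsets are left alone, so the offsets still point at the text as the user typed it.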

Ok, thanks.

Hello, did you get this working? I have entities in my training data written with uppercase, but when I parse the same text in lowercase they are not recognized.

  {
    "intent": "B_Test-Entity - Food&Parser_Test-Flow-3&V24.0",
    "entities": [
      {
        "start": 0,
        "end": 5,
        "value": "Restaurante",
        "entity": "B_food"
      }
    ],
    "text": "Sushi"
  },

My pipeline:

language: "pt"

pipeline:
- name: "tokenizer_whitespace"
- name: "intent_featurizer_count_vectors"
  "lowercase": true
  "OOV_token": None
- name: "ner_crf"
  features: [["low", "upper"], ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "digit", "pattern"], ["low", "upper"]]
- name: "intent_classifier_tensorflow_embedding"
  "random_seed": 1
- name: "ner_duckling_http"
  url: "http://localhost:8000"
  locale: "pt_PT"
  timezone: "Europe/London"
  dimensions: ["amount-of-money", "distance", "duration", "email", "phone-number", "quantity", "temperature", "time", "url", "volume"]

For example:

(works fine) I want Sushi restaurants

(not working) I want sushi restaurants

I want to recognize the entity in both cases.

Hello, @sfurao! How are you?

As Sam suggested before, it might help to add more examples of the lowercase entity in your training data or as synonyms. Something that worked in our bot was similar to this:

- [Portugal](LOC)
- [portugal](LOC)

or

## synonym:Portugal
- portugal

Hope this helps!
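Adding the lower-case variants by hand gets tedious for larger data sets, so here is a rough sketch of how they could be generated. It assumes the training examples are annotated in the `[text](label)` markdown style; the function name and the regular expression are illustrative only and not part of Rasa.

```python
import re

# Matches one annotated entity such as "[Portugal](LOC)".
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def _lower_entity_text(match: re.Match) -> str:
    # Lower-case only the entity text; the label is kept unchanged.
    return f"[{match.group(1).lower()}]({match.group(2)})"


def add_lowercase_variants(examples: list) -> list:
    """Return the examples plus a lower-cased-entity copy of each one."""
    augmented = []
    for example in examples:
        lowered = ENTITY_PATTERN.sub(_lower_entity_text, example)
        for candidate in (example, lowered):
            if candidate not in augmented:  # avoid duplicate examples
                augmented.append(candidate)
    return augmented


for line in add_lowercase_variants(["I want [Sushi](B_food) restaurants"]):
    print("- " + line)
```

Only the annotated entity text is lower-cased; the rest of the sentence keeps its casing. Whether that is enough for the model to generalise depends on the data.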

Hey,

I don't know if this is the right place for this question, but I don't understand the configuration of ner_crf, as there is no description beyond a list of the possible features. The ner_crf website is not very conclusive either. My problem (sorry for opening a new subject, my previous try didn't get any answers) is that only one entity is ever extracted, even though all my examples contain two. I changed all my training data to lower case and I also cast the user input to lower case immediately. Is that a good idea, or is it a bad one because capitals improve recognition? (Not all entities start with capitals. Funnily enough, only the lower-case ones are being recognized.)

Thank you :slight_smile:

Whilst this could be very specific to your situation, you could write a custom pipeline component that goes through and converts all extracted values to lowercase/uppercase/titlecase as needed.

There’s a recent blog article about custom NLU pipeline components here and I’ve just gist’d an example from my own bot that adds title casing to whatever entities you want.
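For readers who cannot open the links, this is a dependency-free sketch of the idea, not the gist itself. It does not import Rasa: `EntityCaseNormalizer` and its `process` method are hypothetical names, and wiring the same loop into a real custom component depends on the Rasa version you run.

```python
from typing import Callable, Dict, List

# Supported casing modes and the string method each one applies.
CASERS: Dict[str, Callable[[str], str]] = {
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
}


class EntityCaseNormalizer:
    """Hypothetical post-processing step; not part of Rasa."""

    def __init__(self, casing_by_entity: Dict[str, str]):
        # Maps an entity name to "lower", "upper" or "title".
        self.casing_by_entity = casing_by_entity

    def process(self, entities: List[dict]) -> List[dict]:
        """Return new entity dicts with the configured casing applied."""
        normalized = []
        for entity in entities:
            mode = self.casing_by_entity.get(entity.get("entity"))
            value = entity.get("value")
            if mode in CASERS and isinstance(value, str):
                # Copy the dict so the input list is not modified.
                entity = {**entity, "value": CASERS[mode](value)}
            normalized.append(entity)
        return normalized


normalizer = EntityCaseNormalizer({"Leistung": "lower", "B_food": "title"})
print(normalizer.process([{"entity": "Leistung", "value": "Product"}]))
```

Entities without a configured mode pass through unchanged, so the step can be limited to the entities where casing actually matters.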
