Ner_crf

azizullah2017 · August 30, 2018, 2:30pm

Hey, I am facing a small issue with ner_crf: it detects "new" as an entity in both of these sentences: "I am looking for a new website" and "I am not looking at new website". It must recognize "new" in the second sentence.

Here is my pipeline:

- name: "nlp_spacy"
  model: "en"
- name: "tokenizer_spacy"
- name: "ner_crf"
  BILOU_flag: true
  features:
    # features for word before token
    - ["low", "title", "upper", "pos", "pos2"]
    # features of token itself
    - ["bias", "low", "upper", "title", "digit", "pos", "pos2", "pattern"]
    # features for word after the token we want to tag
    - ["low", "title", "upper", "pos", "pos2"]
  max_iterations: 50
  L1_c: 1
  L2_c: 1e-3
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
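The config sets `BILOU_flag: true`. As a minimal sketch of what that tagging scheme means (not Rasa's implementation; the `(start, end, label)` span format and the entity label `condition` are hypothetical, since the thread does not name the entity), tagging "new" in "I am looking for a new website" would look like this:

```python
from typing import List, Tuple

# Hypothetical span format for this sketch: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def bilou_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-level entity spans into BILOU tags.

    B = beginning, I = inside, L = last token of a multi-token entity,
    U = unit (single-token entity), O = outside any entity.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # middle tokens
            tags[end - 1] = f"L-{label}"  # last token
    return tags


tokens = "I am looking for a new website".split()
print(bilou_tags(tokens, [(5, 6, "condition")]))
# ['O', 'O', 'O', 'O', 'O', 'U-condition', 'O']
```

The exact tag strings Rasa emits may differ; the point is that a single-token entity gets its own `U` tag rather than `B` without a matching `L`.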

And when I add "word2" and "word3" to the ner_crf features in the pipeline, I get this error.

akelad · September 3, 2018, 8:03am

We made some changes to the features in ner_crf; this is the list of available ones now:


RasaHQ/rasa_nlu/blob/master/rasa_nlu/extractors/crf_entity_extractor.py#L67


    # The maximum number of iterations for optimization algorithms.
    "max_iterations": 50,

    # weight of the L1 regularization
    "L1_c": 0.1,

    # weight of the L2 regularization
    "L2_c": 0.1
}

function_dict = {
    'low': lambda doc: doc[0].lower(),
    'title': lambda doc: doc[0].istitle(),
    'prefix5': lambda doc: doc[0][:5],
    'prefix2': lambda doc: doc[0][:2],
    'suffix5': lambda doc: doc[0][-5:],
    'suffix3': lambda doc: doc[0][-3:],
    'suffix2': lambda doc: doc[0][-2:],
    'suffix1': lambda doc: doc[0][-1:],
    'pos': lambda doc: doc[1],
    'pos2': lambda doc: doc[1][:2],

The standard ones used are listed here: http://rasa.com/docs/nlu/components/#ner-crf
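To see what these feature functions compute, here is a minimal sketch that copies the ten entries visible in the excerpt above and applies them to one token. Two assumptions: each token is a simplified `(word, POS tag)` tuple, matching how the excerpt indexes `doc[0]` and `doc[1]`; and the real dictionary has more entries than shown, since the excerpt is truncated. A name that is not a key, such as `word3`, has no function to look up:

```python
# Entries copied from the function_dict excerpt above.
# Assumption: doc is a simplified (word, pos_tag) tuple.
FEATURES = {
    "low": lambda doc: doc[0].lower(),
    "title": lambda doc: doc[0].istitle(),
    "prefix5": lambda doc: doc[0][:5],
    "prefix2": lambda doc: doc[0][:2],
    "suffix5": lambda doc: doc[0][-5:],
    "suffix3": lambda doc: doc[0][-3:],
    "suffix2": lambda doc: doc[0][-2:],
    "suffix1": lambda doc: doc[0][-1:],
    "pos": lambda doc: doc[1],
    "pos2": lambda doc: doc[1][:2],
}


def token_features(doc, names):
    """Evaluate each requested feature; an unknown name raises KeyError."""
    return {name: FEATURES[name](doc) for name in names}


token = ("Website", "NNP")
print(token_features(token, ["low", "title", "prefix2", "suffix3", "pos", "pos2"]))
# {'low': 'website', 'title': True, 'prefix2': 'We', 'suffix3': 'ite', 'pos': 'NNP', 'pos2': 'NN'}

try:
    token_features(token, ["low", "word3"])
except KeyError as err:
    print("unknown feature:", err)  # unknown feature: 'word3'
```

Whether a failed lookup like this is exactly what produced the error above cannot be confirmed from the thread, since the error was only posted as a screenshot.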

azizullah2017 · September 3, 2018, 8:47am

I am using the latest version of rasa_nlu and I have also updated the dictionary, but to no avail; I am still facing the same issue.

akelad · September 4, 2018, 9:30am

Updated the dictionary where? Can you post your config file in a legible format please?

azizullah2017 · September 4, 2018, 9:38am

language: "en"

pipeline:
- name: "nlp_spacy"
  model: "en"
- name: "tokenizer_spacy"
- name: "ner_spacy"
- name: "ner_duckling"
- name: "ner_duckling_http"
  dimensions:
  - "NUMBER"
- name: "ner_crf"
  BILOU_flag: true
  features:
    # features for word before token
    - ["low", "title", "upper", "pos", "pos2"]
    # features of token itself
    - ["bias", "low", "upper", "word3", "word2", "title", "digit", "pos", "pos2", "pattern"]
    # features for word after the token we want to tag
    - ["low", "title", "upper", "pos", "pos2"]
  max_iterations: 50
  L1_c: 1
  L2_c: 1e-3
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
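This config also overrides the regularization defaults from the excerpt quoted earlier (`L1_c: 0.1`, `L2_c: 0.1`) with `L1_c: 1` and `L2_c: 1e-3`. A minimal sketch of how those values combine, assuming (not confirmed in this thread) that ner_crf forwards them to `sklearn_crfsuite.CRF` as `c1`, `c2` and `max_iterations`; `effective_crf_params` is a hypothetical helper:

```python
# Defaults as they appear in the crf_entity_extractor.py excerpt quoted earlier.
CRF_DEFAULTS = {"max_iterations": 50, "L1_c": 0.1, "L2_c": 0.1}


def effective_crf_params(component_config):
    """Merge user overrides onto the defaults and rename them to the
    c1/c2/max_iterations keyword names of sklearn_crfsuite.CRF
    (assumption: this is how ner_crf forwards them)."""
    merged = dict(CRF_DEFAULTS)
    for key in CRF_DEFAULTS:
        if key in component_config:
            merged[key] = component_config[key]
    return {
        # float() guards against YAML 1.1 loaders (e.g. PyYAML) that read
        # 1e-3, written without a decimal point, as the string "1e-3"
        "c1": float(merged["L1_c"]),  # L1 penalty weight
        "c2": float(merged["L2_c"]),  # L2 penalty weight
        "max_iterations": int(merged["max_iterations"]),
    }


# The ner_crf settings from the config above (L2_c given as a string on purpose).
user_ner_crf = {"BILOU_flag": True, "max_iterations": 50, "L1_c": 1, "L2_c": "1e-3"}
print(effective_crf_params(user_ner_crf))
# {'c1': 1.0, 'c2': 0.001, 'max_iterations': 50}
```

With L1 regularization, a larger weight drives more feature weights to exactly zero, so this override yields a sparser model than the defaults would.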

akelad · September 5, 2018, 2:38pm

Yes, so you need to remove all the word3, word2, etc. features, which are no longer among the available features in the new version. The standard ones are the ones I linked previously: http://rasa.com/docs/nlu/components/#ner-crf
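A minimal sketch of checking a feature config against the available names before training. The allowed set is an assumption: the names visible in the `function_dict` excerpt above plus `bias`, `upper`, `digit` and `pattern`, which the configs in this thread also use; the full list in the linked docs is authoritative. `unknown_features` is a hypothetical helper:

```python
# Assumed set of valid ner_crf feature names (see lead-in).
AVAILABLE = {
    "low", "title", "prefix5", "prefix2", "suffix5", "suffix3",
    "suffix2", "suffix1", "pos", "pos2", "bias", "upper", "digit", "pattern",
}


def unknown_features(feature_windows):
    """Return (window_index, name) for every feature not in AVAILABLE.

    Window 0 is the word before the token, 1 the token itself, 2 the word after.
    """
    return [
        (i, name)
        for i, window in enumerate(feature_windows)
        for name in window
        if name not in AVAILABLE
    ]


# The three feature windows from the config posted above.
posted = [
    ["low", "title", "upper", "pos", "pos2"],
    ["bias", "low", "upper", "word3", "word2", "title", "digit", "pos", "pos2", "pattern"],
    ["low", "title", "upper", "pos", "pos2"],
]
print(unknown_features(posted))  # [(1, 'word3'), (1, 'word2')]
```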

azizullah2017 · September 6, 2018, 12:27pm

Can you tell me why it has not been updated in the git repo?


RasaHQ/rasa_nlu/blob/master/sample_configs/config_crf.yml

language: "en"

pipeline:
- name: "nlp_spacy"
  model: "en"
- name: "tokenizer_spacy"
- name: "ner_spacy"
- name: "ner_duckling_http"
  url: "http://duckling:8000"
  dimensions:
  - "NUMBER"
- name: "ner_crf"
  BILOU_flag: true
  features:
    # features for word before token
    - ["low", "title", "upper", "pos", "pos2"]
    # features of token itself
    - ["bias", "low", "word3", "word2", "upper", "title", "digit", "pos", "pos2", "pattern"]
    # features for word after the token we want to tag
    - ["low", "title", "upper", "pos", "pos2"]

This file has been truncated.

akelad · September 6, 2018, 12:37pm

Looks like we forgot to update that; feel free to create a PR to fix it.

azizullah2017 · September 6, 2018, 12:44pm

OK, thanks.

azizullah2017 · September 6, 2018, 12:46pm

Can you tell me where I can find help? I would like to know what these features mean, like "pos", "title", etc., as mentioned in the spaCy config: http://rasa.com/docs/nlu/components/#ner-crf

azizullah2017 · September 6, 2018, 1:04pm

@akelad, can you help me find out what the features mean?

souvikg10 · September 6, 2018, 2:21pm

https://eli5.readthedocs.io/en/latest/tutorials/sklearn_crfsuite.html#feature-extraction

You can find information about the features here.
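The linked tutorial builds per-token feature dicts from a window of neighbouring words. A minimal sketch of that idea applied to the three `features` lists in the configs above (before, token, after), assuming simplified `(word, POS tag)` tokens; the key format (`-1:low`, `0:title`, ...) and the `BOS`/`EOS` flags are this sketch's choices and are not guaranteed to match ner_crf's internals:

```python
from typing import Dict, List, Tuple

# Hypothetical token representation for this sketch: (word, pos_tag).
Token = Tuple[str, str]

FEATURES = {
    "low": lambda t: t[0].lower(),
    "title": lambda t: t[0].istitle(),
    "upper": lambda t: t[0].isupper(),
    "pos": lambda t: t[1],
    "pos2": lambda t: t[1][:2],
}


def word_features(sentence: List[Token], idx: int, windows: List[List[str]]) -> Dict[str, object]:
    """Build the feature dict for sentence[idx] from [before, token, after] windows."""
    half = len(windows) // 2  # 1 for the usual three-window config
    feats: Dict[str, object] = {}
    for offset in range(-half, half + 1):
        j = idx + offset
        if j < 0:
            feats["BOS"] = True  # window reaches before the sentence start
        elif j >= len(sentence):
            feats["EOS"] = True  # window reaches past the sentence end
        else:
            for name in windows[offset + half]:
                feats[f"{offset}:{name}"] = FEATURES[name](sentence[j])
    return feats


sent = [("I", "PRP"), ("am", "VBP"), ("not", "RB"), ("looking", "VBG"),
        ("at", "IN"), ("new", "JJ"), ("website", "NN")]
windows = [["low", "pos"], ["low", "title", "pos2"], ["low", "pos"]]
print(word_features(sent, 5, windows))
# {'-1:low': 'at', '-1:pos': 'IN', '0:low': 'new', '0:title': False, '0:pos2': 'JJ', '1:low': 'website', '1:pos': 'NN'}
```

In this sketch, the feature dict for "new" only sees "at" and "website"; "not", three tokens earlier, contributes no features at all, which shows how narrow a one-word window on each side is.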

kapilkathuria · September 28, 2018, 4:56am

More information on CRF features:

https://hk.saowen.com/a/c8fe0764b2e5d63ca38cc9867746b739c94eb2bec0b2f6b86d24fa2b01023a11
