Hey, I am facing a little issue with ner_crf: it detects "new" as an entity in both sentences, e.g. "I am for a new website" and "I am not looking at new website". It must recognize "new" in the second sentence.
Here is my
pipeline:
name: “nlp_spacy”
model: “en”
name: “tokenizer_spacy”
name: “ner_crf”
BILOU_flag: true
features:
features for word before token
[“low”, “title”, “upper”, “pos”, “pos2”]
features of token itself
[“bias”, “low”, “upper”, “title”, “digit”, “pos”, “pos2”, “pattern”]
features for word after the token we want to tag
[“low”, “title”, “upper”, “pos”, “pos2”]
max_iterations: 50
L1_c: 1
L2_c: 1e-3
name: “ner_synonyms”
name: “intent_featurizer_count_vectors”
name: “intent_classifier_tensorflow_embedding”
When I add "word2" and "word3" to the CRF features in the pipeline, I get this error
akelad
(Akela Drissner)
September 3, 2018, 8:03am
2
We made some changes to the features in ner_crf. This is the list of available ones now:
    # The maximum number of iterations for optimization algorithms.
    "max_iterations": 50,
    # weight of the L1 regularization
    "L1_c": 0.1,
    # weight of the L2 regularization
    "L2_c": 0.1
}
function_dict = {
    'low': lambda doc: doc[0].lower(),
    'title': lambda doc: doc[0].istitle(),
    'prefix5': lambda doc: doc[0][:5],
    'prefix2': lambda doc: doc[0][:2],
    'suffix5': lambda doc: doc[0][-5:],
    'suffix3': lambda doc: doc[0][-3:],
    'suffix2': lambda doc: doc[0][-2:],
    'suffix1': lambda doc: doc[0][-1:],
    'pos': lambda doc: doc[1],
    'pos2': lambda doc: doc[1][:2],
The standard ones used are listed here: http://rasa.com/docs/nlu/components/#ner-crf
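To make the excerpt above more concrete, here is a minimal sketch of what those lambdas compute for a single token. It assumes each token is passed as a (text, POS tag) tuple, which is what the doc[0] / doc[1] indexing suggests; the names feature_functions and describe_token are made up for this illustration and are not part of rasa_nlu. Only the entries visible in the truncated excerpt are reproduced.

```python
# Illustrative copy of the visible entries of function_dict.
# Assumption: a token is a (text, pos_tag) tuple, as implied by doc[0] / doc[1].
feature_functions = {
    "low": lambda doc: doc[0].lower(),      # lowercased token text
    "title": lambda doc: doc[0].istitle(),  # True if the token is Title-cased
    "prefix5": lambda doc: doc[0][:5],      # first five characters
    "prefix2": lambda doc: doc[0][:2],      # first two characters
    "suffix5": lambda doc: doc[0][-5:],     # last five characters
    "suffix3": lambda doc: doc[0][-3:],     # last three characters
    "suffix2": lambda doc: doc[0][-2:],     # last two characters
    "suffix1": lambda doc: doc[0][-1:],     # last character
    "pos": lambda doc: doc[1],              # full part-of-speech tag
    "pos2": lambda doc: doc[1][:2],         # first two characters of the tag
}


def describe_token(token):
    """Apply every feature function to one (text, pos_tag) token."""
    return {name: fn(token) for name, fn in feature_functions.items()}


# Hypothetical example token; the tag "NNP" is just a sample value.
features = describe_token(("Website", "NNP"))
for name, value in features.items():
    print(name, "->", value)
```

So "low" and "title" describe the casing of the word, the prefix/suffix features are slices of the word, and "pos" / "pos2" come from the part-of-speech tag rather than from the text itself.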
I am using the latest version of rasa_nlu and have also updated the dictionary, but all in vain: I am still facing the same issue.
akelad
(Akela Drissner)
September 4, 2018, 9:30am
4
Updated the dictionary where? Can you post your config file in a legible format please?
language: "en"
pipeline:
- name: "nlp_spacy"
  model: "en"
- name: "tokenizer_spacy"
- name: "ner_spacy"
- name: "ner_duckling"
- name: "ner_duckling_http"
  dimensions:
  - "NUMBER"
- name: "ner_crf"
  BILOU_flag: true
  features:
    # features for word before token
    - ["low", "title", "upper", "pos", "pos2"]
    # features of token itself
    - ["bias", "low", "upper","word3", "word2" "title", "digit", "pos", "pos2", "pattern"]
    # features for word after the token we want to tag
    - ["low", "title", "upper", "pos", "pos2"]
  max_iterations: 50
  L1_c: 1
  L2_c: 1e-3
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
akelad
(Akela Drissner)
September 5, 2018, 2:38pm
6
Yes, so you need to remove word3, word2, and any other features that are no longer among the available features in the new version.
These are the standard ones, which I linked previously: http://rasa.com/docs/nlu/components/#ner-crf
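As a rough sketch of that cleanup, the snippet below filters a ner_crf features list against a set of supported names and reports what was dropped. The AVAILABLE_FEATURES set is an assumption: it combines the names visible in the function_dict excerpt earlier in this thread with bias, upper, digit and pattern from the standard config, so check it against your installed version. The helper name split_features is made up for this example.

```python
# Assumption: the supported feature names for the installed ner_crf version.
# Verify this set against the function_dict in your own rasa_nlu install.
AVAILABLE_FEATURES = {
    "low", "title", "prefix5", "prefix2",
    "suffix5", "suffix3", "suffix2", "suffix1",
    "pos", "pos2", "bias", "upper", "digit", "pattern",
}


def split_features(features):
    """Return (cleaned feature lists, sorted names that were removed)."""
    cleaned = [
        [name for name in position if name in AVAILABLE_FEATURES]
        for position in features
    ]
    removed = sorted(
        {name for position in features for name in position
         if name not in AVAILABLE_FEATURES}
    )
    return cleaned, removed


# The three feature lists from the config in this thread:
# word before the token, the token itself, word after the token.
features = [
    ["low", "title", "upper", "pos", "pos2"],
    ["bias", "low", "upper", "word3", "word2", "title", "digit",
     "pos", "pos2", "pattern"],
    ["low", "title", "upper", "pos", "pos2"],
]

cleaned, removed = split_features(features)
print("removed:", removed)
print("token features:", cleaned[1])
```

The cleaned lists can then be copied back into the features section of the config.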
Can you tell me why it is not updated in git?
language: "en"
pipeline:
- name: "nlp_spacy"
  model: "en"
- name: "tokenizer_spacy"
- name: "ner_spacy"
- name: "ner_duckling_http"
  url: "http://duckling:8000"
  dimensions:
  - "NUMBER"
- name: "ner_crf"
  BILOU_flag: true
  features:
    # features for word before token
    - ["low", "title", "upper", "pos", "pos2"]
    # features of token itself
    - ["bias", "low", "word3", "word2", "upper", "title", "digit", "pos", "pos2", "pattern"]
    # features for word after the token we want to tag
    - ["low", "title", "upper", "pos", "pos2"]
This file has been truncated.
akelad
(Akela Drissner)
September 6, 2018, 12:37pm
8
Looks like we forgot to update that. Feel free to create a PR to fix it.
Can you tell me where I can find help? I would like to know what these features mean, like "pos", "title", etc.,
as mentioned in the spacy config: http://rasa.com/docs/nlu/components/#ner-crf
@akelad can you help me find out what the features mean?
souvikg10
(Souvik Ghosh)
September 6, 2018, 2:21pm
12