Hi, I’m currently building a chatbot with Rasa-NLU, using ner_crf as the entity extractor in the pipeline.
I have around half a million training sentences with only 12 different entities. The extraction is going well, but the recognition is not that accurate…
I’m trying to find out why.
The pipeline is as follows:

language: "fr"
pipeline:
- name: "components.preprocess.PrepareString"
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "ner_crf"
  features: [["low"], ["bias", "suffix3"], ["upper", "pos", "pos2"]]
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
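
As far as the Rasa NLU documentation describes it, the `features` setting of ner_crf is a [before, word, after] array: the first list applies to the previous token, the second to the current token and the third to the next token. The small Python sketch below only restates that mapping for the config above; `describe_window` and `positions` are made-up names for illustration, not part of Rasa.

```python
# Feature config copied from the pipeline above: one list per window position.
features = [["low"], ["bias", "suffix3"], ["upper", "pos", "pos2"]]

# Assumption: the three lists apply to the previous, current and next token,
# in that order.
positions = ["previous", "current", "next"]


def describe_window(features):
    """Map each window position to the feature names configured for it."""
    return dict(zip(positions, features))


for position, names in describe_window(features).items():
    print(f"{position} token: {', '.join(names)}")
```

If that reading is right, the current token itself only contributes `bias` and `suffix3` here, while `upper`, `pos` and `pos2` describe the following token, which may be worth double-checking.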
I believe it may be because I don’t really understand the ner_crf features. Could someone explain what the different features are for?
For example:
- low
- title
- suffix5
- suffix3
- suffix2
- suffix1
- pos
- pos2
- prefix5
- prefix2
- bias
- upper
- digit
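
To make the question concrete, here is a rough sketch of what these names most likely compute for a single token. It is plain Python written from the feature names alone, not Rasa's actual code: `token_features` is a hypothetical helper, and the example word and its POS tag are invented for illustration.

```python
def token_features(word, pos_tag):
    """Hypothetical helper: one plausible reading of each ner_crf feature name."""
    return {
        "low": word.lower(),        # lowercased token text
        "title": word.istitle(),    # starts with a capital, rest lowercase
        "upper": word.isupper(),    # token is entirely uppercase
        "digit": word.isdigit(),    # token consists only of digits
        "prefix5": word[:5],        # first five characters
        "prefix2": word[:2],        # first two characters
        "suffix5": word[-5:],       # last five characters
        "suffix3": word[-3:],       # last three characters
        "suffix2": word[-2:],       # last two characters
        "suffix1": word[-1:],       # last character
        "pos": pos_tag,             # part-of-speech tag from the tagger
        "pos2": pos_tag[:2],        # first two characters of that tag
        "bias": "bias",             # constant feature, the same for every token
    }


# Invented example token with an assumed POS tag.
feats = token_features("Bordeaux", "PROPN")
for name, value in feats.items():
    print(f"{name}: {value!r}")
```

Under this reading, the prefix and suffix features capture word shape, `pos` and `pos2` bring in the spaCy tags, and `bias` is a constant that lets the CRF learn a baseline weight per label.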