The language of my system is Chinese. Most of the training samples are classified correctly, but a few samples come back with a null intent. The configuration is as follows:
language: "zh"
pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
  dictionary_path: "data/userdict.txt"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "ner_mitie"
- name: "ner_synonyms"
The model fails to classify the samples “在吗” (“are you there”) and “是” (“yes”). The feature extractor “data/total_word_feature_extractor_zh.dat” was not trained on my corpus; I just copied it from another system. Could that be the problem?
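One thing that may be worth checking alongside the MITIE file is whether these short utterances produce any features at all. The following is only a minimal sketch under two assumptions that are not confirmed for this setup: that the count-vectors featurizer applies scikit-learn's default `token_pattern`, `(?u)\b\w\w+\b`, which keeps only tokens of two or more characters, and that jieba splits “在吗” into the single-character tokens “在” and “吗”. The helper name `surviving_tokens` is hypothetical and uses only the standard library.

```python
import re

# scikit-learn's default CountVectorizer token_pattern: tokens need 2+ word characters.
# ASSUMPTION: the featurizer in the pipeline above uses this default.
DEFAULT_TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def surviving_tokens(tokens):
    """Return the tokens that the default pattern would still count.

    `tokens` is the tokenizer output for one utterance; the tokens are
    joined with spaces, as a whitespace-separated text would be.
    """
    return DEFAULT_TOKEN_PATTERN.findall(" ".join(tokens))


# ASSUMPTION: hypothetical jieba segmentations of the failing samples.
print(surviving_tokens(["在", "吗"]))  # single-character tokens only
print(surviving_tokens(["是"]))        # single-character token only
print(surviving_tokens(["你好"]))      # a two-character token, for comparison
```

If the first two calls come back empty in a real run with the actual tokenizer output, the utterances would reach the classifier with an all-zero feature vector, which could explain a null intent independently of how the MITIE model was trained.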