Any Chinese users here? I built my pipeline with the configuration below:
language: "zh"
pipeline:
  - name: "JiebaTokenizer"
    dictionary_path: "data/dict"
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.7
    ambiguity_threshold: 0.1
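The second `CountVectorsFeaturizer` uses `analyzer: char_wb`, which builds character n-grams only inside word boundaries. With `JiebaTokenizer`, that means the features depend on how each sentence is segmented. Below is a minimal Python sketch of that n-gram step. It mirrors scikit-learn's `char_wb` logic and assumes Rasa passes the Jieba tokens to scikit-learn joined by spaces. The segmentation `订 机票` is a hypothetical example, not actual Jieba output.

```python
from typing import List


def char_wb_ngrams(text: str, min_n: int = 1, max_n: int = 4) -> List[str]:
    """Character n-grams that never cross whitespace-separated tokens.

    Mirrors scikit-learn's char_wb analyzer: each token is padded with one
    space on each side, and n-grams are taken from the padded token only.
    """
    ngrams: List[str] = []
    for token in text.split():
        padded = " " + token + " "
        for n in range(min_n, max_n + 1):
            offset = 0
            ngrams.append(padded[offset:offset + n])
            while offset + n < len(padded):
                offset += 1
                ngrams.append(padded[offset:offset + n])
            if offset == 0:
                # Padded token is no longer than n: it was added once as a whole,
                # so longer n-grams would only repeat it.
                break
    return ngrams


if __name__ == "__main__":
    segmented = "订 机票"    # hypothetical Jieba segmentation ("book" + "plane ticket")
    unsegmented = "订机票"   # the same characters kept as one token
    print("订机" in char_wb_ngrams(segmented))    # False: no n-gram crosses the token boundary
    print("订机" in char_wb_ngrams(unsegmented))  # True
```

Under these assumptions, a phrase that Jieba splits differently than intended yields different n-grams. Entries in the custom dictionary under `data/dict` may therefore change what DIET sees.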
I have trained and run this pipeline, but the results are not good enough: some sentences are not recognized. I also tried replacing CountVectorsFeaturizer with BERT, but nothing improved. I would appreciate any suggestions on how to configure a pipeline for Chinese.
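For comparison, the sketch below shows one way to put BERT into this pipeline. It assumes Rasa 2.x, where `LanguageModelFeaturizer` takes `model_name` and `model_weights` and needs a tokenizer (here still `JiebaTokenizer`) earlier in the pipeline. The Hugging Face `bert-base-chinese` weights and keeping the `char_wb` featurizer are both assumptions, since the post does not say exactly how BERT was added.

```yaml
pipeline:
  - name: "JiebaTokenizer"
    dictionary_path: "data/dict"
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  # Assumption: the word-level CountVectorsFeaturizer is replaced by BERT features
  - name: LanguageModelFeaturizer
    model_name: "bert"
    # Assumption: Hugging Face Chinese BERT weights (not stated in the post)
    model_weights: "bert-base-chinese"
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  # ... DIETClassifier, EntitySynonymMapper, ResponseSelector and FallbackClassifier unchanged
```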