My latest project is a chatbot in Chinese. It should only answer simple FAQs. I already have the first intents and stories together, but I am still missing the right config. I found some blog articles and repos (https://github.com/crownpku/Rasa_NLU_Chi, and "Non-English Tools for Rasa NLU" on the Rasa Blog), but I couldn't get anywhere with them (e.g. I do not have the file data/total_word_feature_extractor_zh.dat, among other problems).
Could someone share a simple config for a FAQ bot in Chinese?
rasa version:
Rasa Version : 3.0.6
Minimum Compatible Version: 3.0.0
Rasa SDK Version : 3.0.4
Rasa X Version : None
Python Version : 3.7.2
Operating System : Darwin-20.6.0-x86_64-i386-64bit
Python Path : /Users/theresa/.pyenv/versions/3.7.2/bin/python3.7
@threxx - I am not sure you will find an optimized pipeline for Chinese that just works out of the box, so you will have to fine-tune it for your data. Here's a simple one that worked for me for intent classification.
P.S. This is for Simplified Chinese.
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: zh

pipeline:
  - name: JiebaTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    OOV_token: "oov"
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: duckling_extractor.duckling.DucklingEntityExtractor
    # url of the running duckling server
    url: "http://localhost:8000"
    # dimensions to extract
    dimensions:
      [
        "time",
        "number",
        "amount-of-money",
        "distance",
        "sys-number",
        "sys-currency",
      ]
    # if not set the default timezone of Duckling is going to be used
    # needed to calculate dates from relative expressions like "tomorrow"
    timezone: "Europe/Berlin"
    # Timeout for receiving response from http url of the running duckling server
    # if not set the default timeout of duckling http url is set to 3 seconds.
    timeout: 3
    # Duckling locale for Simplified Chinese
    locale: "zh_CN"
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true
#   - name: RulePolicy
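If you only need intent classification for plain FAQs and don't want to run a Duckling server, a trimmed-down variant of the config above might be enough to get started. This is just an untested sketch: it reuses the same components and values from the config above (nothing is tuned), drops the entity extractor and the optional regex/lexical featurizers, and assumes that RulePolicy alone is sufficient for one-turn FAQ answers and for handling the fallback intent.

```yaml
# Trimmed sketch: same components as the config above, minus Duckling.
# All values are copied from that config, not tuned for any dataset.
language: zh

pipeline:
  # Word segmentation for Chinese (needs the jieba package installed)
  - name: JiebaTokenizer
  # Word-level bag-of-words features
  - name: CountVectorsFeaturizer
    OOV_token: "oov"
  # Character n-gram features (1 to 4 characters, within token boundaries)
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  # Predicts nlu_fallback when the top intent confidence is below the threshold
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

policies:
  # Assumption: rules alone cover simple question/answer turns
  - name: RulePolicy
```

You would still need to adjust epochs and the fallback thresholds once you see how it behaves on your own data.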