Config for FAQ Bot in Chinese

@threxx - I'm not sure you will find an optimized pipeline for Chinese that works out of the box, so you will probably have to fine-tune it for your data. Here's a simple one that worked for me for intent classification.

P.S. This is for Simplified Chinese.

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: zh

pipeline:
  - name: JiebaTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    OOV_token: "oov"
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: duckling_extractor.duckling.DucklingEntityExtractor
    # url of the running duckling server
    url: "http://localhost:8000"
    # dimensions to extract
    dimensions:
      [
        "time",
        "number",
        "amount-of-money",
        "distance",
        "sys-number",
        "sys-currency",
      ]
    # if not set the default timezone of Duckling is going to be used
    # needed to calculate dates from relative expressions like "tomorrow"
    timezone: "Europe/Berlin"
    # Timeout (in seconds) for receiving a response from the running Duckling server.
    # If not set, the default timeout of 3 seconds is used.
    timeout: 3
    # Duckling locale; "zh_CN" is the locale for Simplified Chinese
    locale: "zh_CN"
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true
#   - name: RulePolicy
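
In case it helps when tuning the two FallbackClassifier values above, here is a rough Python sketch of how I understand the decision rule: fall back when the top intent confidence is below `threshold`, or when the top two intents are closer than `ambiguity_threshold`. This is not Rasa's actual code; the function name `should_fallback` and the intent names are made up for illustration.

```python
from typing import List, Tuple


def should_fallback(
    ranking: List[Tuple[str, float]],
    threshold: float = 0.3,
    ambiguity_threshold: float = 0.1,
) -> bool:
    """Approximate the FallbackClassifier rule (illustrative sketch only).

    `ranking` is a list of (intent_name, confidence) pairs, e.g. the
    intent ranking produced by DIETClassifier.
    """
    # Sort so the most confident intent comes first.
    ranked = sorted(ranking, key=lambda pair: pair[1], reverse=True)
    if not ranked:
        # Nothing was predicted at all; treat that as a fallback here.
        return True

    top_confidence = ranked[0][1]
    # Rule 1: the best intent is not confident enough.
    if top_confidence < threshold:
        return True

    # Rule 2: the two best intents are too close to tell apart.
    if len(ranked) >= 2:
        if top_confidence - ranked[1][1] < ambiguity_threshold:
            return True

    return False


if __name__ == "__main__":
    # Hypothetical intent names, only for the demo.
    print(should_fallback([("ask_refund", 0.90), ("ask_shipping", 0.05)]))
    print(should_fallback([("ask_refund", 0.25), ("ask_shipping", 0.20)]))
    print(should_fallback([("ask_refund", 0.55), ("ask_shipping", 0.50)]))
```

With the values from the config, a confident prediction (0.90 vs 0.05) is kept, while a weak top intent (0.25) or a near tie (0.55 vs 0.50) is replaced by the fallback intent. If too many valid messages end up in fallback on your data, lowering `ambiguity_threshold` is usually the first thing to try.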