Dialogflow migrating for Chinese Agent

wowwwkao · August 20, 2021, 10:06pm

Has anyone had success migrating a zh-CN based Dialogflow agent to Rasa?

With the default config, the trained model appears to detect intents correctly, but it does not extract entities the way Dialogflow does.

The config I used is below:

language: zh-CN
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

The training process shows the following warning:

/lib/python3.8/site-packages/rasa/shared/utils/io.py:97: UserWarning: Misaligned entity annotation in message '早上吃面条' with intent 'diabetestalk.agent.log_unknown_food'. Make sure the start and end values of entities ([(0, 2, '早上'), (2, 3, '吃'), (3, 5, '面条')]) in the training data match the token boundaries ([(0, 5, '早上吃面条')]). Common causes:

- entities include trailing whitespaces or punctuation
- the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation

More info at Training Data Format
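To make the warning concrete, here is a minimal sketch in plain Python of why whitespace tokenization misaligns with these annotations. It is a simplified stand-in for the check, not Rasa's actual implementation; the helper names `whitespace_tokens` and `misaligned` are made up for illustration. An entity counts as aligned only if its start matches some token's start and its end matches some token's end.

```python
from typing import List, Tuple

# (start, end, text) character span, matching the format in the warning
Span = Tuple[int, int, str]


def whitespace_tokens(text: str) -> List[Span]:
    """Split on whitespace and return character spans (hypothetical helper)."""
    spans = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        spans.append((start, end, tok))
        pos = end
    return spans


def misaligned(entities: List[Span], tokens: List[Span]) -> List[Span]:
    """Return entities whose start or end is not on a token boundary."""
    starts = {s for s, _, _ in tokens}
    ends = {e for _, e, _ in tokens}
    return [ent for ent in entities if ent[0] not in starts or ent[1] not in ends]


text = "早上吃面条"
entities = [(0, 2, "早上"), (2, 3, "吃"), (3, 5, "面条")]

# No spaces in the sentence, so the whole message becomes a single token.
tokens = whitespace_tokens(text)
print(tokens)
print(misaligned(entities, tokens))

# With word-level tokens (what a Chinese tokenizer is expected to produce),
# the same annotations line up.
word_tokens = [(0, 2, "早上"), (2, 3, "吃"), (3, 5, "面条")]
print(misaligned(entities, word_tokens))
```

Because the sentence contains no spaces, the whole message is one token spanning 0 to 5, so none of the three annotated entities sits on a token boundary. That fits the symptom of intents being detected while entities are not.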

Instead of the default whitespace tokenizer, I also tried the Chinese tokenizer Jieba. My config is below:

language: "zh"
pipeline:
  - name: "MitieNLP"
    model: "data/total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
  - name: "MitieEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "RegexFeaturizer"
  - name: "MitieFeaturizer"
  - name: "SklearnIntentClassifier"
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true

With this configuration, training does not really run: it finishes very quickly, and the resulting model can't detect any intents or entities.

Please help!

Dustyposa · August 23, 2021, 6:18am

You can use an empty project to verify your new config. If it works fine there, you need to check your corpus.

wowwwkao · August 29, 2021, 10:08pm

Thanks for the suggestion. When creating a new project, the default sample data is in English, yet the config I'm trying to run needs to work for Chinese. Any idea how to create a new project in Chinese?

Dustyposa · August 30, 2021, 1:09am

If the config works for English, you only need to change the corpus to Chinese. You can refer to this project, which uses BERT: https://github.com/Dustyposa/rasa_ch_faq
