Unable to train a model with training data that contains non ascii characters

andy51002000 · May 24, 2019, 1:30am

I get the following error message after posting a training request to my Rasa server:

{ "error": "ascii" }
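For context, `{"error": "ascii"}` is consistent with a Python `UnicodeEncodeError` raised by the ASCII codec. One of the related topics below quotes the full form of that message ("'ASCII' codec can't encode characters ..."), and the first argument of such an exception is just the codec name. The snippet is a minimal sketch, not Rasa's code, that reproduces this with one of the training sentences. The idea that the server reports only that first argument is an assumption.

```python
# Sketch (not Rasa's code): reproduce the error that {"error": "ascii"}
# is consistent with, using one of the training sentences.
sample = "我的 WIFI 有問題"

try:
    sample.encode("ascii")  # the ASCII codec only covers code points 0-127
except UnicodeEncodeError as err:
    codec_name = err.args[0]  # first exception argument: the codec name
    message = str(err)        # full "'ascii' codec can't encode ..." message
    print("codec:", codec_name)
    print("message:", message)

# UTF-8 can encode every Unicode code point, so this always succeeds.
encoded = sample.encode("utf-8")
print(len(sample), "characters ->", len(encoded), "UTF-8 bytes")
```

If a bare `ascii` is all the server returns, checking its logs for the full exception message is the quickest way to confirm this reading.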

My training data:

language: zh

pipeline:
- name: tokenizer_whitespace
- name: intent_entity_featurizer_regex
- name: ner_crf
- name: ner_synonyms
- name: intent_featurizer_count_vectors
- name: intent_classifier_tensorflow_embedding

data: |
  ## intent:wifi_not_working
  - 我的電腦 WIFI 打不開
  - 我的 WIFI 不能用
  - 我的 WIFI 有問題
  - 我不能用 WIFI

  ## intent:find_repair_center
  - 如何找到東京維修處
  - 如何找到東京都維修處
  - 如何找到維修中心
  - 維修中心在哪裡
  - 最近的維修中心在哪裡
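If the training request is sent from a client script, one thing worth ruling out is a body that is not encoded as UTF-8 or that does not declare its charset. The sketch below uses Python's standard `urllib.request` to build (but not send) such a request. The URL, port, route and project name, and the `application/x-yml` content type used by older Rasa NLU HTTP servers are all assumptions; check them against your own server version.

```python
import urllib.request

# Hypothetical endpoint: host, port, route and project name are assumptions;
# adjust them to match your own Rasa NLU server.
TRAIN_URL = "http://localhost:5000/train?project=my_project"

# Shortened copy of the config and training data from the post above.
CONFIG_YAML = """language: zh
pipeline:
- name: tokenizer_whitespace
- name: intent_featurizer_count_vectors
- name: intent_classifier_tensorflow_embedding
data: |
  ## intent:find_repair_center
  - 如何找到維修中心
  - 維修中心在哪裡
"""


def build_train_request(url, yaml_text):
    """Build (but do not send) a POST request whose body is UTF-8 bytes."""
    body = yaml_text.encode("utf-8")  # never rely on a platform default codec
    return urllib.request.Request(
        url,
        data=body,
        # Declaring the charset tells the server how to decode the bytes.
        headers={"Content-Type": "application/x-yml; charset=utf-8"},
        method="POST",
    )


request = build_train_request(TRAIN_URL, CONFIG_YAML)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

If the error persists with an explicitly UTF-8 body, the problem is more likely on the server side, for example in the locale or default encoding the server process runs with.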

I found that if I remove the round brackets "(" and ")", everything works fine. However, I need them to tell Rasa which parts of the text are entities, so I cannot simply remove them. Does anyone have an idea how to fix this?
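The round brackets matter because, in Rasa's Markdown training format, an entity is written as `[text](entity_name)`: the square brackets hold the literal text and the round brackets hold the entity label. The sketch below is a simplified, hypothetical parser (not Rasa's implementation) showing how such an annotation becomes a plain sentence plus character offsets. The label `location` is a hypothetical example.

```python
import re

# Matches Markdown-style entity annotations: [text](entity_name)
ENTITY_PATTERN = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def parse_example(annotated):
    """Return (plain_text, entities) with character offsets into plain_text."""
    entities = []
    plain_parts = []
    cursor = 0  # position in the annotated string
    offset = 0  # length of plain text emitted so far
    for match in ENTITY_PATTERN.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        value = match.group("value")
        entities.append({"start": offset, "end": offset + len(value),
                         "value": value, "entity": match.group("entity")})
        plain_parts.append(value)
        offset += len(value)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])  # text after the last annotation
    return "".join(plain_parts), entities


text, ents = parse_example("如何找到[東京](location)維修處")
print(text, ents)
```

Offsets here count Unicode code points, not bytes, which is what keeps them correct for Chinese text.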

BlankRain · May 24, 2019, 8:04am

Maybe this is what you need: crownpku/Rasa_NLU_Chi on GitHub, "Turn Chinese natural language into structured data" (Chinese natural language understanding).
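One likely reason a Chinese-specific setup helps: `tokenizer_whitespace` splits on spaces only, and the Chinese sentences in the question contain spaces only around `WIFI`, so an entity such as `東京` does not start or end on a token boundary. The snippet is a simplified model of whitespace tokenization using `re.finditer`, not Rasa's code. The pre-segmented sentence is a hypothetical stand-in for the output of a Chinese word segmenter (such as jieba, which is assumed here to be what Rasa_NLU_Chi relies on).

```python
import re


def whitespace_tokens(text):
    """Split on whitespace only; return (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def entity_aligned(tokens, start, end):
    """True if [start, end) begins and ends exactly on token boundaries."""
    starts = {s for _, s, _ in tokens}
    ends = {e for _, _, e in tokens}
    return start in starts and end in ends


raw = "如何找到東京維修處"
raw_tokens = whitespace_tokens(raw)
# The unsegmented sentence has no spaces, so it becomes a single token and
# the span of "東京" (characters 4-6) cannot line up with a token.
print(raw_tokens, entity_aligned(raw_tokens, 4, 6))

# Hypothetical word-segmented version, standing in for a Chinese segmenter.
segmented = "如何 找到 東京 維修處"
seg_tokens = whitespace_tokens(segmented)
start = segmented.index("東京")
print(seg_tokens, entity_aligned(seg_tokens, start, start + len("東京")))
```

A CRF-based entity extractor labels whole tokens, so an entity that falls inside a single token cannot be learned cleanly.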

Topic	Category	Replies	Views	Activity
Dealing with Non-ascii characters	Feedback on Rasa Open Source	24	1724	November 18, 2021
Rasa train raises Unicode encodeerror: 'ASCII' codec can't encode characters in position 9-10: ordinal not in range (128)	Rasa Open Source	2	767	April 27, 2020
Empty entities being returned by rasa nlu	Rasa Open Source	5	1702	April 23, 2020
Chinese whitespace error	Rasa Open Source	2	963	August 5, 2022
Rasa not picking special characters in an entity	Rasa Open Source	9	3338	May 12, 2020
