UnicodeDecodeError: 'gbk' codec can't decode byte 0x9c in position 72: illegal multibyte sequence

Hi,

I used a lookup table in my training data as follows: "lookup_tables": [ { "name": "road", "elements": "data/cuisine_list.json" } ],

The file cuisine_list.json contains the original Chinese characters and is saved as UTF-8.

When I run NLU training, I always get the following error at the intent_entity_featurizer_regex step: UnicodeDecodeError: 'gbk' codec can't decode byte 0x9c in position 72: illegal multibyte sequence

Could you help me solve this problem? Thanks.

Juven

python: 3.6.8, rasa-nlu: 0.14.4, rasa-core: 0.13.2, rasa-core-sdk: 0.12.1, tensorflow: 1.12.0

Hey @juven, can you post the whole stack trace of the error? Also, does this only happen when you use lookup tables?
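
While waiting for the stack trace, here is a minimal sketch for checking one possible cause. It assumes (not confirmed in this thread) that the file is being opened without an explicit encoding, so Python falls back to the platform default, which is often gbk (cp936) on a Chinese-locale Windows machine. The temporary file and its contents are hypothetical stand-ins for data/cuisine_list.json.

```python
import locale
import os
import tempfile

# Hypothetical stand-in for the lookup table file: UTF-8 text with Chinese entries.
path = os.path.join(tempfile.mkdtemp(), "cuisine_list.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("川菜\n粤菜\n")

# The encoding open() uses when none is passed explicitly.
print("default encoding:", locale.getpreferredencoding(False))

# Reading with an explicit encoding works regardless of the platform default.
with open(path, encoding="utf-8") as f:
    elements = [line.strip() for line in f if line.strip()]

# Decoding the same bytes as gbk either raises UnicodeDecodeError
# or silently produces the wrong characters.
with open(path, "rb") as f:
    raw = f.read()
try:
    gbk_ok = raw.decode("gbk") == raw.decode("utf-8")
except UnicodeDecodeError:
    gbk_ok = False

print("elements:", elements)
print("gbk decodes correctly:", gbk_ok)
```

If the default encoding printed on your machine is cp936 or gbk, and your real file reads fine with encoding="utf-8", that would point to a missing encoding argument wherever the lookup table is loaded. The full stack trace should show which open() call that is.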