UnicodeDecodeError

Hi all, I am building a Chinese chatbot and have been trying to train my own model (without any pretrained embeddings) with the command ‘rasa train’, after completing my domain, stories and NLU files. However, I keep running into the following error, which says the character maps to undefined. Does anyone know how to solve this?

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 288: character maps to <undefined>

Hey there @limxuanhui. What terminal are you using? We want Python's I/O encoding to be UTF-8, not charmap. Could you please try one of the following:

Option 1: In your terminal before using rasa, try:

export PYTHONIOENCODING='utf8'

Option 2: set it as part of the command itself, like this:

PYTHONIOENCODING='utf8' rasa train

If neither of these works, can you post the full error traceback?
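As background for these suggestions: PYTHONIOENCODING only overrides the encoding of Python's standard streams (stdin, stdout, stderr). Files opened without an explicit encoding are decoded with the locale's preferred encoding instead, which on Windows is often a legacy code page such as cp1252. A minimal sketch to print both values from the same interpreter that runs rasa (show_encodings is a hypothetical helper name, not part of rasa):

```python
import locale
import sys


def show_encodings():
    """Report the encodings this interpreter uses for stdio and for files."""
    return {
        # Overridden by PYTHONIOENCODING
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used by open() when no encoding argument is given (outside UTF-8 mode)
        "file_default": locale.getpreferredencoding(False),
        # Used for file names and some OS APIs
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, enc in show_encodings().items():
        print(f"{name}: {enc}")
```

If "stdout" shows utf-8 but "file_default" still shows cp1252 (or gbk), setting PYTHONIOENCODING alone would not change how data files are read.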


Thanks for replying. I am using Command Prompt on Windows. Below is the full error traceback. I have tried both suggested options, but I am not sure how to handle these errors:

  • ‘export’ is not recognized as an internal or external command, operable program or batch file.
  • ‘pythonioencoding’ is not recognized as an internal or external command, operable program or batch file.

For the export error, I also tried using ‘set’ to set the environment variable PYTHONIOENCODING to utf-8, but that doesn't work either…
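For reference, export and the VAR=value command prefix are bash syntax, and Command Prompt understands neither, which is what the two "not recognized" errors mean. The cmd.exe equivalent is set PYTHONIOENCODING=utf-8 (without quotes, since cmd keeps quote characters as part of the value), followed by rasa train in the same window. A shell-independent alternative is to set the variable from Python for a child process. A minimal sketch; run_with_utf8_io is a hypothetical helper, and like the shell options it only affects the child's standard streams:

```python
import os
import subprocess
import sys


def run_with_utf8_io(cmd):
    """Run cmd (a list of arguments) with PYTHONIOENCODING=utf-8 set.

    The variable is set by Python rather than by the shell, so this behaves
    the same from cmd.exe, PowerShell or bash. It changes the encoding of the
    child's stdin/stdout/stderr, not the default used to read files.
    """
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    return subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output as text (Python 3.6 compatible)
    )


if __name__ == "__main__":
    # For the real case this might be run_with_utf8_io(["rasa", "train"]);
    # here a child interpreter simply reports its stdout encoding
    result = run_with_utf8_io(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"]
    )
    print(result.stdout.strip())
```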

Traceback (most recent call last):
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\Scripts\rasa.exe\__main__.py", line 9, in <module>
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\__main__.py", line 70, in main
    cmdline_arguments.func(cmdline_arguments)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\cli\train.py", line 84, in train
    kwargs=extract_additional_arguments(args),
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 40, in train
    kwargs=kwargs,
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\asyncio\base_events.py", line 484, in run_until_complete
    return future.result()
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 135, in train_async
    kwargs=kwargs,
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 188, in _do_training
    fixed_model_name=fixed_model_name,
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 382, in _train_nlu_with_validated_data
    config, nlu_data_directory, _train_path, fixed_model_name="nlu"
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\nlu\train.py", line 89, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\nlu\model.py", line 192, in train
    updates = component.train(working_data, self.config, **context)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_contrib\nlu\extractors\bilstm_crf_tf_entity_extractor.py", line 58, in train
    config = model.get_default_config()
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\seq2annotation\model.py", line 105, in get_default_config
    encoding=None).tolist()
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 1093, in loadtxt
    first_line = next(fh)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 288: character maps to <undefined>
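The last frames show where this fails: numpy.loadtxt, called from seq2annotation\model.py with encoding=None, is decoding a file with cp1252 (reported as the 'charmap' codec) rather than UTF-8. In cp1252 the byte 0x81 has no character assigned, so any UTF-8 text containing that byte cannot be decoded. The stdlib-only sketch below reproduces the same kind of error; try_decode is a hypothetical helper and the sample string is made up, chosen because 丁 is E4 B8 81 in UTF-8:

```python
def try_decode(data, encodings):
    """Decode data with each codec; return the text or the error message."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError as exc:
            results[enc] = f"UnicodeDecodeError: {exc}"
    return results


if __name__ == "__main__":
    # Hypothetical sample text, stored as UTF-8 bytes
    sample = "你好，丁先生".encode("utf-8")
    for enc, outcome in try_decode(sample, ["utf-8", "cp1252"]).items():
        # ascii() keeps the output printable even on a non-UTF-8 console
        print(enc, ascii(outcome))
```

The utf-8 entry round-trips, while the cp1252 entry fails on byte 0x81 with the same 'charmap' message as in the traceback.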

Hm, okay. Does the second answer on the Stack Overflow question "Python, Unicode, and the Windows console" (the one with 67 upvotes, which suggests installing win-unicode-console) help at all?

Hi,

it did not work, and after changing my computer's language region to China, I am now running into another problem:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 5635: illegal multibyte sequence

Any idea how to go about this?

Thank you.

Hm, so now the file is being decoded with gbk instead of charmap, but I think it still won't be read correctly, because gbk isn't UTF-8 either.

Can you show me the rest of the traceback that led to this latest "illegal multibyte sequence" error?

Hi, thanks for replying, and apologies for the late reply. Below is the traceback for the error I encountered when my system tried to decode with gbk.

Traceback (most recent call last):
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\Scripts\rasa.exe\__main__.py", line 9, in <module>
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\__main__.py", line 70, in main
    cmdline_arguments.func(cmdline_arguments)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\cli\train.py", line 84, in train
    kwargs=extract_additional_arguments(args),
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 40, in train
    kwargs=kwargs,
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\asyncio\base_events.py", line 484, in run_until_complete
    return future.result()
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 135, in train_async
    kwargs=kwargs,
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 188, in _do_training
    fixed_model_name=fixed_model_name,
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\train.py", line 382, in _train_nlu_with_validated_data
    config, nlu_data_directory, _train_path, fixed_model_name="nlu"
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\nlu\train.py", line 89, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa\nlu\model.py", line 192, in train
    updates = component.train(working_data, self.config, **context)
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_contrib\nlu\extractors\bilstm_crf_tf_entity_extractor.py", line 58, in train
    config = model.get_default_config()
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\seq2annotation\model.py", line 105, in get_default_config
    encoding=None).tolist()
  File "C:\Users\limxu\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 1093, in loadtxt
    first_line = next(fh)
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 5635: illegal multibyte sequence
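This traceback ends in the same numpy.loadtxt call made from seq2annotation\model.py with encoding=None. According to the NumPy documentation, encoding=None means the system default encoding is used, which would explain why changing the region to China only changed the codec in the error from charmap (cp1252) to gbk. If the file is UTF-8, as seems likely, neither default will match it. The general remedy is to name the encoding explicitly when a file is opened. The sketch below shows this with plain open() on a temporary file; read_lines and the sample text are hypothetical. Applying the same idea here would mean passing encoding='utf-8' instead of encoding=None in that loadtxt call, which is an untested assumption and not a fix confirmed in this thread.

```python
import os
import tempfile


def read_lines(path, encoding=None):
    """Read a text file; encoding=None falls back to the locale default."""
    with open(path, encoding=encoding) as fh:
        return [line.rstrip("\n") for line in fh]


if __name__ == "__main__":
    text = "你好\n丁先生\n"  # hypothetical file contents
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Write the file as UTF-8 explicitly, independent of the locale
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
        # Explicit UTF-8 reads correctly whatever the Windows region is
        print("utf-8", ascii(read_lines(path, encoding="utf-8")))
        # Legacy code pages either raise (as in the tracebacks) or give mojibake
        for enc in ("cp1252", "gbk"):
            try:
                print(enc, ascii(read_lines(path, encoding=enc)))
            except UnicodeDecodeError as exc:
                print(f"{enc}: {exc}")
    finally:
        os.remove(path)
```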