UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1498: invalid start byte

I installed Rasa (1.2.7) in my conda environment on Windows, and everything had been working well so far. Then I decided to install Rasa X using the following command:

pip install rasa-x --extra-index-url https://pypi.rasa.com/simple

and then ran:

rasa x

and got the following error:

(botenv) D:\git\agent>rasa x
Traceback (most recent call last):
  File "c:\programdata\anaconda3\envs\botenv\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\anaconda3\envs\botenv\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\envs\botenv\Scripts\rasa.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 291, in rasa_x
    run_locally(args)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 315, in run_locally
    _validate_rasa_x_start(args, project_path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 262, in _validate_rasa_x_start
    _validate_domain(os.path.join(project_path, DEFAULT_DOMAIN_PATH))
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 275, in _validate_domain
    Domain.load(domain_path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\core\domain.py", line 62, in load
    other = cls.from_path(path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\core\domain.py", line 72, in from_path
    domain = cls.from_file(path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\core\domain.py", line 85, in from_file
    return cls.from_yaml(rasa.utils.io.read_file(path))
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\utils\io.py", line 131, in read_file
    return f.read()
  File "c:\programdata\anaconda3\envs\botenv\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1498: invalid start byte

Has anyone experienced a similar issue?

Thanks.
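The traceback ends in `rasa.utils.io.read_file`, which decodes `domain.yml` as UTF-8 and stops at the first byte that is not valid UTF-8. A minimal sketch for locating that byte yourself, assuming the domain file is `domain.yml` in the project root; the helper name `find_invalid_utf8` is hypothetical and not part of Rasa:

```python
from pathlib import Path
from typing import Optional, Tuple


def find_invalid_utf8(path: Path, context: int = 20) -> Optional[Tuple[int, bytes]]:
    """Return (offset, surrounding bytes) of the first byte that is not valid
    UTF-8, or None if the whole file decodes cleanly. (Hypothetical helper.)"""
    data = path.read_bytes()
    try:
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the UTF-8 decoder rejected
        start = max(err.start - context, 0)
        return err.start, data[start : err.start + context]
```

Calling `find_invalid_utf8(Path("domain.yml"))` from the project directory prints nothing by itself; inspecting the returned snippet usually makes it obvious which character (for example a curly apostrophe) is the culprit.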

Hi @FrancisThibault, this is a known bug that we're already working on, and it should be fixed in the next release. It happens because the domain is written in the system's default encoding instead of UTF-8. Can you share your domain? There might be a workaround in the meantime, though I'm not 100 percent sure.
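To illustrate the mechanism: on Windows, `open()` without an `encoding` argument uses the ANSI code page (commonly cp1252 on Western-European locales), and in cp1252 the byte 0x92 is the right single quotation mark `’`, the curly apostrophe many editors insert. A minimal sketch that reproduces the same error, assuming cp1252 was the default encoding on the affected machine; the sample text is made up:

```python
import os
import tempfile

# Hypothetical sample content containing RIGHT SINGLE QUOTATION MARK (U+2019)
TEXT = "text: Hi, I\u2019m your bot\n"


def reproduce_decode_error() -> str:
    """Write TEXT as cp1252 (assumed Windows default), then read it as UTF-8."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "domain.yml")
        # Simulate the re-save in the system default encoding (assumed cp1252)
        with open(path, "w", encoding="cp1252") as f:
            f.write(TEXT)
        # Read it back as UTF-8, which is what Rasa expects
        try:
            with open(path, encoding="utf-8") as f:
                f.read()
        except UnicodeDecodeError as err:
            return str(err)
    return "no error"


print("\u2019".encode("cp1252"))  # b'\x92'
print(reproduce_decode_error())
```

The second line prints a message of the same form as the one in the traceback above ("can't decode byte 0x92 ... invalid start byte").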

Hi Ella, thank you so much for your quick reply.

Do I have to do a fresh Rasa installation? Even Rasa itself is not working anymore. When you ask me to share my domain, which domain are you referring to?

Thank you so much for your collaboration.

Hm, the original one: what usually happens with this bug is that when Rasa X starts, it reads your domain, re-formats it, and re-saves it (unfortunately with the wrong encoding).
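The underlying pitfall is that writing a file without an explicit `encoding` falls back to the platform default. A minimal sketch of a read / re-format / re-save cycle that pins UTF-8 on both sides, so the result no longer depends on the locale; `resave_utf8` and the whitespace-stripping "re-format" step are hypothetical stand-ins, not Rasa's actual code:

```python
from pathlib import Path
from typing import Callable


def resave_utf8(path: Path, reformat: Callable[[str], str]) -> None:
    """Read a text file as UTF-8, transform it, and write it back as UTF-8."""
    text = path.read_text(encoding="utf-8")
    # Passing encoding explicitly avoids the locale default (e.g. cp1252 on Windows)
    path.write_text(reformat(text), encoding="utf-8")


def strip_trailing_spaces(text: str) -> str:
    # Hypothetical stand-in for whatever re-formatting the real tool performs
    return "\n".join(line.rstrip() for line in text.splitlines()) + "\n"
```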

Can you run Python 3.7? If so, I think this workaround will work:

Workaround (Python 3.7+ only): set the environment variable PYTHONUTF8 to 1 before running rasa. This forces Python to use UTF-8 as its default encoding. On Windows: set PYTHONUTF8=1
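Note that `set` in a Command Prompt only affects that window and the processes started from it, so it has to be run in the same window before `rasa x`. A minimal sketch for checking that a fresh interpreter actually picks the variable up, via `sys.flags.utf8_mode` (available from Python 3.7); the helper name is hypothetical:

```python
import os
import subprocess
import sys


def utf8_mode_in_child(pythonutf8: str) -> str:
    """Start a fresh interpreter with PYTHONUTF8 set and report its UTF-8 mode flag."""
    env = dict(os.environ, PYTHONUTF8=pythonutf8)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,  # capture_output/text require Python 3.7+
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(utf8_mode_in_child("1"))  # 1
print(utf8_mode_in_child("0"))  # 0
```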

(from "Rasa X Decoding error with German umlauts", Issue #4151, RasaHQ/rasa on GitHub)

I have Python 3.7.3, and even after setting PYTHONUTF8=1 on Windows, I get the same error.

Thanks.

Can you try discarding the domain file that was modified by Rasa X and starting over from your original one? The modified file may still be causing the error. You should be able to run Rasa successfully before trying the workaround for Rasa X.
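If no clean copy of the domain file is available to revert to, one alternative is to re-encode the damaged file. A minimal sketch, assuming the file was re-saved as cp1252 (consistent with the 0x92 byte in the error); the function name is hypothetical, and it keeps a `.bak` copy before rewriting:

```python
import shutil
from pathlib import Path


def reencode_to_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Re-encode `path` to UTF-8 if it is not already valid UTF-8.

    Returns True if the file was rewritten; a backup is kept as <name>.bak.
    """
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        pass
    # Assumption: the file was written in cp1252; this raises if it contains
    # bytes that cp1252 leaves undefined (e.g. 0x81), in which case stop and check
    text = raw.decode(source_encoding)
    shutil.copyfile(path, path.with_name(path.name + ".bak"))
    # Writing bytes directly preserves the file's existing line endings
    path.write_bytes(text.encode("utf-8"))
    return True
```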

You were right: I reverted my domain.yml file, restarted Rasa, and it works now. If you have any other ideas about how to get Rasa X working afterwards, please let me know.

Thank you so much for your help. Much appreciated!

I retrained the model, and because PYTHONUTF8=1 was set, the domain.yml file was regenerated and starting Rasa X now works. Fantastic!

Thanks again

Awesome. Glad the workaround fixed it! This issue should be fixed for good (no environment variable necessary) once the next Rasa X release comes out.

Hi,

Is there any hint on what to do when facing this problem with German umlauts while using Docker? I'm pointing to rasa/rasa:latest-full, but this image still seems to use Python 3.6, so I don't know how to set up the workaround within the container.

Thanks, Sphin

@Sphin First of all, I'd definitely not recommend using latest, as I believe it is built from master and is therefore unstable. Right now, you probably want to use rasa/rasa:1.6.0-full.

Can you post the full traceback when you get the error? Also, are you sure your files are saved in UTF-8?
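One way to answer that for a whole project is to try decoding every domain and training file as UTF-8. A minimal sketch; the file patterns are an assumption based on a typical Rasa 1.x layout (`domain.yml`, `config.yml`, `data/*.md`), so adjust them to your project, and the function name is hypothetical:

```python
from pathlib import Path
from typing import List, Tuple


def non_utf8_files(
    root: Path, patterns: Tuple[str, ...] = ("*.yml", "*.yaml", "*.md")
) -> List[Path]:
    """List files under `root` matching `patterns` that are not valid UTF-8."""
    bad = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            try:
                path.read_bytes().decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad
```

Inside a container, running this against the mounted project directory checks the files as they are actually stored on disk, independent of the editor's "default encoding" setting.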

Thanks for your fast reply and the hint about 1.6.0. Although I had set UTF-8 as the default encoding, my files had to be saved explicitly in UTF-8; after that, the error vanished. Thanks again.