UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1498: invalid start byte

I installed Rasa (1.2.7) in my conda environment on Windows, and everything had been working well. I then decided to install Rasa X using the following command:

pip install rasa-x --extra-index-url https://pypi.rasa.com/simple

and then ran:

rasa x

and got the following error:

(botenv) D:\git\agent>rasa x
Traceback (most recent call last):
  File "c:\programdata\anaconda3\envs\botenv\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\programdata\anaconda3\envs\botenv\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Anaconda3\envs\botenv\Scripts\rasa.exe\__main__.py", line 9, in <module>
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 291, in rasa_x
    run_locally(args)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 315, in run_locally
    _validate_rasa_x_start(args, project_path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 262, in _validate_rasa_x_start
    _validate_domain(os.path.join(project_path, DEFAULT_DOMAIN_PATH))
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\cli\x.py", line 275, in _validate_domain
    Domain.load(domain_path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\core\domain.py", line 62, in load
    other = cls.from_path(path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\core\domain.py", line 72, in from_path
    domain = cls.from_file(path)
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\core\domain.py", line 85, in from_file
    return cls.from_yaml(rasa.utils.io.read_file(path))
  File "c:\programdata\anaconda3\envs\botenv\lib\site-packages\rasa\utils\io.py", line 131, in read_file
    return f.read()
  File "c:\programdata\anaconda3\envs\botenv\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1498: invalid start byte

Has anyone experienced a similar issue?

Thanks.

Hi @FrancisThibault, this is a known bug that we’re already working on; it should be fixed in the next release. It’s caused by the domain file being written in the system’s default encoding instead of UTF-8. Can you share your domain? There might be a workaround in the meantime, though I’m not 100 percent sure.
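For anyone hitting the same traceback: byte 0x92 is the right single quotation mark (’) in Windows-1252, and it is not a valid UTF-8 start byte. The sketch below reproduces the error under the assumption that the file was written as cp1252. The thread does not say which default encoding was actually in use, and the file name and sample text here are hypothetical.

```python
import os
import tempfile

# Hypothetical domain snippet containing a curly apostrophe (U+2019).
text = "utter_greet:\n- text: It\u2019s nice to meet you\n"

# Assumption: the file was saved with Windows-1252 (cp1252) instead of UTF-8.
path = os.path.join(tempfile.mkdtemp(), "domain.yml")
with open(path, "w", encoding="cp1252") as f:
    f.write(text)

# In cp1252, U+2019 is stored as the single byte 0x92.
with open(path, "rb") as f:
    raw = f.read()
print("contains 0x92:", b"\x92" in raw)

# Reading the same file as UTF-8 fails on that byte.
error = None
try:
    with open(path, encoding="utf-8") as f:
        f.read()
except UnicodeDecodeError as exc:
    error = exc
print(error)
```

If this matches your case, the fix is to make sure the file is both written and read as UTF-8.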

Hi Ella, thank you so much for your quick reply.

Do I have to do a fresh Rasa installation, since even Rasa itself is not working anymore? When you ask whether I can share my domain, which domain are you referring to?

Thank you so much for your help.

Hm, the original one – what usually happens in this bug is that upon start of rasa x, it will read your domain, re-format, and re-save it (unfortunately with the wrong encoding).

Can you run Python 3.7? If so, I think this workaround will work:

Workaround: (Python 3.7+ only) set the environment variable PYTHONUTF8 to 1 before running rasa, this forces python to use utf8 as default encoding. On Windows: set PYTHONUTF8=1

from (Rasa X Decoding error with German umlauts · Issue #4151 · RasaHQ/rasa · GitHub)
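To confirm that the variable actually reached the interpreter, a small check like the following may help. It is only a sketch: it reports whether Python is running in UTF-8 mode and which encoding `open()` will use by default. Run it with the same Python, in the same console session, where PYTHONUTF8 was set.

```python
import locale
import sys

# sys.flags.utf8_mode is 1 when Python runs in UTF-8 mode (Python 3.7+),
# for example via PYTHONUTF8=1 or the -X utf8 command-line option.
utf8_mode = bool(sys.flags.utf8_mode)

# The encoding that open() uses when no encoding= argument is given.
default_encoding = locale.getpreferredencoding(False)

print("UTF-8 mode:", utf8_mode)
print("Default text encoding:", default_encoding)
```

Note that `set PYTHONUTF8=1` in cmd.exe only affects the current console session. In PowerShell, `set` does not set environment variables, so `$env:PYTHONUTF8 = "1"` would be needed there instead.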

I have Python 3.7.3, and even after setting PYTHONUTF8=1 on Windows, I get the same error.

Thanks.

Can you try removing the domain file that was modified by rasa x and starting over with it? It could be that it’s still affecting it. You should be able to run rasa successfully before trying the workaround for Rasa X.

You are right. I reverted my domain.yml file and restarted rasa, and it works now. If you have any other ideas about how to get Rasa X working afterwards, please let me know.

Thank you so much for your help. Much appreciated!

I retrained the model, and because PYTHONUTF8=1 was set, the domain.yml file was regenerated, and starting rasa x now works. Fantastic!

Thanks again

Awesome. Glad the workaround fixed it! This issue should be fixed overall (no variable necessary) whenever the next Rasa X release comes out.

Hi,

Is there any hint on what to do when facing this problem with German umlauts while using Docker? I’m pointing to rasa/rasa:latest-full, but this image still seems to use Python 3.6, so I don’t know how to set up the workaround within the container.

Thanks, Sphin

@Sphin first of all, I’d definitely not recommend using latest, as I believe it comes from master and is therefore unstable. Right now, you probably want to use rasa/rasa:1.6.0-full.

Can you post the full traceback when you get the error? Also, are you sure your files are saved in UTF-8?
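One way to answer that question without relying on editor settings is to try decoding the raw bytes. This is a minimal sketch; the helper names are hypothetical and not part of Rasa.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def check_file(path: str) -> Optional[int]:
    """Read a file as raw bytes and report the first invalid UTF-8 offset."""
    with open(path, "rb") as f:
        return first_invalid_utf8_offset(f.read())


# Sample data: the same German word encoded two different ways.
print(first_invalid_utf8_offset("Grüße".encode("utf-8")))
print(first_invalid_utf8_offset("Grüße".encode("cp1252")))
```

Calling `check_file("domain.yml")` returns `None` for a file that is valid UTF-8. Otherwise it returns a byte offset comparable to the "position" in the error message.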

Thanks for your fast reply and the hint about 1.6.0. Although I had set UTF-8 as the default encoding, my files had to be saved explicitly in UTF-8, and then the error vanished. Thanks again.
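If re-saving each file by hand in an editor is impractical, the same conversion can be scripted. The sketch below assumes the files were originally written as Windows-1252, which the thread does not confirm, so adjust `source_encoding` if yours differ. The function name is hypothetical.

```python
def reencode_to_utf8(path: str, source_encoding: str = "cp1252") -> bool:
    """Rewrite `path` as UTF-8. Return True if the file was changed."""
    with open(path, "rb") as f:
        raw = f.read()

    # Leave files that already decode as UTF-8 untouched, so that valid
    # UTF-8 is never mangled by decoding it with the wrong codec.
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass

    # Assumption: the bytes are in `source_encoding`. This raises
    # UnicodeDecodeError (and leaves the file as-is) if they are not.
    text = raw.decode(source_encoding)

    # Binary mode avoids any newline translation on Windows.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True
```

Back up the files first. A wrong guess for `source_encoding` can silently produce incorrect characters when the bytes happen to be valid in that encoding.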

The problem will be solved if you upgrade Rasa. Just run: pip3 install --upgrade rasa==2.4.3