UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f916' in position 22: ordinal not in range(128)

Hello.

After upgrading Rasa Core, NLU, and SDK on my Linux server, I get the following error when running “rasa init”:

tograssm@vps:/var/www# rasa init
Traceback (most recent call last):
  File "/usr/local/bin/rasa", line 11, in <module>
    load_entry_point('rasa', 'console_scripts', 'rasa')()
  File "/usr/src/rasa/rasa_nlu/rasa/__main__.py", line 68, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/usr/src/rasa/rasa_nlu/rasa/cli/scaffold.py", line 152, in run
    print_success("Welcome to Rasa! \U0001f916\n")
  File "/usr/src/rasa/rasa_nlu/rasa/cli/utils.py", line 144, in print_success
    print_color(*args, color=bcolors.OKGREEN)
  File "/usr/src/rasa/rasa_nlu/rasa/cli/utils.py", line 163, in print_color
    print (wrap_with_color(*args, color=color))
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f916' in position 22: ordinal not in range(128)

How can I fix this?

Can you try exporting this environment variable in your shell before rerunning the command: export PYTHONIOENCODING='utf8'

If this doesn’t work, you’ll have to set it on each command:

PYTHONIOENCODING='utf8' rasa init
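
For anyone curious why the variable helps, here is a minimal sketch (not Rasa code) that starts a child Python process with PYTHONIOENCODING set and has it print the same robot emoji. The helper name run_child is made up for illustration. It should behave the same on Linux and Windows, since it only relies on the standard library.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child Python that prints the robot emoji.

    run_child is a hypothetical helper for this illustration only;
    io_encoding becomes the child's PYTHONIOENCODING value.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        # The escape is kept as text so the child interprets it itself.
        [sys.executable, "-c", "print('\\U0001f916')"],
        env=env,
        capture_output=True,
    )


ok = run_child("utf8")    # stdout can encode the emoji
bad = run_child("ascii")  # stdout cannot, so print() raises

print("utf8 exit code:", ok.returncode)
print("ascii exit code:", bad.returncode)
print("ascii failed with UnicodeEncodeError:", b"UnicodeEncodeError" in bad.stderr)
```

With an ASCII stdout the child fails the same way as the traceback above, while the utf8 run succeeds, which is all the environment variable changes.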

I am getting the same error, what should I do for Windows?

On Ubuntu, this worked for me:

(venv) root@om:/workspace/nnn(master)# export PYTHONIOENCODING='utf8'
(venv) root@om:/workspace/nnn(master)# rasa init --no-prompt

Welcome to Rasa! :robot:

To get started quickly, an initial project will be created. If you need some help, check out the documentation at Introduction to Rasa Open Source.

Created project directory at '/workspace/'.
Finished creating project structure.
Training an initial model…

On Windows, many editors assume the default ANSI encoding (CP1252 on US Windows) instead of UTF-8 if there is no byte order mark (BOM) at the start of the file. Files store bytes, so all Unicode text has to be encoded into bytes before it can be stored in a file. In pandas, read_csv and to_csv both take an encoding option to deal with files in different encodings, so you have to specify one, such as utf-8:

df.to_csv(r'D:\panda.csv', sep='\t', encoding='utf-8')

If you don’t specify an encoding, df.to_csv defaults to ascii in Python 2 and utf-8 in Python 3.
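
To make the "files store bytes" point concrete, here is a small standard-library sketch (no pandas needed). The sample word is just an illustration. It shows the bytes a UTF-8 file would hold, what a CP1252 reader makes of them, and what the BOM variant looks like.

```python
# Sample text with one non-ASCII character (U+00E9, e with acute accent).
text = "caf\u00e9"

# Bytes a UTF-8 file would contain, without and with a byte order mark.
utf8_bytes = text.encode("utf-8")
bom_bytes = text.encode("utf-8-sig")

# A reader that assumes CP1252 decodes the two UTF-8 bytes of the
# accented letter as two separate characters (mojibake).
misread = utf8_bytes.decode("cp1252")

# Decoding with the matching codec recovers the text; utf-8-sig also
# strips the BOM when reading.
reread = bom_bytes.decode("utf-8-sig")

print(utf8_bytes)
print(bom_bytes[:3])
print(misread == text, reread == text)
```

This is why the same encoding has to be named on both the writing and the reading side.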

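Also, you can encode a problematic series with unicode-escape first and then decode the result, which replaces every non-ASCII character with an ASCII backslash escape sequence.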

df['column-name'] = df['column-name'].map(lambda x: x.encode('unicode-escape').decode('utf-8'))

This also avoids the error, although the stored text then contains the escape sequences rather than the original characters.
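
Here is a rough sketch of what that unicode-escape step does to a single string, using only built-in str and bytes methods. The sample value is made up. It also shows one way to get the original text back later, assuming the original contained no backslashes of its own.

```python
# Sample value with a Latin-1 character and the robot emoji.
original = "caf\u00e9 \U0001f916"

# Same transformation as in the lambda above: non-ASCII characters
# become backslash escapes, so the result is plain ASCII text.
escaped = original.encode("unicode-escape").decode("utf-8")

# Reversing it: turn the ASCII text back into bytes and let the
# unicode-escape codec interpret the escape sequences.
restored = escaped.encode("ascii").decode("unicode-escape")

print(escaped)
print(escaped.isascii())
print(restored == original)
```

So the workaround trades readable characters for escape sequences, which is worth keeping in mind if the file is meant for people rather than for another program.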