I am having trouble using the data converter available in the Rasa CLI because of some characters and symbols frequently used in Brazilian Portuguese. I already have a corpus in JSON that works well with previous Rasa versions. As I am starting to adapt all my routines to be compatible with newer versions (>2.0.0), I tried the following CLI command:
rasa data convert nlu -f md --data=old_data --out=data
(I convert to ‘md’ first because it seems I need an ‘md’ file before I can convert to YAML.)
The conversion runs without errors and the new file looks well parsed, except for the special characters, probably because the encoding changes somewhere in the middle of the process. Accented characters in the output file are replaced with escape sequences such as ‘\u00e1’.
I tried the variations below with no success:
rasa data convert nlu -f yaml --data=old_data --out=data -l pt
rasa data convert nlu -f yaml --data=old_data --out=data -l pt-br
Does anyone know a fix for this? Has anyone run into the same problem? I would like to know if there is an easy way to deal with this before developing my own parser.
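For reference, the workaround I am considering is to post-process the converted file and turn the escape sequences back into the real characters. Below is a minimal sketch of that idea in Python. The function names and the file path in the comment are hypothetical and not part of Rasa, and it assumes the file only contains simple four-digit \uXXXX escapes (no surrogate pairs and no backslashes that are themselves escaped).

```python
import re
from pathlib import Path

# Matches a literal backslash-u followed by exactly four hex digits, e.g. \u00e1
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text: str) -> str:
    """Replace literal \\uXXXX sequences with the characters they encode."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def fix_file(path: str) -> None:
    """Rewrite a converted file in place, saved as UTF-8.

    Hypothetical helper: call it with whatever file the converter
    produced, e.g. fix_file("data/nlu.md").
    """
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    p.write_text(unescape_unicode(original), encoding="utf-8")


if __name__ == "__main__":
    # Prints: - olá, você pode me ajudar?
    print(unescape_unicode(r"- ol\u00e1, voc\u00ea pode me ajudar?"))
```

This only hides the symptom after the conversion, so I would still prefer a way to make the converter write the characters correctly in the first place.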