I am having trouble using the data converter available in the Rasa CLI because of some characters and symbols that are common in Brazilian Portuguese. I already have a corpus in JSON that works well with previous Rasa versions. As I am starting to adapt all my routines to be compatible with newer versions (>2.0.0), I tried the following CLI command:
rasa data convert nlu -f md --data=old_data --out=data
(I convert to ‘md’ first because it seems I need an ‘md’ file in order to convert to YAML.)
Things go OK: a new file is generated and seems well parsed, except for the special characters, probably due to an encoding change in the middle of the process. The output file ends up filled with escape sequences such as ‘\u00e1’ (which stands for ‘á’) in place of some special characters.
I tried the variations below with no success:
rasa data convert nlu -f yaml --data=old_data --out=data -l pt
rasa data convert nlu -f yaml --data=old_data --out=data -l pt-br
Does anyone know a fix for this? Has anyone else run into it? I would like to know whether there is an easy way to deal with this before developing my own parser.
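In the meantime, one possible stopgap is to post-process the converted file and turn the literal ‘\uXXXX’ sequences back into the characters they stand for. The following is only a minimal sketch and not part of Rasa: the function names are hypothetical, it assumes the converted file is UTF-8 text, and it assumes every ‘\uXXXX’ in the file is an unwanted escape (a deliberately written backslash sequence in the training data would be rewritten too).

```python
import re
import sys
from pathlib import Path

# Matches a literal backslash, the letter u, and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def _replace(match: re.Match) -> str:
    code_point = int(match.group(1), 16)
    # Leave surrogate halves untouched: on their own they are not valid
    # characters and could not be written back out as UTF-8.
    if 0xD800 <= code_point <= 0xDFFF:
        return match.group(0)
    return chr(code_point)


def decode_unicode_escapes(text: str) -> str:
    """Replace literal backslash-u escapes with the characters they encode."""
    return _ESCAPE.sub(_replace, text)


def fix_file(path: str) -> None:
    """Rewrite one file in place (hypothetical helper; assumes UTF-8)."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    target.write_text(decode_unicode_escapes(original), encoding="utf-8")


if __name__ == "__main__":
    # Each command-line argument is treated as a file to fix.
    for file_name in sys.argv[1:]:
        fix_file(file_name)
```

Saved under a name of your choice (say, fix_escapes.py), it would be run against the converted file, for example: python fix_escapes.py data/nlu.md. Both file names here are placeholders, and it is worth keeping a backup of the converted file since the rewrite happens in place.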
Hey Marcos! Thank you for the answer. I also followed the official guides to migrate my project, but I wonder if this is specific to the JSON part of the parser. Mind if I ask what format your files were in before the migration?
Maybe ‘md’ to ‘yaml’ doesn’t mess with the encoding (as a matter of fact, my stories were nicely parsed by the Rasa CLI with no encoding issues), and only the ‘json’ to ‘md’ functionality has this issue?
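To illustrate why the JSON step looks suspicious to me: the ‘\u00e1’ form is exactly what a JSON serializer emits when it is asked to produce ASCII-only output. The sketch below uses only Python's standard json module and an invented sample sentence; it says nothing about what the Rasa converter actually does internally, it just shows one way such strings can appear.

```python
import json

# An invented example of how a sentence may be stored in a JSON corpus.
raw = '{"text": "ol\\u00e1, voc\\u00ea pode me ajudar?"}'

# Loading decodes the escapes into real characters.
data = json.loads(raw)
print(data["text"])

# Dumping with the default settings (ensure_ascii=True) escapes them again.
escaped = json.dumps(data)
print(escaped)

# Dumping with ensure_ascii=False keeps the accented characters as they are.
readable = json.dumps(data, ensure_ascii=False)
print(readable)
```

If the converter wrote text through an ASCII-escaping step like the second call, the output would look like what I am seeing, but that is a guess on my part.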
Yeah, I do use Linux, and I didn’t even remember to ask which OS you might be using! Whether it is related, I have no idea, but I’d suggest googling this; I’m pretty sure you’ll find something, @bayesianwannabe. Sorry I can’t help you with Windows 10…
No problem at all! I am sorry for so many questions. I wonder whether you are able to use Rasa X normally on Linux with Brazilian Portuguese sentences that contain special characters, as I am also having an encoding problem there, as I posted here: