I’m adding Japanese support to my bot, which means I have to clone my Rasa bot to make a new one in Japanese. I’m using the default pipeline `supervised_embeddings`, which uses the `WhitespaceTokenizer`. This tokenizer cannot be used with languages that don’t separate words with spaces, such as Japanese, Chinese, etc. So I wonder whether Rasa provides a tokenizer that can handle this kind of language.
I read in the docs that the `JiebaTokenizer` supports Chinese. Can it be used for Japanese too, or do I have to write a custom tokenizer? If the latter, does anyone know a reliable Japanese tokenizer that can be easily integrated with Rasa? Thank you.
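For context on what a custom tokenizer would involve: a Rasa tokenizer component has to turn each message into tokens with character offsets. Here is a minimal, hedged sketch of the offset-computation part, assuming some Japanese morphological analyzer (e.g. MeCab or Janome) has already produced a list of surface forms — the segmentation shown is a hypothetical example, not output from a real analyzer:

```python
def tokens_with_offsets(text, surfaces):
    """Given a text and the surface forms produced by a Japanese
    morphological analyzer, compute (surface, start, end) tuples —
    the character offsets a Rasa custom tokenizer component needs
    when it builds its Token objects."""
    tokens = []
    pos = 0
    for surface in surfaces:
        # Find each surface form at or after the current position,
        # so repeated tokens map to the correct occurrence.
        start = text.index(surface, pos)
        end = start + len(surface)
        tokens.append((surface, start, end))
        pos = end
    return tokens


# Hypothetical segmentation of 今日は良い天気です ("the weather is nice today")
print(tokens_with_offsets("今日は良い天気です",
                          ["今日", "は", "良い", "天気", "です"]))
```

The actual component would wrap this in the `Tokenizer` interface of whichever Rasa version you run, so check the custom-component docs for the exact class and method signatures.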