How to use Japanese text with Rasa (MeCab tokenization)

I’m working on bots that support both English and Japanese, and along the way I’ve run into problems with Japanese text. I’m currently using the “tensorflow” pipeline. With everything left at the defaults, the bots don’t work well. From reading other topics, I learned that Japanese needs special handling: the default tokenization doesn’t work, and I need something like “MeCab”. I’ve looked around but can’t find a way to actually put them together.

If anyone could guide me a bit on this, it would be really appreciated.

You would need to create a custom tokenizer using MeCab. It might look similar to the jieba one for Chinese; take a look here: rasa_nlu/jieba_tokenizer.py at master · RasaHQ/rasa_nlu · GitHub
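A minimal sketch of what such a component could look like, following the general shape of the jieba tokenizer: a tokenize() method that returns tokens with character offsets, plus train() and process() hooks that store them under "tokens". To keep it runnable without Rasa or MeCab installed, Token and Message below are simplified stand-ins (not the rasa_nlu classes), train() takes a plain list of examples rather than a TrainingData object, the component name is hypothetical, and the MeCab call is passed in as a function. With mecab-python3 installed, that function could be something like `MeCab.Tagger("-Owakati").parse(text).split()`. In a real component you would subclass rasa_nlu's Tokenizer and Component classes as jieba_tokenizer.py does, and check the hook signatures against your rasa_nlu version.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Token(NamedTuple):
    """Simplified stand-in for a token: surface text plus character offset."""
    text: str
    offset: int


class Message:
    """Simplified stand-in for rasa_nlu's Message: raw text plus a property dict."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.data: Dict[str, Any] = {}

    def set(self, prop: str, info: Any) -> None:
        self.data[prop] = info

    def get(self, prop: str, default: Any = None) -> Any:
        return self.data.get(prop, default)


class MecabTokenizer:
    name = "tokenizer_mecab"  # hypothetical name to reference in the pipeline config
    provides = ["tokens"]
    language_list = ["ja"]

    def __init__(self, split_surfaces: Callable[[str], List[str]]) -> None:
        # split_surfaces wraps MeCab (assumption), e.g.
        # lambda t: MeCab.Tagger("-Owakati").parse(t).split()
        self.split_surfaces = split_surfaces

    def tokenize(self, text: str) -> List[Token]:
        tokens: List[Token] = []
        cursor = 0
        for surface in self.split_surfaces(text):
            # MeCab's wakati output has no positions, so locate each surface
            # in the original text, searching forward from the previous token.
            start = text.find(surface, cursor)
            if start == -1:
                continue  # surface not found verbatim (e.g. normalized); skip it
            tokens.append(Token(surface, start))
            cursor = start + len(surface)
        return tokens

    def train(self, training_examples: List[Message]) -> None:
        # Tokenize every training example so later components can use the tokens.
        for example in training_examples:
            example.set("tokens", self.tokenize(example.text))

    def process(self, message: Message) -> None:
        # Tokenize the incoming message at prediction time, the same way as in training.
        message.set("tokens", self.tokenize(message.text))
```

Searching forward from the end of the previous token keeps offsets correct when the same word appears more than once in a sentence. Silently skipping unmatched surfaces is a deliberately simple choice; a stricter version could raise instead so that misalignments are noticed during training.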

Thank you @akelad for your response. I followed jieba_tokenizer and removed the parts that I don’t think have a MeCab equivalent, for example the dictionary handling.

I ended up with this custom component (MeCab), along with this configuration file, config.yaml.

I ran training as normal and it completed successfully, but when I tried to make predictions the results seemed wrong: every prediction is returned as “None”.

I think there is some problem with the text transformation, but I haven’t been able to fix it yet.

I’ve linked the complete code here, so maybe you could try it out. I think tokenization is the problem, but I’ve already tried my best.
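One way to test the suspicion that tokenization is the culprit is to print the tokens for a few training sentences and check that each token’s offset actually points at its text and that a sentence is split into more than one token. A small sketch of such a check, assuming tokens are (text, offset) pairs like the ones a jieba-style tokenizer produces; check_tokens is a hypothetical helper, not part of Rasa.

```python
from typing import List, Tuple


def check_tokens(text: str, tokens: List[Tuple[str, int]]) -> List[str]:
    """Return a list of human-readable problems found in a tokenization of text."""
    problems: List[str] = []
    if len(tokens) <= 1 and len(text) > 1:
        # A sentence left as a single token may mean a whitespace-based tokenizer
        # ran instead, since Japanese does not separate words with spaces.
        problems.append("text was not split into multiple tokens")
    previous_end = 0
    for word, offset in tokens:
        if text[offset:offset + len(word)] != word:
            problems.append(f"token {word!r} does not match text at offset {offset}")
        elif offset < previous_end:
            problems.append(f"token {word!r} at offset {offset} overlaps the previous token")
        previous_end = max(previous_end, offset + len(word))
    return problems
```

An empty list for your training sentences suggests the tokens themselves are fine and the problem may lie elsewhere in the pipeline.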

I hope we can fix this here; I really want Rasa to work with Japanese text.

Thank you

I have added a custom component that uses the MeCab tokenizer. It works fine for me with Japanese text.

Link: https://github.com/mahbubcseju/Rasa_Japanese