How to use Japanese Text with Rasa (Mecab-Tokenization)

khut · October 25, 2018, 8:18am

I’m working on bots that support both English and Japanese, and I’ve run into problems with Japanese text. I’m currently using “tensorflow” as the pipeline. With everything left at the defaults, the bots don’t work well. From reading other topics, I learned that Japanese needs special handling: the default tokenization doesn’t work, and I need something like “MeCab”. I’ve looked around but can’t find how to actually put them together.

I’d really appreciate it if anyone could guide me on this.

akelad · November 2, 2018, 12:59pm

You would need to create a custom tokenizer using that. It might be similar to the jieba one for Chinese; take a look here: rasa_nlu/jieba_tokenizer.py at master · RasaHQ/rasa_nlu · GitHub
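
As a rough illustration of the idea (not the code from the linked file), here is a minimal sketch of the tokenization step on its own. It assumes the mecab-python3 package and a MeCab dictionary are installed; `segment_with_mecab` and `align_tokens` are hypothetical helper names. jieba reports start offsets directly, while MeCab's wakati output is just space-separated surface forms, so the offsets are recovered by searching the original text. Inside a custom Rasa tokenizer component, each (word, offset) pair would then be wrapped in Rasa's token type, the way the jieba tokenizer does.

```python
from typing import List, Tuple


def segment_with_mecab(text: str) -> List[str]:
    """Split text into surface forms using MeCab's wakati (space-separated) output."""
    # Imported here so the rest of the module works without MeCab installed.
    # Assumption: mecab-python3 and a dictionary are available.
    import MeCab

    tagger = MeCab.Tagger("-Owakati")
    return tagger.parse(text).split()


def align_tokens(text: str, surfaces: List[str]) -> List[Tuple[str, int]]:
    """Pair each surface form with its character offset in the original text."""
    tokens = []
    cursor = 0
    for surface in surfaces:
        # Search forward from the end of the previous token.
        start = text.find(surface, cursor)
        if start < 0:
            # Surface form not found verbatim (e.g. normalized by the tagger); skip it.
            continue
        tokens.append((surface, start))
        cursor = start + len(surface)
    return tokens


if __name__ == "__main__":
    sentence = "私は学生です"
    print(align_tokens(sentence, segment_with_mecab(sentence)))
```

Keeping the offset alignment separate from the MeCab call makes it possible to check the offsets against a hand-segmented sentence before wiring anything into the pipeline.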

khut · November 3, 2018, 9:40am

Thank you @akelad for your response. I followed the jieba_tokenizer and just removed the parts that I don’t think exist in MeCab, for example the dictionary.

I finally ended up with this custom component (Mecab), along with this configuration, config.yaml.

I ran training on the data as normal and it trained successfully, but when I tried prediction, the results seem wrong: every prediction is returned as “None”.

I think there is some problem with the text transformation, but I can’t figure it out yet.

I’ve linked the complete code here; maybe you could try it out. I think the tokenization is the problem, but I’ve already tried my best.

I hope to get this fixed here; I really want Rasa to work with Japanese text.

Thank you

mahbubcseju · July 4, 2019, 7:45am

I have added a custom component using the MeCab tokenizer. It works fine for me with Japanese text.

Link: https://github.com/mahbubcseju/Rasa_Japanese
