Korean NLU

Hi everyone,

We were wondering if anyone has any experience using Rasa NLU in Korean? Specifically, dealing with tokenization as this is a little bit more complicated than just whitespace tokenization.

Would be great if you could share your experiences :smile:

Thanks, Akela



My colleagues and I are currently dealing with that! Please let me know if you need help.



@asnal05, is the Korean NLU working well for you, by any chance?

Hi @asnal05,

This is Nari Kim, and we plan to implement an application for a class project in Korean. Have you implemented a custom pipeline for Korean (a morphological analyzer, etc.)? Any information would be appreciated.

Thank you!

λ„€ 잘 μž‘λ™ν•©λ‹ˆλ‹€ :slight_smile:


Yes, we have implemented our own pipeline.

We used Mecab for tokenization, but it was not robust against out-of-vocabulary (OOV) words, so we built another tokenizer that works better!

We have tried several pipelines for NLU.
First, we trained the basic CRF and StarSpace components provided by Rasa on the same dataset (i.e. nlu.md). However, we found that it may be a better idea to use different datasets for entity extraction (EE) and intent classification (IC); that was our second approach. We then found that training EE and IC jointly improved accuracy further, and that gave us the best result of the three pipelines.
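For reference, the first (baseline) setup described above roughly corresponds to an old-style Rasa NLU pipeline config like the following. This is my guess at the shape of their config, not their actual file; the tokenizer entry in particular would be replaced by their custom component.

```yaml
language: "ko"
pipeline:
  - name: "tokenizer_whitespace"                     # placeholder; swap in a custom Korean tokenizer
  - name: "ner_crf"                                  # CRF entity extraction
  - name: "intent_featurizer_count_vectors"
  - name: "intent_classifier_tensorflow_embedding"   # StarSpace-based intent classifier
```

The second and third approaches would keep the same components but change how the EE and IC training data are split or shared.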


@asnal05 Thank you very much for the info! Did you share your code or any material online? It would be very helpful for us.

Thank you!

We would also be interested in seeing this if possible :slight_smile:

Dear @akelad and @Nari

Sorry for the late reply! Unfortunately, I am not allowed to release the code due to our company's policy.


@asnal05 Oh, it was a company project! Understood. Thank you for the response.