We were wondering if anyone has experience using Rasa NLU with Korean? Specifically, we are interested in how you deal with tokenization, since Korean needs more than simple whitespace tokenization.
It would be great if you could share your experiences.
My colleagues and I are currently working on exactly that!
Please let me know if you need help.
@asnal05, is Korean NLU working well for you?
This is Nari Kim. We plan to implement an application in Korean for a class project.
Have you implemented a custom pipeline for Korean (morphological analyzer, etc.)?
Any information will be appreciated.
Yes, we have implemented our own pipeline.
We used MeCab for tokenization, but it was not robust to out-of-vocabulary (OOV) words, so we built another tokenizer that works better!
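For readers unfamiliar with the OOV issue, here is a minimal sketch of the general idea. It is not the tokenizer described in this post (that code was not released), and it does not call MeCab. It assumes a tiny hypothetical vocabulary (`VOCAB`) and a hypothetical `tokenize` function that does greedy longest-match splitting and keeps any unknown run of characters together as one token instead of failing on it.

```python
from typing import List, Set

# Hypothetical toy vocabulary for illustration only; a real morphological
# analyzer relies on a large dictionary plus a statistical model.
VOCAB: Set[str] = {"서울", "날씨", "는", "를", "어때", "알려줘"}


def tokenize(text: str, vocab: Set[str] = VOCAB) -> List[str]:
    """Greedy longest-match tokenization with a simple OOV fallback."""
    max_len = max(len(entry) for entry in vocab)
    tokens: List[str] = []
    for word in text.split():  # split into whitespace-delimited words first
        i = 0
        oov = ""  # buffer for consecutive characters not covered by vocab
        while i < len(word):
            match = None
            # Try the longest candidate first, then shrink.
            for j in range(min(len(word), i + max_len), i, -1):
                if word[i:j] in vocab:
                    match = word[i:j]
                    break
            if match is None:
                oov += word[i]
                i += 1
                continue
            if oov:  # flush the unknown run before the known token
                tokens.append(oov)
                oov = ""
            tokens.append(match)
            i += len(match)
        if oov:
            tokens.append(oov)
    return tokens


print("날씨는 어때".split())          # whitespace only: particle stays attached
print(tokenize("날씨는 어때"))        # particle "는" is split off
print(tokenize("판교 날씨를 알려줘"))  # "판교" is OOV and kept as one token
```

Whitespace splitting leaves the particle attached to the noun, which is why a plain whitespace tokenizer tends to work poorly for Korean. A production tokenizer would need a much better fallback than this toy one.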
We have tried various pipelines for NLU.
First, we trained the basic CRF and StarSpace models provided by Rasa on the same dataset (i.e. nlu.md). We then found that it may be a better idea to use different sets of data for entity extraction (EE) and intent classification (IC); that was our second approach. Finally, we found that training EE and IC jointly could improve accuracy, and this gave the “best” result among the three pipelines.
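To make the EE/IC distinction concrete, here is a rough sketch of how a single nlu.md file carries data for both tasks. It assumes the Rasa Markdown training format (`## intent:<name>` headers and `[text](entity_type)` annotations); the `parse_nlu_md` helper and the sample data are hypothetical and are not part of Rasa or of the pipelines described in this post.

```python
import re
from typing import List, Tuple

# Matches Rasa Markdown entity annotations such as [서울](location).
ENTITY = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

IntentExample = Tuple[str, str]                          # (text, intent)
EntityExample = Tuple[str, List[Tuple[int, int, str]]]   # (text, spans)


def parse_nlu_md(md: str) -> Tuple[List[IntentExample], List[EntityExample]]:
    """Split annotated examples into an IC view and an EE view."""
    intent_examples: List[IntentExample] = []
    entity_examples: List[EntityExample] = []
    intent = None
    for line in md.splitlines():
        line = line.strip()
        if line.startswith("## intent:"):
            intent = line[len("## intent:"):].strip()
        elif line.startswith("- ") and intent is not None:
            raw = line[2:]
            text, spans, pos = "", [], 0
            for m in ENTITY.finditer(raw):
                text += raw[pos:m.start()]   # plain text before the entity
                start = len(text)
                text += m.group(1)           # entity surface form
                spans.append((start, len(text), m.group(2)))
                pos = m.end()
            text += raw[pos:]                # remaining plain text
            intent_examples.append((text, intent))
            entity_examples.append((text, spans))
    return intent_examples, entity_examples


sample = """## intent:ask_weather
- [서울](location) 날씨 어때
- 날씨 알려줘
"""
ic_data, ee_data = parse_nlu_md(sample)
print(ic_data)
print(ee_data)
```

With the two views separated like this, each task can be trained on its own data, or the two lists can be fed to a single model for joint training.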
@asnal05 Thank you very much for the info!
Did you share your code or any material online? It would be very helpful for us.
We would also be interested in seeing this, if possible.
Dear @akelad and @Nari,
Sorry for the late reply!
Unfortunately, our company's policy does not allow me to release the code.
@asnal05 Oh, it was a company project! Understood. Thank you for the response.