Korean NLU

Hi everyone,

We were wondering if anyone has any experience using Rasa NLU in Korean? Specifically, dealing with tokenization as this is a little bit more complicated than just whitespace tokenization.

Would be great if you could share your experiences :smile:

Thanks, Akela



My colleagues and I are currently dealing with that! Please let me know if you need help.



@asnal05, is the Korean NLU working well for you, by any chance?

Hi @asnal05,

This is Nari Kim, and we plan to implement an application for a class project in Korean. Have you implemented a custom pipeline for Korean (a morphological analyzer, etc.)? Any information would be appreciated.

Thank you!

λ„€ 잘 μž‘λ™ν•©λ‹ˆλ‹€ :slight_smile:


Yes, we have implemented our own pipeline.

We used Mecab for tokenization, but it was not robust against out-of-vocabulary (OOV) words, so we built another tokenizer that works better!

We have tried several pipelines for NLU.
First, we trained the basic CRF and StarSpace components provided by Rasa on the same dataset (i.e. nlu.md). However, we found that it may be a better idea to use different datasets for entity extraction (EE) and intent classification (IC); that was our second approach. We then found that training EE and IC jointly improved accuracy further, and that gave us the best result of the three pipelines.
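For reference, the first (baseline) setup described above roughly corresponds to an old-style Rasa NLU pipeline config like the following. This is my guess at the shape of their config, not their actual file; the tokenizer entry in particular would be replaced by their custom component.

```yaml
language: "ko"
pipeline:
  - name: "tokenizer_whitespace"                     # placeholder; swap in a custom Korean tokenizer
  - name: "ner_crf"                                  # CRF entity extraction
  - name: "intent_featurizer_count_vectors"
  - name: "intent_classifier_tensorflow_embedding"   # StarSpace-based intent classifier
```

The second and third approaches would keep the same components but change how the EE and IC training data are split or shared.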


@asnal05 Thank you very much for the info! Did you share your code or any material online? It would be very helpful for us.

Thank you!

We would also be interested in seeing this if possible :slight_smile:

Dear @akelad and @Nari

Sorry for the late reply! Unfortunately, I am not allowed to release the code due to our company's policy.


@asnal05 Oh, it was a company project! Understood. Thank you for the response.