How many training examples do I need?

Hi everyone,

I am currently implementing a Rasa model in Korean. As part of this, I plan to use my own training data to create custom word embeddings, because spaCy does not provide pre-trained word embeddings for Korean. So, how many training examples do I need to create effective word embeddings?

Thanks, Young-Jun Lee

“It depends”.

What is your goal? To train your own language model that generalizes to the entire Korean language? Or a language model that is good enough on a small subset of it that it can be used for a chatbot?

A general language model usually needs “loads” of data. For a chatbot, though, it is probably more important to understand what types of questions to expect in the first place, and you can make do with less.

I can’t say much about the Korean language, because I’ve got little experience with it, but in my experience a bot that has 20 examples per intent is usually good enough for a demo.
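
If it helps, here is a minimal sketch of how you might check your data against that rule of thumb by counting examples per intent and flagging the thin ones. Everything in it is made up for illustration: the sample sentences, the intent names, the `intents_below_threshold` helper and the `MIN_PER_INTENT` constant. None of it is part of the Rasa or spaCy API, and 20 is only a starting point, not a hard limit.

```python
from collections import Counter

# Hypothetical (text, intent) pairs; real data would come from your own
# NLU training file.
examples = [
    ("안녕하세요", "greet"),
    ("안녕", "greet"),
    ("주문을 취소하고 싶어요", "cancel_order"),
]

# Rule of thumb for a demo bot, not a hard limit (assumed value).
MIN_PER_INTENT = 20


def intents_below_threshold(pairs, minimum=MIN_PER_INTENT):
    """Return {intent: count} for every intent with fewer than `minimum` examples."""
    counts = Counter(intent for _, intent in pairs)
    return {intent: n for intent, n in counts.items() if n < minimum}


if __name__ == "__main__":
    # Prints the intents that still need more examples.
    print(intents_below_threshold(examples))
```

Any intent that shows up in the result is one you would probably want to write more examples for before showing the bot to anyone.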

Also, the spaCy release notes suggest that Korean support is now available.