I am currently implementing a Rasa model in Korean. As part of this, I plan to use my own training data to create custom word embeddings, because spaCy does not provide pre-trained word embeddings for Korean. How many training examples do I need to create effective word embeddings?
What is your goal? To train a language model that generalizes across the entire Korean language? Or one that is good enough on a small subset of it to power a chatbot?
A general language model usually needs “loads” of data. For a chatbot, though, it is probably more important to understand what types of questions to expect in the first place, and you can make do with much less.
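For the chatbot case, note that Rasa can train embeddings from scratch on your own NLU data, so pre-trained Korean vectors are not strictly required. A minimal sketch, assuming a Rasa 1.x pipeline (in Rasa 2+ the classifier component is `DIETClassifier`, and the component names differ):

```yaml
# config.yml — a sketch, not a tested production setup
language: ko
pipeline:
  - name: WhitespaceTokenizer        # crude for Korean; a morphological tokenizer usually works better
  - name: CountVectorsFeaturizer     # builds features directly from your own training examples
  - name: EmbeddingIntentClassifier  # trains intent embeddings from scratch, no spaCy vectors needed
```

Because whitespace tokenization handles Korean morphology poorly, swapping in a tokenizer backed by a Korean morphological analyzer is usually worth trying.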
I can’t say much about Korean specifically, because I have little experience with it, but in my experience a bot with 20 examples per intent is usually good enough for a demo.
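To make concrete why general-purpose embeddings need so much data, here is a toy, count-based sketch in plain Python. This is not Word2Vec, just raw co-occurrence counts, which real embedding methods refine, and the corpus, function name, and window size are all made up for illustration. With only a handful of sentences, most entries stay zero, which is exactly the sparsity problem that large corpora solve:

```python
from collections import defaultdict

def cooccurrence_embeddings(sentences, window=2):
    """Build simple count-based word vectors from a tokenized corpus.

    Each word's vector holds its co-occurrence count with every
    vocabulary word within a symmetric context window.
    """
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = defaultdict(lambda: [0] * len(vocab))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if i != j:
                    vectors[w][index[s[j]]] += 1
    return vocab, dict(vectors)

# Tiny made-up Korean corpus, pre-tokenized for illustration.
corpus = [["안녕", "하세요"], ["안녕", "챗봇"]]
vocab, vecs = cooccurrence_embeddings(corpus)
```

In practice you would train real embeddings with a library such as gensim on your own corpus instead of counting by hand; the point of the toy is that the quality of the resulting vectors is bounded by how often words actually co-occur in your data.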