Is it necessary to have an equal number of examples for every intent?
It is a question of distribution. A machine learning intent classifier is, at its core, a probabilistic model.
If your dataset is imbalanced, the model's probabilities can be skewed toward the intent with many examples and away from the intent with few. Ideally your dataset should be balanced, but that doesn't mean exactly 20 examples per intent. The counts should simply stay reasonably close to a common average.
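One rough way to check whether your counts stay "close to a common average" is to compare each intent's example count against the mean. The sketch below does that; the `tolerance` parameter (allowed relative deviation from the mean) and the intent names are hypothetical choices for illustration, not a standard rule.

```python
from statistics import mean


def flag_imbalanced(counts: dict[str, int], tolerance: float = 0.5) -> dict[str, float]:
    """Return intents whose example count deviates from the mean count by more
    than `tolerance` (a fraction of the mean), mapped to that relative deviation.
    The 0.5 default is an arbitrary assumption, not a recommended threshold."""
    avg = mean(counts.values())
    flagged = {}
    for intent, n in counts.items():
        deviation = (n - avg) / avg  # relative distance from the average count
        if abs(deviation) > tolerance:
            flagged[intent] = round(deviation, 2)
    return flagged


# Hypothetical intent names and counts, for illustration only.
counts = {"greet": 20, "goodbye": 25, "check_balance": 30, "transfer_money": 80}
print(flag_imbalanced(counts))  # {'transfer_money': 1.06}
```

Here the mean is 38.75, so only `transfer_money` (about 106% above the mean) is flagged; the others fall within the assumed tolerance.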
If that is the case, suppose I have one question that can be asked in 10 ways and another that can be asked in 100 ways. What should my approach be?
There usually aren't many distinct ways of asking a given question, or of users replying to a particular prompt. The NLU problem isn't purely mathematical or technical: a conversation is also psychological and driven by a purpose, unless you want to build a machine that behaves like a human being.
You should build NLU with a task-driven approach. What is the problem you are trying to solve?
If you are seeing such a big difference in what users might ask, then you are not approaching the problem correctly. You shouldn't have a gap that large. I could imagine one intent having 20 variations and another 40, which wouldn't be a big problem, but 10 versus 100 seems oddly large.
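To put numbers on the two scenarios above: if a model simply followed the training frequencies, the prior it would assign each intent is its count divided by the total. A real classifier also uses the text features, so this is only a sketch of the bias the data distribution introduces, assuming the prior equals the empirical frequency. The intent names are hypothetical.

```python
def empirical_priors(counts: dict[str, int]) -> dict[str, float]:
    """Class prior implied by raw example counts: count / total."""
    total = sum(counts.values())
    return {intent: n / total for intent, n in counts.items()}


def imbalance_ratio(counts: dict[str, int]) -> float:
    """Largest intent count divided by the smallest."""
    return max(counts.values()) / min(counts.values())


# The two scenarios discussed above; intent names are hypothetical.
moderate = {"intent_a": 20, "intent_b": 40}
extreme = {"intent_a": 10, "intent_b": 100}

for label, scenario in [("20 vs 40", moderate), ("10 vs 100", extreme)]:
    priors = empirical_priors(scenario)
    print(label, f"ratio={imbalance_ratio(scenario):.1f}",
          {k: round(v, 2) for k, v in priors.items()})
# 20 vs 40 ratio=2.0 {'intent_a': 0.33, 'intent_b': 0.67}
# 10 vs 100 ratio=10.0 {'intent_a': 0.09, 'intent_b': 0.91}
```

A 2x ratio leaves the smaller intent with a third of the probability mass, while a 10x ratio pushes it below 10%, which is why the second case is much more likely to distort predictions.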
In my experience, the number of examples matters less than the semantic width (for lack of a better term) they cover. You can have 200 examples generated with Chatito and get poor results, while 50 carefully crafted or gathered examples that span the semantic space will perform much better.
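"Semantic width" has no standard metric, but a crude lexical proxy is distinct-n: the share of word n-grams across all examples that are unique. This is a swapped-in technique, not the author's method, and it only measures surface wording, not meaning (an embedding-based diversity measure would be closer to "semantic"). The example sentences are hypothetical: template-style examples that only vary a slot value, versus hand-written ones that phrase the same request differently.

```python
def distinct_n(examples: list[str], n: int = 2) -> float:
    """Share of word n-grams that are unique across all examples (the
    "distinct-n" diversity metric). Higher means more varied phrasing."""
    ngrams = []
    for text in examples:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# Hypothetical template-style examples: only the slot value changes.
templated = [f"book a flight to {city}" for city in ["paris", "rome", "oslo", "lima"]]

# Hypothetical hand-written examples expressing the same intent differently.
crafted = [
    "book a flight to paris",
    "i need to get to rome next week",
    "are there any planes going to oslo",
    "fly me to lima on friday",
]

print(f"templated: {distinct_n(templated):.2f}, crafted: {distinct_n(crafted):.2f}")
# templated: 0.44, crafted: 1.00
```

Adding more templated sentences keeps pushing the score down, because they repeat the same bigrams; that mirrors the point that more examples do not automatically mean more coverage.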