Using NLP for training data selection

When you train an NLU model on example sentences, you need to keep the examples balanced and varied to avoid overfitting. In practice, once you have a lot of data, you generally don't know what kinds of sentences are already in it. My idea was to cluster new sentences to decide whether they are worth adding to the training data.
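The clustering idea above could look roughly like this. It is only a minimal sketch and makes several assumptions. Sentences are represented as plain bag-of-words vectors compared with cosine similarity. The clustering method is a simple single-pass "leader" clustering, not k-means or sentence embeddings. The function names and the 0.5 similarity threshold are hypothetical choices. Under this sketch, a new sentence that falls into an existing cluster is probably redundant, while one that starts a new cluster adds variation.

```python
import math
import re
from collections import Counter

# Assumption: bag-of-words counts + cosine similarity as the sentence representation.
def bag_of_words(sentence):
    """Lowercase the sentence and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def leader_clusters(sentences, threshold=0.5):
    """Single-pass leader clustering: each sentence joins the first cluster whose
    leader is at least `threshold` similar, otherwise it starts a new cluster."""
    clusters = []  # list of (leader_vector, member_sentences)
    for s in sentences:
        vec = bag_of_words(s)
        for leader, members in clusters:
            if cosine(vec, leader) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((vec, [s]))
    return clusters

def select_new_examples(existing, candidates, threshold=0.5):
    """Hypothetical helper: keep only candidates that would start a new cluster,
    i.e. that are not similar to any existing leader or already accepted candidate."""
    clusters = leader_clusters(existing, threshold)
    accepted = []
    for s in candidates:
        vec = bag_of_words(s)
        if all(cosine(vec, leader) < threshold for leader, _ in clusters):
            clusters.append((vec, [s]))  # the novel sentence becomes a new leader
            accepted.append(s)
    return accepted

existing = ["book a flight to berlin", "i want to book a flight to paris", "what is the weather today"]
candidates = ["book a flight to rome", "cancel my hotel reservation", "what is the weather tomorrow"]
print(select_new_examples(existing, candidates))  # prints ['cancel my hotel reservation']
```

Word counts only catch surface overlap. Sentence embeddings would likely group paraphrases better, and the threshold would need tuning on your own data.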

Do you already use any algorithms for selecting training data? If so, which ones?
