With the tensorflow embedding pipeline I created a small intent greet with just 5 examples like "hi" and "hello". Now, this seems to disturb my other intents. I have 5 major intents, one with 400 examples and the rest with only about 100 each. Now the word "with" gets classified as the greet intent, although this word is not contained within the greet examples. Can this be a result of the imbalance?
yes, you should definitely not have this kind of class imbalance. 100 vs 400 is still fine, but 5 vs 100 is a problem
Thanks. But what if you don't have that many examples for greet, maybe not even 100? What is good practice then (like oversampling)?
We have an open-sourced data set for the intent greet. Good practice is to add more examples; there's no workaround right now unless you implement some custom machine learning model here.
I just wonder if it would be possible to train the same components on different data sets within the same pipeline. Can you hand over separate training data to each component? How would I do that?
Then I would create a separate data set for an intent like greet and classify it separately. Afterwards I would run the other component for the remaining intents and compare the confidences to decide which intent to take.
Hmm, can you elaborate a bit on what you mean? Training data for NLU or Core?
I mean training each NLU component on a different training set, like one for entities and one for intents. Afterwards, in a custom component, you add both results to the latest_message object with entity and intent, roughly like in the sketch below.
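A minimal sketch of what I mean, assuming the Rasa 1.x custom component API (`rasa.nlu.components.Component`); the model path `models/nlu_entities` and the component name are placeholders, not anything shipped with Rasa:

```python
# Sketch only: merge entities from a separately trained NLU model into the
# message, next to the intent from the main pipeline. Paths are assumptions.
from rasa.nlu.components import Component
from rasa.nlu.model import Interpreter


class EntityModelMerger(Component):
    """Adds entities from a second, separately trained NLU model."""

    name = "entity_model_merger"
    provides = ["entities"]
    defaults = {"entity_model_dir": "models/nlu_entities"}  # hypothetical path

    def __init__(self, component_config=None):
        super().__init__(component_config)
        # Load the entity-only model once when the pipeline starts up.
        self.entity_interpreter = Interpreter.load(
            self.component_config.get("entity_model_dir"))

    def process(self, message, **kwargs):
        # Run the entity model on the same text and merge its entities
        # into the message alongside the main pipeline's intent.
        result = self.entity_interpreter.parse(message.text)
        entities = message.get("entities", []) + result.get("entities", [])
        message.set("entities", entities, add_to_output=True)
```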
Furthermore, since you might have an imbalance for one intent like greet, you get misclassifications. I would handle this by also training NLU separately on a greet-only data set. Afterwards, I would check again in a component which intent to assign, greet or another intent (coming from the NLU model trained on the rest of the data), using the confidences.
I think this should be a good way: the model trained on just the greet data would give very low confidence if the input is not similar to the greet examples. Afterwards I can decide which intent to use.
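Roughly like this, assuming Rasa 1.x's `Interpreter.load`/`parse` API; the model paths and the threshold are made-up values that would need tuning on real data:

```python
# Sketch of the confidence comparison between the two separately trained models.
from rasa.nlu.model import Interpreter

greet_model = Interpreter.load("models/nlu_greet")  # trained on greet only (assumed path)
main_model = Interpreter.load("models/nlu_main")    # trained on the other intents (assumed path)


def classify(text, greet_threshold=0.8):
    greet = greet_model.parse(text)["intent"]
    main = main_model.parse(text)["intent"]
    # Take greet only when the dedicated model is clearly confident and
    # beats the main model; otherwise fall back to the main classifier.
    if greet["confidence"] >= greet_threshold and greet["confidence"] > main["confidence"]:
        return greet
    return main


print(classify("hi there"))
```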