NLU and class imbalance

Hi all,

My NLU file is large: not too many intents, but some intents with thousands of examples. The problem is that intents with few examples are no longer detected. For example, the intent “stop” contains the example “stop”, and the intent “deny” contains the example “no”. However, typing “stop” or “no” gets an intent classification confidence score as low as 0.2, and nlu_fallback gets activated.

How can I fix this problem?

I think a partial solution would be to map some examples like “stop” or “no” directly to an intent: something like the KeywordIntentClassifier, but with the ability to specify which intents it handles (rather than loading everything). The other alternative is to somehow up-sample the examples of the minority classes.
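In case it helps to picture it, here is roughly what adding the KeywordIntentClassifier to an existing pipeline looks like (a sketch only: the tokenizer/featurizer choices and epoch count are placeholders, and as far as I know there is no option to restrict which intents it treats as keywords):

```yaml
# config.yml (sketch) - KeywordIntentClassifier matches entire training
# examples as keywords; it cannot currently be limited to specific intents.
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: KeywordIntentClassifier
    case_sensitive: false
```

How this interacts with DIETClassifier’s predictions, and where it should sit in the pipeline, is something I’d verify against the docs for your Rasa version.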


The first solution to class imbalance is always to correct it by collecting more data for underrepresented intents.

If an intent exists just to be triggered by specific words, you can instead use buttons with hard-coded intent payloads, e.g. payload: /deny. You can correct some imbalance with sampling techniques or hyperparameter tuning, but an imbalance that large suggests something is off in the structure of the data itself.
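As a sketch of the button approach (the response name and wording here are invented; the payloads map straight to the stop/deny intents from the original post, so NLU confidence never comes into play):

```yaml
# domain.yml (sketch) - payloads starting with "/" are treated as intents
# directly, bypassing the intent classifier entirely.
responses:
  utter_ask_continue:
    - text: "Do you want to continue?"
      buttons:
        - title: "No"
          payload: "/deny"
        - title: "Stop"
          payload: "/stop"
```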

How are you collecting examples for your intents? Do they come from real conversations, or are they synthetic?

@mloubser The source of the imbalance is the FAQ handling. If you have 3000 topics with 3-6 questions each, what can you do? How many alternative phrasings of yes/no/stop can you gather? The FAQ questions are synthetic (I wrote them all), and they are handled by the response selector (with a success rate of around 80%).
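For context, the data is laid out roughly like this (a sketch with invented topic names and wording): thousands of faq/<topic> retrieval sub-intents next to tiny standalone intents such as stop and deny.

```yaml
# nlu.yml (sketch) - retrieval-intent format for the FAQ, invented topics
nlu:
  - intent: faq/opening_hours
    examples: |
      - When are you open?
      - What are your opening hours?
  - intent: faq/pricing
    examples: |
      - How much does it cost?
      - What are your prices?
  - intent: stop
    examples: |
      - stop
  - intent: deny
    examples: |
      - no
```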

I also have this problem due to an intent having multiple possible values for an entity, all of which are combined with the different ways to phrase the intent, leading to a combinatorial explosion. See Confusion in Using Entity Synonyms - #2 by ivanmkc

Eager to hear from Rasa what the answer is.

Is there a hierarchy in your FAQ questions? We typically give the example that “chitchat” is a separate set of responses from “FAQ”. In your case it might also be possible to split the FAQ questions into subgroups. Might that help?
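If splitting does help, one way to express the subgroups (a sketch; the retrieval intent names faq and chitchat are just examples) is a separate ResponseSelector per retrieval intent:

```yaml
# config.yml (sketch) - one ResponseSelector per retrieval intent
pipeline:
  # ... tokenizer and featurizers ...
  - name: DIETClassifier
    epochs: 100
  - name: ResponseSelector
    retrieval_intent: faq
    epochs: 100
  - name: ResponseSelector
    retrieval_intent: chitchat
    epochs: 100
```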

In general, getting a score that properly represents certainty is a huge unsolved problem in ML. There’s an algorithm whiteboard video here that highlights recent work done by our research team on the topic. We recently introduced some hyperparameters for DIET that might help. There’s also a PyData talk here that explains how estimated probabilities are not a great proxy for certainty, which might help explain why this is a hard problem to get right.
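For reference, these are the kinds of DIET settings meant here (a sketch; constrain_similarities and model_confidence were introduced around Rasa 2.3, so check what your version supports before copying them):

```yaml
# config.yml (sketch) - confidence-related DIET settings
pipeline:
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
    model_confidence: linear_norm
```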
