Empty String for Intent Classification

Hello everyone. Suppose we have an empty string for classification: how is this handled, and what intent/confidence would be returned if we are using a CountVectorsFeaturizer?

Possible examples would be the following:

  • A user writes a single word that happens to be a stop word
  • A user writes an out-of-vocabulary word, and out-of-vocabulary words are configured to be ignored

Currently, in my setup, all such cases go to the same intent, and with a pretty high confidence (around 0.80). Is this expected, and how should I handle such cases?

Thank you in advance for your help.

Good morning. I noticed the following in the documentation:

“If during prediction time a message contains only words unseen during training and no Out-Of-Vocabulary preprocessor was used, an empty intent None is predicted with confidence 0.0.”

This is exactly what I wanted, but it is not what is happening! It actually predicts a random intent with a pretty high confidence, so I can't even use fallback. Any thoughts on what's going on?
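(For anyone else reading: as far as I understand, the “Out-Of-Vocabulary preprocessor” the docs refer to is the OOV_token option of CountVectorsFeaturizer. A minimal sketch, with an arbitrary token string:

```yaml
  - name: CountVectorsFeaturizer
    OOV_token: oov   # words unseen during training are mapped to this token
```

For the token to carry any signal, it also has to occur in the training data, either written directly in examples or mapped there via OOV_words.)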

Hi @kmegalokonomos, can you please share your pipeline configuration here?

Hello @dakshvar22,

This is one of the pipelines, but to tell you the truth I have tried all sorts of variations to try to identify where the problem is:

```yaml
pipeline:
  - name: SpacyNLP
    model: el_core_news_lg
    case_sensitive: false
  - name: SpacyTokenizer
    token_pattern: (?u)\b\w+\b
  - name: LexicalSyntacticFeaturizer
    token_pattern: (?u)\b\w+\b
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 5
    max_ngram: 8
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    retrieval_intent: faq
    epochs: 100
  - name: ResponseSelector
    retrieval_intent: chitchat
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
```

All the best, Konstantinos

Hey Konstantinos,

I’m running into the same issue. Did you find out what the problem was?

In my case, I have two retrieval intents (faq and chitchat) like you, and when I provide random input, even in Arabic letters, Rasa NLU returns a very high confidence (e.g. 0.99…), which is incorrect because none of my intents are trained on Arabic letters.

Thanks!

Hello Yasmin, no, unfortunately I didn’t find out why this was happening, and it still is. However, I created an out_of_scope intent and added the out-of-vocabulary token “OOV” to it, so such inputs are now classified under the new intent.
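Roughly, the workaround looks like this (a sketch; the intent name and the number of OOV examples are my own choices):

```yaml
# config.yml: map every word unseen during training to the literal token "OOV"
  - name: CountVectorsFeaturizer
    OOV_token: OOV

# nlu.yml: a catch-all intent trained on that token
nlu:
- intent: out_of_scope
  examples: |
    - OOV
    - OOV OOV
```

So anything consisting entirely of unknown words is featurized as OOV tokens and lands in out_of_scope instead of a random intent.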

All the best, Konstantinos

Hey Konstantinos,

Thank you for your reply.

Maybe some hyperparameter tuning could be a good solution. Also, I found this post, which could be helpful: NLU classifies unseen utterances to a random intent with too high confidence
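On the hyperparameter side, two DIETClassifier options that get mentioned for this kind of overconfidence are constrain_similarities and model_confidence (available from Rasa 2.3 on). A sketch, not something I have verified on your data:

```yaml
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true   # apply the loss over all similarities, not only the top one
    model_confidence: linear_norm  # normalize similarity scores instead of softmax confidences
```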

Yep, I saw this as well. Interesting!