Empty String for Intent Classification

Hello everyone. Suppose we have an empty string for classification: how is this handled, and what intent/confidence would be returned if we are using a CountVectorsFeaturizer?

Possible examples would be the following:

  • A user writes a single word that happens to be a stop word
  • A user writes an out-of-vocabulary word, and out-of-vocabulary words are configured to be ignored

Currently, in my setup, all such cases go to the same intent, and with a pretty high confidence (around 0.80). Is this expected, and how should I handle such cases?

Thank you in advance for your help.

Good morning. I noticed the following in the documentation:

“If during prediction time a message contains only words unseen during training and no Out-Of-Vocabulary preprocessor was used, an empty intent None is predicted with confidence 0.0.”

This is exactly what I wanted, but it is not what is happening! It actually predicts a random intent with a pretty high confidence, so I can't even use fallback. Any thoughts on what's going on?
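(For anyone else reading: as far as I understand, the “Out-Of-Vocabulary preprocessor” the docs refer to is the OOV_token option of CountVectorsFeaturizer. A minimal sketch, with an arbitrary token string:

```yaml
  - name: CountVectorsFeaturizer
    OOV_token: oov   # words unseen during training are mapped to this token
```

For the token to carry any signal, it also has to occur in the training data, either written directly in examples or mapped there via OOV_words.)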

Hi @kmegalokonomos, can you please share your pipeline configuration here?

Hello @dakshvar22,

This is one of the pipelines, but to tell you the truth I have tried all sorts of variations to try to identify where the problem is:

```yaml
pipeline:
  - name: SpacyNLP
    model: el_core_news_lg
    case_sensitive: false
  - name: SpacyTokenizer
    token_pattern: (?u)\b\w+\b
  - name: LexicalSyntacticFeaturizer
    token_pattern: (?u)\b\w+\b
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 5
    max_ngram: 8
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    retrieval_intent: faq
    epochs: 100
  - name: ResponseSelector
    retrieval_intent: chitchat
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
```

All the best, Konstantinos

Hey Konstantinos,

I’m running into the same issue. Did you find out what the problem was?

In my case, I have two retrieval intents (faq and chitchat) like you, and when I provide random input, even in Arabic letters, Rasa NLU returns a very high confidence (e.g. 0.99…), which is incorrect because none of my intents are trained on Arabic letters.

Thanks!

Hello Yasmin, no, unfortunately I didn’t find out why this was happening, and it still is. However, I created an out_of_scope intent and added the out-of-vocabulary token “OOV” to it, so such inputs are now classified under the new intent.
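Roughly, the workaround looks like this (a sketch; the intent name and the number of OOV examples are my own choices):

```yaml
# config.yml: map every word unseen during training to the literal token "OOV"
  - name: CountVectorsFeaturizer
    OOV_token: OOV

# nlu.yml: a catch-all intent trained on that token
nlu:
- intent: out_of_scope
  examples: |
    - OOV
    - OOV OOV
```

So anything consisting entirely of unknown words is featurized as OOV tokens and lands in out_of_scope instead of a random intent.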

All the best, Konstantinos

Hey Konstantinos,

Thank you for your reply.

Maybe some hyperparameter tuning could be a good solution. Also, I found this post, which could be helpful: NLU classifies unseen utterances to a random intent with too high confidence
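On the hyperparameter side, two DIETClassifier options that get mentioned for this kind of overconfidence are constrain_similarities and model_confidence (available from Rasa 2.3 on). A sketch, not something I have verified on your data:

```yaml
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true   # apply the loss over all similarities, not only the top one
    model_confidence: linear_norm  # normalize similarity scores instead of softmax confidences
```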

Yep, I saw this as well. Interesting!