Hello everyone. Suppose the message to be classified is effectively an empty string. How is this supposed to be handled, and what intent and confidence would be returned when using a CountVectorsFeaturizer?
Possible examples could be the following:
A user writes a single word that happens to be a stop word.
A user writes an out-of-vocabulary word, and the setting for out-of-vocabulary words is to ignore them.
In my current setup, all such cases are classified as the same intent, with a fairly high confidence (around 0.80). Is this expected? And how should I handle such cases?
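To make the two examples above concrete, here is a minimal bag-of-words sketch in plain Python. It does not call Rasa or scikit-learn; `fit_vocabulary`, `featurize`, and the tiny stop-word list are hypothetical names invented for illustration. It only shows that an empty message, a lone stop word, and an ignored out-of-vocabulary word all collapse to the same all-zero count vector, so whatever classifier sits on top sees identical input for all three.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "is"})


def fit_vocabulary(training_texts, stop_words=frozenset()):
    """Collect the sorted set of training tokens, skipping stop words."""
    tokens = set()
    for text in training_texts:
        for tok in text.lower().split():
            if tok not in stop_words:
                tokens.add(tok)
    return sorted(tokens)


def featurize(text, vocabulary, stop_words=frozenset()):
    """Return one count per vocabulary entry; unknown tokens are ignored."""
    counts = Counter(t for t in text.lower().split() if t not in stop_words)
    return [counts[tok] for tok in vocabulary]


vocab = fit_vocabulary(["book a flight", "what is the weather"], STOP_WORDS)

empty = featurize("", vocab, STOP_WORDS)
stop_only = featurize("the", vocab, STOP_WORDS)
unseen = featurize("xyzzy", vocab, STOP_WORDS)

# All three inputs produce the same vector of zeros.
print(empty == stop_only == unseen)
print(featurize("book flight", vocab, STOP_WORDS))
```

If the featurizer behaves like this sketch, it would be consistent with all such cases landing on one and the same intent, since the classifier cannot tell them apart.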
Good morning. I noticed the following in the documentation:
“If during prediction time a message contains only words unseen during training and no Out-Of-Vocabulary preprocessor was used, an empty intent None is predicted with confidence 0.0.”
This is exactly what I wanted, but it is not what is happening! It actually predicts a seemingly random intent with a fairly high confidence, so I can’t even use a fallback. Any thoughts on what’s happening?
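For reference, the behaviour the documentation describes can be approximated with a guard in front of the classifier. The sketch below is plain Python and not Rasa code: `classify_with_guard`, `stub_classifier`, and `KNOWN_TOKENS` are hypothetical names, and the stub merely stands in for a model that always answers confidently. The guard returns an intent of None with confidence 0.0 whenever the message contains no token seen during training.

```python
# Hypothetical training vocabulary, for illustration only.
KNOWN_TOKENS = {"book", "flight", "weather"}


def stub_classifier(text):
    """Stand-in for a trained model that is always confident."""
    return {"name": "faq", "confidence": 0.99}


def classify_with_guard(text, vocabulary, classifier):
    """Skip the classifier when the message has no known tokens."""
    tokens = text.lower().split()
    if not any(tok in vocabulary for tok in tokens):
        # Nothing recognizable: report an empty intent with zero confidence.
        return {"name": None, "confidence": 0.0}
    return classifier(text)


print(classify_with_guard("", KNOWN_TOKENS, stub_classifier))
print(classify_with_guard("مرحبا", KNOWN_TOKENS, stub_classifier))
print(classify_with_guard("book flight", KNOWN_TOKENS, stub_classifier))
```

A result with confidence 0.0 would fall below any fallback threshold, which is the effect the quoted documentation promises.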
I’m running into the same issue. Did you find out what the problem was?
In my case, I have two retrieval intents (faq and chitchat), like you, and when I provide random input, even in Arabic letters, Rasa NLU returns a very high confidence (e.g. 0.99…), which is incorrect because I don’t have any intent with Arabic letters.
Hello Yasmin, no, unfortunately I didn’t find out why this was happening. It is still happening.
However, I created an out_of_scope intent and set up an out-of-vocabulary token, “OOV”, so that such inputs are now classified under the new intent.
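The idea behind this workaround can be sketched in plain Python. This is an illustration under assumptions, not how Rasa implements it: `replace_oov` and the sample vocabulary are hypothetical. Unknown words are rewritten to the “OOV” token before featurization, so the message no longer becomes an all-zero vector and a classifier trained with “OOV” examples under out_of_scope has something to match.

```python
OOV_TOKEN = "OOV"

# Hypothetical vocabulary; it includes the OOV token itself so that
# training examples containing "OOV" produce a dedicated feature.
VOCABULARY = {"book", "flight", "weather", OOV_TOKEN}


def replace_oov(text, vocabulary, oov_token=OOV_TOKEN):
    """Replace every token missing from the vocabulary with the OOV token."""
    return " ".join(
        tok if tok in vocabulary else oov_token
        for tok in text.lower().split()
    )


print(replace_oov("مرحبا", VOCABULARY))       # unknown word becomes OOV
print(replace_oov("book xyzzy", VOCABULARY))  # known words are kept
print(repr(replace_oov("", VOCABULARY)))      # still empty: no tokens at all
```

Note that a truly empty message still produces no tokens in this sketch, so that case would need separate handling.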