NLU detects random input with wrong intent and high confidence

If I provide random input text like “…”, “???”, "-=-%$^" or “qwerty”, Rasa NLU returns some intent with high confidence.

Such words are not part of the training data for any intent, yet NLU returns an intent with a confidence above the threshold.

Could you please advise how to fix this? I believe such input should get a lower confidence and fall through to the fallback intent.

Hi, can you please provide more info?

What is your pipeline? Do you have a garbage intent?

I don’t have a garbage intent. I tried the pipelines below.

pipeline:
  - name: "nlp_spacy"
  - name: "tokenizer_spacy"
  - name: "intent_featurizer_spacy"
  - name: "intent_classifier_sklearn"
  - name: "ner_crf"
  - name: "ner_synonyms"

I also recently tried the pipeline below and got the same result:

language: "en"

pipeline: "tensorflow_embedding"

Which version are you using? As of 0.13.0, tensorflow embedding will predict None for the intent if there are no in-vocab words; see the changelog: https://github.com/RasaHQ/rasa_nlu/blob/master/CHANGELOG.rst#0130---2018-08-02

My current Rasa NLU version is 0.13.1, yet I am still facing the issue.

@amn41 Can you set it to a specific threshold when None is returned? If just one in-vocab word is used, maybe this makes sense.

What’s the issue exactly?

I’m not sure I understand what you mean. Do you mean defining a threshold on the number of out-of-vocab words?

yes
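
To make the idea concrete, here is a minimal sketch of a check on the number of in-vocab words, applied before trusting the classifier. It is not part of Rasa NLU: `tokenize`, `build_vocab`, `has_enough_known_words` and the `min_in_vocab` parameter are hypothetical names, and the regex tokenizer is a simplification that will not match the pipeline's own tokenization exactly.

```python
import re


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase runs of word characters.
    # Punctuation-only input such as "???" yields no tokens at all.
    return re.findall(r"\w+", text.lower())


def build_vocab(training_examples):
    # Collect every token that appears in any training example.
    vocab = set()
    for text in training_examples:
        vocab.update(tokenize(text))
    return vocab


def has_enough_known_words(text, vocab, min_in_vocab=2):
    # Count the tokens of the message that were seen during training and
    # compare against a (hypothetical) minimum.
    known = sum(1 for token in tokenize(text) if token in vocab)
    return known >= min_in_vocab
```

A message that fails this check could be routed to the fallback without looking at the predicted intent. With `min_in_vocab=2`, a single common word on its own would also be rejected.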

@amn41 NLU returns an intent with high confidence (higher than the threshold) for random words like “???”, “qwerty” etc. Such input should get the None intent or a low confidence (lower than the threshold). We are discussing how to achieve that.

Hmm I’ve never experienced this before, I always get the None intent for random words. @Ghostvv any ideas what’s happening?

Are you sure you don’t have at least one in-vocab word, like “the” for example?

Yes. The characters “–”, “-”, "<" and “@” on their own are not part of the training data, though they may appear in training statements in combination with other words or characters.

All intents have training statements ending with the “.” character. If I type “.”, the bot matches it to an intent.

Sorry, it is hard to understand this way. Could you please put together an example script so we can reproduce this issue?

Hi @Ghostvv, I created a sample project with 2 intents. Please find code links below. I tested by passing “???”, “qwerty” and “?” and it detects a wrong intent with 0.7 - 0.8 confidence. I tested with Rasa NLU running as a server. Thanks.

What version of rasa_nlu are you using? I tried with master, and the classifier returns None.

I am using version 0.13.1 of Rasa NLU. Is master stable to use? My project is soon going into production.

I just tried in a fresh virtual environment with rasa_nlu version 0.13.1, and I got:

INFO:tensorflow:Restoring parameters from projects/default/tf_model/intent_classifier_tensorflow_embedding.ckpt
{'intent': {'name': None, 'confidence': 0.0}, 'entities': [], 'intent_ranking': [], 'text': '???'}
{'intent': {'name': None, 'confidence': 0.0}, 'entities': [], 'intent_ranking': [], 'text': 'qwerty'}

Are you sure you are using 0.13.1? Could you please try creating a new virtual environment with 0.13.1?
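
For reference, a rough sketch of how a client could handle a parse result shaped like the output above, treating both a None intent and a low confidence as fallback. `resolve_intent`, the `threshold` value of 0.5 and the `"fallback"` label are assumptions for illustration, not Rasa NLU API. A threshold alone does not help if the classifier really returns 0.7 - 0.8 for garbage input.

```python
def resolve_intent(parse_result, threshold=0.5, fallback="fallback"):
    # parse_result is expected to look like the dicts printed above:
    # {'intent': {'name': ..., 'confidence': ...}, 'entities': [...], ...}
    intent = parse_result.get("intent") or {}
    name = intent.get("name")
    confidence = intent.get("confidence") or 0.0

    # No intent at all (no in-vocab words) or a weak prediction: fall back.
    if name is None or confidence < threshold:
        return fallback
    return name
```

Called with the result for '???' shown above, this returns the fallback label because the intent name is None.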