Rasa detects random meaningless sequences of characters as an intent with high confidence

My bot supports two languages (English and Arabic). When I give the bot text like

  • for English: “dfdfkjdf dfdkfjdf”, “dfdfdf”, “dfdfdkjdfs dfsdfsdf”
  • for Arabic: “بسييبلبنيتلاليب يبلنبيتالينبلاتيب”, “يسبمسنيت بمسينتبسميبتيسبتسنيبتسيب”

it classifies these examples as intents I have in my training data, with high confidence! None of this text appears in my training data.

English pipeline:

- name: "pretrained_embeddings_spacy"

  • name: "SpacyNLP"
  • name: "SpacyTokenizer"
  • name: "SpacyFeaturizer"
  • name: "RegexFeaturizer"
  • name: "CRFEntityExtractor"
  • name: "EntitySynonymMapper"
  • name: "SklearnIntentClassifier"
  • name: "DucklingHTTPExtractor"

Arabic pipeline:

- name: "pretrained_embeddings_spacy"

  • name: "SpacyNLP"
  • name: "SpacyTokenizer"
  • name: "SpacyFeaturizer"
  • name: "RegexFeaturizer"
  • name: "CRFEntityExtractor"
  • name: "EntitySynonymMapper"
  • name: "SklearnIntentClassifier"
  • name: "DucklingHTTPExtractor"

Any help?

spaCy assigns a zero vector to these meaningless sequences, and the classifier then assigns them to one of your intents. Remember, we are solving a classification problem, so the algorithm will always assign some class to the phrase, even when the input carries no information. One possible solution is to create an out-of-scope intent with some gibberish examples.
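To make the "always assigns some class" point concrete, here is a minimal sketch in plain Python. It is a toy linear classifier with softmax, not how SklearnIntentClassifier is actually implemented, and the intent names, weights, and biases are made up for illustration. With an all-zero feature vector the weights contribute nothing, so the biases alone decide the result, and the reported confidence can still be high.

```python
import math

# Hypothetical toy model: one weight row and one bias per intent.
# These numbers are invented for illustration only.
INTENTS = ["greet", "goodbye", "book_flight"]
WEIGHTS = [[1.0, -0.5], [-1.0, 0.8], [0.3, 0.3]]
BIASES = [0.1, -0.2, 3.0]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features):
    """Return (intent, confidence) for a feature vector."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + bias
        for row, bias in zip(WEIGHTS, BIASES)
    ]
    probs = softmax(logits)
    best = max(range(len(INTENTS)), key=probs.__getitem__)
    return INTENTS[best], probs[best]


# A zero vector, like the one an unknown token sequence would get.
intent, confidence = classify([0.0, 0.0])
print(intent, round(confidence, 2))
```

The probabilities are forced to sum to 1 across the known intents, so gibberish has to land somewhere. Giving it a dedicated intent to land in is the idea behind the out-of-scope suggestion.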

An out-of-scope intent could look something like this:

intent:out_of_scope

  • lkflsdkjfsdf
  • sdfdffdfsdfsdf
  • sdfsdfsdflk sdflksdjfskdf sdflkjfsdlfjdsfsdf
  • slkfwerpoitwe.,nzxxzx m,.rwrposdf,sdf
  • pewrpewirpowemslnf swdflkfjl lkklfmxzxc,mwep;orweopri
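If typing gibberish by hand gets tedious, a short script can generate it. The sketch below assumes the Markdown training-data format (a `## intent:<name>` header followed by `- example` lines); the function names and the default lengths are hypothetical choices, not part of Rasa. For the Arabic bot, you could pass a string of Arabic letters as `alphabet`.

```python
import random
import string


def make_gibberish(rng, alphabet=string.ascii_lowercase,
                   max_words=3, min_len=5, max_len=14):
    """Build one random phrase of 1..max_words meaningless words."""
    words = []
    for _ in range(rng.randint(1, max_words)):
        length = rng.randint(min_len, max_len)
        words.append("".join(rng.choice(alphabet) for _ in range(length)))
    return " ".join(words)


def out_of_scope_block(n_examples, seed=0, intent_name="out_of_scope",
                       alphabet=string.ascii_lowercase):
    """Return a Markdown training-data block of gibberish examples."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    lines = [f"## intent:{intent_name}"]
    for _ in range(n_examples):
        lines.append(f"- {make_gibberish(rng, alphabet)}")
    return "\n".join(lines)


# Print five examples ready to paste into the NLU training file.
print(out_of_scope_block(5))
```

Purely random letters may not look quite like keyboard mashing, so it is probably worth mixing in a few examples copied from real user logs as well.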