Error in intent classification

Hello, can anyone please help me? I've been using Rasa for months and recently started having an issue. I built a model based on 5 different intents (with different amounts of data), and it works fine when I use the same words as in the training data. BUT as soon as I try a new sentence that is normally out of scope, it gives me the intent 'small_talk', which is one of the 5 intents, with a very high confidence. That's not logical, so can anyone tell me what I should do? Should I change the hyperparameters or change the classifier (I'm using the embedding classifier)? Thank you!

What version of Rasa Open Source and what config are you using?

I have version 1.8.2 of Rasa with this config file:

pipeline:

  - name: "WhitespaceTokenizer_omran_ar"
    case_sensitive: false
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"
    batch_strategy: sequence
  - name: "Extracteur_omran"

policies:

  - name: EmbeddingPolicy
    max_history: 5
    epochs: 200
    batch_size: 50
  - name: "MemoizationPolicy"
    max_history: 5
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
  - name: MappingPolicy

Given your config, the behavior is actually very logical. CountVectorsFeaturizer creates its vocabulary from the training data. If your input sentence consists of unseen words, it ignores them, so the classifier gets almost nothing to work with and still has to pick one of your 5 intents. Please take a look at "Handling Out-Of-Vocabulary (OOV) words" on the Components page, or try using some pretrained embeddings in addition (see Choosing a Pipeline).
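To make the point about unseen words concrete, here is a minimal sketch in plain Python. It is a simplified stand-in for a bag-of-words featurizer, not Rasa's actual code; the helper names `build_vocabulary` and `featurize` and the sample sentences are made up for illustration.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Collect every lowercased whitespace token seen in the training sentences."""
    return sorted({token for s in sentences for token in s.lower().split()})


def featurize(sentence, vocabulary):
    """Count how often each vocabulary word occurs; unseen words are simply dropped."""
    counts = Counter(sentence.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical training sentences, only used to build a vocabulary.
training_sentences = ["bonjour comment vas tu", "peux tu m aider", "quel temps fait il"]
vocabulary = build_vocabulary(training_sentences)

known = featurize("bonjour peux tu", vocabulary)      # every word is in the vocabulary
unknown = featurize("xyzzy plugh quux", vocabulary)   # no word is in the vocabulary

print(sum(known))    # number of recognised tokens
print(sum(unknown))  # nothing recognised: the feature vector is all zeros
```

An all-zero vector looks the same for every out-of-scope sentence, which is why such inputs can all land on the same intent.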

I'm sorry, but I just read the article about handling OOV words and I still don't get it. I tried this solution: I added the word oov to the data of a new intent 'outofcontext' and, as I saw in a post, added this line: - name: "intent_featurizer_count_vectors" OOV_token: oov. Did I do something wrong, or is the solution incomplete? I also don't really get the other solution about pretrained embeddings.

Sounds correct. I would add a couple of examples with different numbers of oov words.
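As a rough sketch of that suggestion, the snippet below generates 'outofcontext' examples containing one, two, or three oov tokens and prints them in the Rasa NLU JSON training-data layout. The helper `oov_examples` and the context phrases are hypothetical, and whether this is enough data for a given bot is an open question.

```python
import json

OOV_TOKEN = "oov"  # must match the OOV_token set on the featurizer


def oov_examples(context_phrases, max_oov=3, intent="outofcontext"):
    """Build examples with 1..max_oov oov tokens, alone and next to known phrases."""
    examples = []
    for n in range(1, max_oov + 1):
        oov_part = " ".join([OOV_TOKEN] * n)
        examples.append({"text": oov_part, "intent": intent})
        for phrase in context_phrases:
            examples.append({"text": f"{phrase} {oov_part}", "intent": intent})
            examples.append({"text": f"{oov_part} {phrase}", "intent": intent})
    return examples


# Hypothetical in-vocabulary phrases to mix with the oov token.
examples = oov_examples(["peux tu", "je veux"])

nlu_data = {"rasa_nlu_data": {"common_examples": examples}}
print(json.dumps(nlu_data, ensure_ascii=False, indent=2))
```

The generated entries can be merged into the existing training file before retraining.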

I still have the same error. I added the oov words to the data (I'll show you some below), but I think there is something wrong because it's still not working: a lot of meaningless sentences get associated with other intents, even though I added the 'outofcontext' intent that contains data such as {"text": "peux tu oov", "intent": "outofcontext"}, {"text": "oov oov", "intent": "outofcontext"}. Can you please give me any advice on how to get this right?

Did adding oov improve the situation?

No, it didn't. It's still associating new phrases with other intents. Is there any other solution, or is it not working because I didn't put much data in the outofcontext intent?

I understand it didn’t solve the problem, but do you see any improvements?

No improvements. I added more data and there is still no improvement.