Error in intent classification

boutaina · March 24, 2020, 5:34pm

Hello, can anyone please help me? I've been using Rasa for months now and recently started having an issue. I built a model based on 5 different intents (with different amounts of data), and the model works fine when I use the same words as in the training data. BUT as soon as I try a new sentence that should be out of scope, it gives me the intent 'small_talk', which is one of the 5 intents, with a very high confidence. That's not logical, so can anyone tell me what I should do? Should I change the hyperparameters or change the classifier (I'm using the embedding classifier)? Thank you!

Ghostvv · March 25, 2020, 10:56am

what version of Rasa Open Source and what config are you using?

boutaina · April 2, 2020, 9:38am

I have the 1.8.2 version of Rasa with this config file:

pipeline:
- name: "WhitespaceTokenizer_omran_ar"
  case_sensitive: false
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"
  batch_strategy: sequence
- name: "Extracteur_omran"

policies:
- name: EmbeddingPolicy
  max_history: 5
  epochs: 200
  batch_size: 50
- name: "MemoizationPolicy"
  max_history: 5
- name: "FallbackPolicy"
  nlu_threshold: 0.4
  core_threshold: 0.3
- name: MappingPolicy

Ghostvv · April 2, 2020, 10:12am

Given your config, the behavior is very logical. CountVectorsFeaturizer creates its vocabulary from the training data. If your input sentence consists of unseen words, it ignores them. Please take a look at the "Handling Out-Of-Vocabulary (OOV) words" section of the Components docs, or try using some pretrained embeddings in addition (see Choosing a Pipeline).
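To illustrate the point about unseen words, here is a minimal pure-Python sketch of a bag-of-words featurizer. It is not Rasa's actual implementation: the functions `build_vocabulary` and `featurize`, and the training sentences, are hypothetical and only mimic the idea of a vocabulary built from training data, with and without an OOV token.

```python
from collections import Counter


def build_vocabulary(sentences):
    """Collect every lowercased, whitespace-separated token seen in training."""
    return sorted({tok for s in sentences for tok in s.lower().split()})


def featurize(sentence, vocabulary, oov_token=None):
    """Return bag-of-words counts over the training vocabulary."""
    known = set(vocabulary)
    tokens = []
    for tok in sentence.lower().split():
        if tok in known:
            tokens.append(tok)
        elif oov_token is not None and oov_token in known:
            # Unseen word: map it to the OOV token if one was trained.
            tokens.append(oov_token)
        # Otherwise the unseen word is silently dropped.
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical training sentences; the last one contains the OOV token.
training = ["hello there", "how are you", "peux tu oov"]
vocabulary = build_vocabulary(training)

# A sentence made only of words that never appeared in training.
no_oov = featurize("what is the weather", vocabulary)
with_oov = featurize("what is the weather", vocabulary, oov_token="oov")

print(no_oov)    # every count is zero: the classifier gets no signal
print(with_oov)  # the four unseen words are all counted as "oov"
```

Without an OOV token the out-of-scope sentence becomes an all-zero vector, so the classifier has nothing to distinguish it from any intent. With the token, the unseen words at least produce a feature that an out-of-scope intent can learn from.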

boutaina · April 2, 2020, 12:38pm

I'm sorry, but I just read the article about handling OOV and I still don't get it. I tried this solution: I added the word oov to the data of a new intent 'outofcontext' and added this line, as I saw in a post: - name: "intent_featurizer_count_vectors" OOV_token: oov. Did I do something wrong, or is the solution incomplete? I don't really understand the other solution about pretrained embeddings.

Ghostvv · April 2, 2020, 12:45pm

Sounds correct. I would add a couple of examples with different numbers of oov words.
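One way to produce such examples is sketched below, as an assumption rather than an official Rasa utility: the helper `make_oov_examples` and the seed sentences are hypothetical. It takes a few sentences and replaces an increasing number of their words with the OOV token, labelling each result with the out-of-scope intent.

```python
import random


def make_oov_examples(sentences, intent="outofcontext", oov_token="oov", seed=0):
    """Build training examples with 1..n words replaced by the OOV token."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    examples = []
    for sentence in sentences:
        tokens = sentence.split()
        for n_oov in range(1, len(tokens) + 1):
            # Pick which positions to replace for this example.
            positions = set(rng.sample(range(len(tokens)), n_oov))
            text = " ".join(
                oov_token if i in positions else tok
                for i, tok in enumerate(tokens)
            )
            examples.append({"text": text, "intent": intent})
    return examples


# Hypothetical seed sentences, used only for illustration.
examples = make_oov_examples(["peux tu me dire", "what is that"])
for example in examples:
    print(example)
```

Each seed sentence yields one example per possible number of replaced words, up to a sentence made entirely of the OOV token.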

boutaina · April 3, 2020, 12:59pm

I still have the same error. I added the oov words to the data, but I think something is wrong because it's still not working: a lot of meaningless sentences get associated with other intents, even though I added the 'outofcontext' intent, which contains data such as {"text": "peux tu oov", "intent": "outofcontext"}, {"text": "oov oov", "intent": "outofcontext"}. Can you please give me any advice to make it right?

Ghostvv · April 3, 2020, 1:31pm

did adding oov improve the situation?

boutaina · April 6, 2020, 1:20pm

No, it didn't. It's still associating new phrases with other intents. Is there any other solution, or is it not working because I didn't put much data in the ooc intent?

Ghostvv · April 6, 2020, 9:33pm

I understand it didn’t solve the problem, but do you see any improvements?

boutaina · April 7, 2020, 4:29pm

No improvements. I added more data and there is still no improvement.
