Hello community! I am using the CountVectorsFeaturizer component for word featurization in my pipeline, and I noticed that it has an `OOV_token` parameter that can handle words the model didn't see during training. So I created an intent, `INT_oov`, filled it with some random sentences, and injected "oov" tokens so that whenever the model sees words that were not in the training data, they get treated as the OOV token and therefore predicted as `INT_oov`.

The problem is that this approach seems to have made the prediction quality much worse: if the model finds even one word that wasn't in the training data, it automatically classifies the whole sentence as `INT_oov`, even though the rest of the words in the sentence could be matched to other, relevant intents.

Am I using `OOV_token` wrong? And what is the best way to deal with unseen words in sentences at prediction time? Thank you