Can I add a list of stop words to the CountVectorsFeaturizer.py file?

Can I add spaCy's list of stop words to the predefined CountVectorsFeaturizer.py file, like this:

    from spacy.lang.en.stop_words import STOP_WORDS

    # Stop words set in the pipeline config
    self.stop_words = self.component_config['stop_words']

    # Override them with spaCy's English stop words (a set of strings),
    # converted to a list
    self.stop_words = list(STOP_WORDS)

    for stopword in self.stop_words:
        print("printing the stop words", stopword)

You can pass stop words to the CountVectorizer class; see the sklearn.feature_extraction.text.CountVectorizer page in the scikit-learn 0.21.3 documentation.
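A minimal sketch, assuming scikit-learn is installed: CountVectorizer accepts a plain list for its stop_words parameter, and those tokens are left out of the learned vocabulary. The short word list and the two example sentences are made up for illustration; spaCy's STOP_WORDS set could be passed as list(STOP_WORDS) instead.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative stop-word list (an assumption, not a recommended set).
# spaCy's spacy.lang.en.stop_words.STOP_WORDS could be used via list(STOP_WORDS).
custom_stop_words = ["the", "to", "want"]

vectorizer = CountVectorizer(stop_words=custom_stop_words)
X = vectorizer.fit_transform(["I want to book a flight", "Book the cheapest flight"])

# "the", "to" and "want" are filtered out; single-character tokens such as
# "I" and "a" are already dropped by the default token_pattern.
print(sorted(vectorizer.vocabulary_))
```

The stop words are compared against the lowercased tokens, so list them in lowercase.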

Hi, Alan!

Can I keep the stop-word list in a separate file in the bot's folder and refer to it from the config file, instead of putting the whole list into the config file?

That's not a feature we support out of the box, but you could create a custom component that loads the list from a file. The easiest way might be to subclass the CountVectorsFeaturizer.
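The file-loading part of such a component could look like the sketch below. The helper load_stop_words, the file name stopwords.txt and its format (one word per line, "#" for comments) are all hypothetical. The demo hands the result to scikit-learn's CountVectorizer directly. In a custom Rasa component, the assumption is that you would call the helper before the parent class reads component_config['stop_words']; check this against your Rasa version.

```python
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer


def load_stop_words(path):
    """Hypothetical helper: read one stop word per line.

    Blank lines and lines starting with '#' are skipped; words are lowercased
    so they match CountVectorizer's default lowercasing.
    """
    words = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.append(word)
    return words


# Demo: write a small stop-word file into a temporary "bot folder", then load it.
with tempfile.TemporaryDirectory() as bot_dir:
    stop_file = Path(bot_dir) / "stopwords.txt"  # hypothetical file name
    stop_file.write_text("# custom stop words\nthe\nto\n\nwant\n", encoding="utf-8")

    stop_words = load_stop_words(stop_file)
    vectorizer = CountVectorizer(stop_words=stop_words)
    vectorizer.fit(["I want to book a flight", "Book the cheapest flight"])

print(stop_words)
print(sorted(vectorizer.vocabulary_))
```

Keeping the list in its own file means the config only needs to hold a path, and the list can be edited without touching the pipeline definition.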
