Error when using CountVectorFeaturizer (Rasa Open Source 2.2.4)

nathanformentin · January 12, 2021, 1:54pm

I just updated to version 2.2.4 of Rasa and I’m getting the following error when training a model:

2021-01-12 13:39:31 WARNING rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer - Unable to train CountVectorizer for message attribute text since the call to sklearn’s .fit() method failed. Leaving an untrained CountVectorizer for it.

Traceback (most recent call last):
  File "/usr/local/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/usr/local/lib/python3.6/dist-packages/rasa/cli/train.py", line 205, in train_nlu
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/usr/local/lib/python3.6/dist-packages/rasa/train.py", line 711, in train_nlu
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/usr/local/lib/python3.6/dist-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.6/dist-packages/rasa/train.py", line 757, in _train_nlu_async
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/usr/local/lib/python3.6/dist-packages/rasa/train.py", line 818, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/train.py", line 116, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/model.py", line 209, in train
    updates = component.train(working_data, self.config, **context)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 728, in train
    self._train_with_independent_vocab(attribute_texts)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 542, in _train_with_independent_vocab
    attribute, attribute_texts[attribute]
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 608, in _fit_vectorizer_from_scratch
    self._add_buffer_to_vocabulary(attribute)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 447, in _add_buffer_to_vocabulary
    original_vocabulary = self.vectorizers[attribute].vocabulary_
AttributeError: 'CountVectorizer' object has no attribute 'vocabulary_'

The config.yml I’m using is the following:

language: "pt"  # your two-letter language code

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    use_lemma: False
    strip_accents: True
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 5
    use_lemma: False
  - name: DIETClassifier
    epochs: 100
    regularization_constant: 0.005
    learning_rate: 0.0001
    batch_strategy: "sequence"

Can someone help me with this problem? I couldn’t figure it out and also couldn’t find much on the internet.

Thanks for the help in advance!

Nathan

dgslv · January 13, 2021, 6:48pm

I’m facing the same issue here.

rctatman · January 15, 2021, 4:24pm

It looks like the error is coming from scikit-learn: it can’t find a vocabulary. According to the docs, if no vocabulary is provided it should be inferred from the input documents, and that doesn’t seem to be happening here.

Are you both working in Portuguese? I’m wondering if it could be a language-specific thing.
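The failure pattern described above (a fit that does not complete, followed by code that reads the learned vocabulary anyway) can be sketched in plain Python. This is a simplified stand-in, not scikit-learn's or Rasa's actual code: `ToyCountVectorizer`, `train` and `train_guarded` are hypothetical names, and the rule that an empty input makes the fit fail is an assumption made for illustration.

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


class ToyCountVectorizer:
    """Hypothetical stand-in for a count vectorizer (not scikit-learn's class)."""

    def fit(self, documents):
        counts = Counter(tok for doc in documents for tok in doc.split())
        if not counts:
            # Assumption for this sketch: a fit with nothing to learn fails.
            raise ValueError("empty vocabulary")
        # The trailing-underscore attribute only exists after a successful fit.
        self.vocabulary_ = {tok: i for i, tok in enumerate(sorted(counts))}
        return self


def train(vectorizer, documents):
    """Swallow the fit error, then read the vocabulary without a guard."""
    try:
        vectorizer.fit(documents)
    except ValueError:
        logger.warning("Unable to train vectorizer; leaving it untrained.")
    # Raises AttributeError if fit() did not complete.
    return vectorizer.vocabulary_


def train_guarded(vectorizer, documents):
    """Same as train(), but fall back to an empty vocabulary."""
    try:
        vectorizer.fit(documents)
    except ValueError:
        logger.warning("Unable to train vectorizer; leaving it untrained.")
    return getattr(vectorizer, "vocabulary_", {})
```

In this sketch, `train(ToyCountVectorizer(), [""])` logs the warning and then raises `AttributeError`, which matches the shape of the warning and traceback in the first post, while `train_guarded` returns an empty dictionary instead.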

nathanformentin · January 15, 2021, 5:28pm

Hello, Rachael! Thanks for your answer.

Yes! I’m using Portuguese. Is there any possible fix for it? Has it been reported before?

Nathan

rctatman · January 15, 2021, 9:56pm

I’ve been digging around and I can’t find any other reports of it… would you mind filing a GitHub issue? https://github.com/RasaHQ/rasa/issues/new/choose

IwonaW · April 7, 2021, 10:10am

Hi all! I get a similar error: “…‘CountVectorizer’ object has no attribute ‘vocabulary_’”. I only get it when I change one parameter in my config file: min_ngram. With min_ngram set to 1, training works correctly, but with any value greater than 1 I get the error. Does anyone know how to train a model with min_ngram larger than 1? Thank you, Iwona
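One possible contributing factor, offered as an assumption and not confirmed anywhere in this thread, is that raising the minimum n-gram size can leave short texts with no n-grams at all. The toy function below is a hypothetical helper (`word_ngrams` is not a Rasa or scikit-learn function) that shows the effect for whitespace-separated word n-grams.

```python
def word_ngrams(text, min_n, max_n):
    """Return all word n-grams of size min_n..max_n for a whitespace-split text."""
    tokens = text.split()
    grams = []
    # n can never exceed the number of tokens in the text.
    for n in range(min_n, min(max_n, len(tokens)) + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


# A one-word text yields a unigram when min_n is 1 ...
print(word_ngrams("oi", 1, 2))
# ... but nothing at all once min_n is 2.
print(word_ngrams("oi", 2, 2))
```

If every text for some attribute were that short, a vectorizer would have nothing to build a vocabulary from. Whether that is what happens in the setup above would need to be checked against the actual training data.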
