Error when using CountVectorsFeaturizer (Rasa Open Source 2.2.4)

I just updated to version 2.2.4 of Rasa and I’m getting the following error when training a model:

2021-01-12 13:39:31 WARNING rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer - Unable to train CountVectorizer for message attribute text since the call to sklearn’s .fit() method failed. Leaving an untrained CountVectorizer for it.

Traceback (most recent call last):
  File "/usr/local/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/usr/local/lib/python3.6/dist-packages/rasa/cli/train.py", line 205, in train_nlu
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/usr/local/lib/python3.6/dist-packages/rasa/train.py", line 711, in train_nlu
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/usr/local/lib/python3.6/dist-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.6/dist-packages/rasa/train.py", line 757, in _train_nlu_async
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/usr/local/lib/python3.6/dist-packages/rasa/train.py", line 818, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/train.py", line 116, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/model.py", line 209, in train
    updates = component.train(working_data, self.config, **context)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 728, in train
    self._train_with_independent_vocab(attribute_texts)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 542, in _train_with_independent_vocab
    attribute, attribute_texts[attribute]
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 608, in _fit_vectorizer_from_scratch
    self._add_buffer_to_vocabulary(attribute)
  File "/usr/local/lib/python3.6/dist-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 447, in _add_buffer_to_vocabulary
    original_vocabulary = self.vectorizers[attribute].vocabulary_
AttributeError: 'CountVectorizer' object has no attribute 'vocabulary_'

The config.yml I’m using is the following:

language: "pt"  # your two-letter language code

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    use_lemma: False
    strip_accents: True
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 5
    use_lemma: False
  - name: DIETClassifier
    epochs: 100
    regularization_constant: 0.005
    learning_rate: 0.0001
    batch_strategy: "sequence"

Can someone help me with this problem? I couldn’t figure it out and also couldn’t find much on the internet.

Thanks for the help in advance!

Nathan

I’m facing the same issue here.

It looks like the error comes from scikit-learn. The CountVectorizer has no vocabulary_ attribute. According to the docs, when no vocabulary is provided it should be inferred from the input documents during fitting, and it looks like that isn’t happening.
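For anyone who wants to check this behaviour outside of Rasa, here is a minimal sketch using scikit-learn directly. It assumes scikit-learn is installed. The helper names (fit_or_leave_untrained, is_trained) are made up for illustration and are not part of Rasa or scikit-learn. The sketch only shows that vocabulary_ exists after a successful .fit() and is missing when .fit() raises, which would match the warning in the log above. The failure here is triggered by an empty vocabulary, which is just one possible cause and not necessarily the one in this thread.

```python
from sklearn.feature_extraction.text import CountVectorizer


def fit_or_leave_untrained(vectorizer, texts):
    """Hypothetical helper: try to fit; on failure return the error message."""
    try:
        vectorizer.fit(texts)
        return None
    except ValueError as error:
        # The vectorizer is left untrained, similar to what the warning describes.
        return str(error)


def is_trained(vectorizer):
    """Hypothetical helper: scikit-learn sets `vocabulary_` only after a successful fit."""
    return hasattr(vectorizer, "vocabulary_")


# Successful fit: the vocabulary is inferred from the input documents.
good = CountVectorizer()
good_error = fit_or_leave_untrained(good, ["bom dia", "boa tarde"])
print(is_trained(good), sorted(good.vocabulary_))

# Failed fit: the default token pattern drops single-character tokens,
# so no vocabulary can be built and fit() raises a ValueError.
bad = CountVectorizer()
bad_error = fit_or_leave_untrained(bad, ["a", "e"])
print(is_trained(bad), bad_error)
```

Accessing bad.vocabulary_ at this point raises the same kind of AttributeError as in the traceback, so a hasattr check before reading the attribute is one way to guard against an untrained vectorizer.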

Are you both working in Portuguese? I’m wondering if it could be a language-specific thing.

Hello, Rachael! Thanks for your answer.

Yes! I’m using Portuguese. Is there any possible fix for it? Has it been reported before?

Nathan

I’ve been digging around and I can’t find any other reports of it… would you mind filing a GitHub issue? :pray: https://github.com/RasaHQ/rasa/issues/new/choose