sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided

Hi all,

I have defined a pipeline in Rasa 2.0.0rc4. All components seem to work except the word-level CountVectorsFeaturizer (the second one, with alias "cvf_w"):

pipeline:
  - name: packages.LanguageDetection.LanguageDetection
  - name: HFTransformersNLP
    # Name of the language model to use
    model_name: "bert"
    # Pre-Trained weights to be loaded
    #model_weights: "nlpaueb/bert-base-greek-uncased-v1"
    model_weights: "bert-base-multilingual-uncased"
    cache_dir: packages/langdata
    alias: "embeddings"
  - name: LanguageModelTokenizer
    # Flag to check whether to split intents
    intent_tokenization_flag: False
    # Symbol on which intent should be split
    intent_split_symbol: "_"
  - name: LanguageModelFeaturizer
    alias: "lmf"
  - name: RegexFeaturizer
    # Text will be processed with case sensitive as default
    case_sensitive: True
    alias: "rf"
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
    use_lemma: False
    # Set the out-of-vocabulary token
    OOV_token: "_oov_"
    # Whether to use a shared vocab
    use_shared_vocab: False
    alias: "cvf_c"
  - name: RegexEntityExtractor
  # Word-level featurizer: this is the component that fails
  - name: CountVectorsFeaturizer
    alias: "cvf_w"
  - name: DIETClassifier
    epochs: 50
    random_seed: 20212020
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 50
    random_seed: 20212020
    featurizers: ["cvf_w", "lmf"]
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1

The error I get is:

Traceback (most recent call last):
  File "/home/pepper/.local/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/cli/train.py", line 81, in train
    return rasa.train(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/train.py", line 43, in train
    return rasa.utils.common.run_in_loop(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/utils/common.py", line 300, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/train.py", line 102, in train_async
    return await _train_async_internal(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/train.py", line 198, in _train_async_internal
    await _do_training(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/train.py", line 256, in _do_training
    await _train_core_with_validated_data(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/train.py", line 403, in _train_core_with_validated_data
    await rasa.core.train(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/train.py", line 67, in train
    agent.train(training_data, **additional_arguments)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/agent.py", line 723, in train
    self.policy_ensemble.train(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/policies/ensemble.py", line 188, in train
    policy.train(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/policies/ted_policy.py", line 331, in train
    tracker_state_features, label_ids = self.featurize_for_training(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/policies/policy.py", line 164, in featurize_for_training
    state_features, label_ids = self.featurizer.featurize_trackers(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/featurizers/tracker_featurizers.py", line 140, in featurize_trackers
    tracker_state_features = self._featurize_states(trackers_as_states, interpreter)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/featurizers/tracker_featurizers.py", line 68, in _featurize_states
    return [
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/featurizers/tracker_featurizers.py", line 69, in <listcomp>
    [
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/featurizers/tracker_featurizers.py", line 70, in <listcomp>
    self.state_featurizer.encode_state(state, interpreter)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/featurizers/single_state_featurizer.py", line 201, in encode_state
    self._extract_state_features(sub_state, interpreter, sparse=True)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/featurizers/single_state_featurizer.py", line 169, in _extract_state_features
    parsed_message = interpreter.featurize_message(message)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/core/interpreter.py", line 158, in featurize_message
    result = self.interpreter.featurize_message(message)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/nlu/model.py", line 418, in featurize_message
    component.process(message, **self.context)
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 561, in process
    sequence_features, sentence_features = self._create_features(
  File "/home/pepper/.local/lib/python3.8/site-packages/rasa/nlu/featurizers/sparse_featurizer/count_vectors_featurizer.py", line 438, in _create_features
    seq_vec = self.vectorizers[attribute].transform(tokens)
  File "/home/pepper/.local/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1247, in transform
    self._check_vocabulary()
  File "/home/pepper/.local/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 467, in _check_vocabulary
    raise NotFittedError("Vocabulary not fitted or provided")
sklearn.exceptions.NotFittedError: Vocabulary not fitted or provided

The error occurs only when training Core. Training NLU on its own works fine.
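For reference, the last frames of the traceback can be reproduced outside Rasa. This is a minimal sketch that assumes only scikit-learn is installed; the helper name `try_transform` and the sample sentences are made up for illustration and are not part of Rasa. It shows that calling `transform` on a `CountVectorizer` that was never fitted raises the same exception:

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import CountVectorizer


def try_transform(vectorizer, docs):
    """Return the feature matrix shape, or the error text if the vectorizer is unfitted.

    Hypothetical helper for illustration only; not part of Rasa or scikit-learn.
    """
    try:
        return vectorizer.transform(docs).shape
    except NotFittedError as err:
        return str(err)


# A vectorizer that was never fitted has no vocabulary, so transform() fails.
unfitted = CountVectorizer()
message = try_transform(unfitted, ["hello world"])
print(message)

# After fit(), the vocabulary exists and transform() returns a sparse matrix.
fitted = CountVectorizer().fit(["hello world", "hello there"])
shape = try_transform(fitted, ["hello world"])
print(shape)
```

Here `message` holds the same text as the last line of the traceback, while the fitted vectorizer transforms without complaint. That is consistent with the word-level vectorizer not having a fitted vocabulary for the attribute that Core asks it to featurize, but this is only my reading of the traceback, not a confirmed diagnosis.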

Hi @petasis. Rasa Open Source 2.0 was released yesterday. A lot of bug fixes went in between the fourth release candidate and the final release. Would you be willing to upgrade and check whether you still see the same error?