WhitespaceTokenizer ignored in pipeline

I am trying to train an NLU model. Training was working fine, but it suddenly started throwing errors when I tried to train more models. I have been trying out various pipeline options to improve the NLU model.

Currently, training fails at the CountVectorsFeaturizer component. The error I am getting is:

AttributeError: 'CountVectorizer' object has no attribute 'vocabulary_'

I suspect the reason for this error is that the featurizer is not receiving any tokens. The RegexFeaturizer and LexicalSyntacticFeaturizer seem to be working fine, as I can see the “Finished training component” log message for both of them.
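
To sanity-check my reading of the error, here is a minimal plain scikit-learn sketch (my assumption about the behaviour, not Rasa internals): vocabulary_ only exists after a successful fit, so the AttributeError would make sense if the vectorizer never got fit on any tokens.

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(analyzer="word")
# Before fit() the vectorizer has no vocabulary_ attribute, matching the error above:
# print(cv.vocabulary_)  # AttributeError: 'CountVectorizer' object has no attribute 'vocabulary_'
cv.fit(["hello world", "hello rasa"])
print(cv.vocabulary_)  # e.g. {'hello': 0, 'rasa': 1, 'world': 2}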

I also noticed that when I run “rasa train nlu”, I never get a log message saying “Training component: WhitespaceTokenizer”. So I am trying to figure out whether that is why the CountVectorsFeaturizer is failing. Any pointers on what I could try?
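
For reference, I am training from the CLI with “rasa train nlu”. I assume rerunning with the --debug flag would print more detailed per-component logging, so I plan to try:

rasa train nlu --debug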

Here’s the pipeline config file that I am using:

language: "en"  # your two-letter language code

pipeline:
  - name: WhitespaceTokenizer
    intent_tokenization_flag: True
    intent_split_symbol: "+"
  - name: RegexFeaturizer
    case_sensitive: False
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "words"
    min_ngram: 1
    max_ngram: 3
  - name: DIETClassifier
    epochs: 50
    number_of_transformer_layers: 2
    intent_tokenization_flag: true
    intent_split_symbol: "+"
  - name: EntitySynonymMapper
  - name: RegexEntityExtractor
    # text will be processed with case insensitive as default
    case_sensitive: False
    # use lookup tables to extract entities
    use_lookup_tables: True
    # use regexes to extract entities
    use_regexes: False
    # use match word boundaries for lookup table
    "use_word_boundaries": True

Rasa details:

Rasa version: 3.1.0
Operating system: Ubuntu 20.04
Python Version: 3.8.10