I am trying to train an NLU model. Training was working fine, but it suddenly started throwing errors when I tried to train more models while trying out various pipeline options to improve the NLU model.
Currently, it fails at the CountVectorsFeaturizer component. The error I am getting is:
AttributeError: 'CountVectorizer' object has no attribute 'vocabulary_'
I suspect the reason for this error is that the featurizer is not receiving any tokens. The RegexFeaturizer and LexicalSyntacticFeaturizer seem to be working fine, as I can see the “Finished training component” log for both of them.
I also noticed that when I call “rasa train nlu”, I don’t get a log message saying “training component: WhitespaceTokenizer”. Could that be why the CountVectorsFeaturizer is failing? Any pointers on what I could try?
Here’s the config pipeline file that I am using:
language: "en"  # your two-letter language code
pipeline:
  - name: WhitespaceTokenizer
    intent_tokenization_flag: True
    intent_split_symbol: "+"
  - name: RegexFeaturizer
    case_sensitive: False
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "words"
    min_ngram: 1
    max_ngram: 3
  - name: DIETClassifier
    epochs: 50
    number_of_transformer_layers: 2
    intent_tokenization_flag: true
    intent_split_symbol: "+"
  - name: EntitySynonymMapper
  - name: RegexEntityExtractor
    # text will be processed with case insensitive as default
    case_sensitive: False
    # use lookup tables to extract entities
    use_lookup_tables: True
    # use regexes to extract entities
    use_regexes: False
    # use match word boundaries for lookup table
    "use_word_boundaries": True
Rasa details:
rasa version: 3.1.0
Operating system: Ubuntu 20.04
Python Version: 3.8.10
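
One quick sanity check on the pipeline config above is to compare option values against the documented ones. The sketch below does this for the `analyzer` option of CountVectorsFeaturizer only. It rests on some assumptions: the allowed values (`word`, `char`, `char_wb`) and the default (`word`) are taken from the Rasa documentation for that component and are assumed to apply to rasa 3.1.0; `ALLOWED_ANALYZERS`, `check_analyzer`, and the hand-copied `pipeline` list are hypothetical names for illustration and not part of Rasa. Whether a flagged value is actually related to the `vocabulary_` error is not confirmed here.

```python
# Allowed values for the `analyzer` option, as listed in the Rasa docs
# for CountVectorsFeaturizer (assumption: unchanged in rasa 3.1.0).
ALLOWED_ANALYZERS = {"word", "char", "char_wb"}

# Hand-copied excerpt of the pipeline from this post (illustrative input,
# not loaded from the real config file).
pipeline = [
    {"name": "WhitespaceTokenizer"},
    {
        "name": "CountVectorsFeaturizer",
        "analyzer": "words",
        "min_ngram": 1,
        "max_ngram": 3,
    },
]


def check_analyzer(components):
    """Return (component name, value) pairs whose analyzer is not allowed."""
    problems = []
    for component in components:
        if component.get("name") != "CountVectorsFeaturizer":
            continue
        # Fall back to the documented default when the option is omitted.
        value = component.get("analyzer", "word")
        if value not in ALLOWED_ANALYZERS:
            problems.append((component["name"], value))
    return problems


if __name__ == "__main__":
    for name, value in check_analyzer(pipeline):
        allowed = sorted(ALLOWED_ANALYZERS)
        print(f"{name}: analyzer={value!r} is not one of {allowed}")
```

If the script prints a line for a component, that option is worth double-checking against the docs before retraining.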