CountVectorizer warnings

Hi,

I’m using Rasa 1.4.0 and I’m getting a few CountVectorizer warnings. Can you help me understand them? What impact do they have on the trained model?

2019-11-09 10:38:59 WARNING  rasa.nlu.featurizers.count_vectors_featurizer  - Unable to train CountVectorizer for message attribute text. Leaving an untrained CountVectorizer for it
2019-11-09 10:38:59 WARNING  rasa.nlu.featurizers.count_vectors_featurizer  - Unable to train CountVectorizer for message attribute intent. Leaving an untrained CountVectorizer for it
2019-11-09 10:38:59 DEBUG    rasa.nlu.featurizers.count_vectors_featurizer  - No text provided for response attribute in any messages of training data. Skipping training a CountVectorizer for it.

This is my config.yml

language: en
pipeline:
- name: WhitespaceTokenizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
  stop_words: {'english'}
  analyzer: word
  token_pattern: r'(?u)\b\w\w+\b'
  lowercase: true
  max_ngram: 5
  min_ngram: 1
  OOV_token: '__oov__'
  OOV_words: ['Singapore', 'Australia', '2019', '2020']
- name: CountVectorsFeaturizer
  analyzer: char_wb
  lowercase: true
  max_ngram: 5
  min_ngram: 3
- name: EmbeddingIntentClassifier
  random_seed: 12345
  intent_split_symbol: +
  intent_tokenization_flag: true
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - time
  - number
  - amount-of-money
  - distance
  locale: en_GB
  timezone: US/Eastern
  timeout: 20

policies:
- name: KerasPolicy
  rnn_size: 32
  epochs: 150
  batch_size: 32
  validation_split: 0.1
  max_history: 10
  random_seed: 12345
- name: FormPolicy
- name: AugmentedMemoizationPolicy
  max_history: 6
- name: MappingPolicy
- name: TwoStageFallbackPolicy
  core_threshold: 0.3 
  nlu_threshold: 0.9
  ambiguity_threshold: 0.1
  fallback_core_action_name: action_default_fallback
  fallback_nlu_action_name: action_default_ask_affirmation
  deny_suggestion_intent_name: out_of_scope
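
One thing worth checking in the config above (a guess, not something confirmed in this thread): `token_pattern` is written as an unquoted YAML scalar using Python's raw-string notation. YAML has no raw-string prefix, so the value handed to the featurizer would be the literal text `r'(?u)\b\w\w+\b'`, including the leading `r'` and the trailing `'`. If no token ever matches, a CountVectorizer has nothing to build a vocabulary from, which could explain the "Unable to train" warnings for `text` and `intent`. The sketch below uses only Python's `re` module to compare the two patterns; the sample sentence is made up for illustration.

```python
import re
import warnings

# What a YAML parser hands over for:  token_pattern: r'(?u)\b\w\w+\b'
# The scalar is unquoted, so the leading r' and trailing ' are part of the value.
pattern_from_yaml = "r'" + r"(?u)\b\w\w+\b" + "'"

# What was presumably intended: the bare regex.
pattern_intended = r"(?u)\b\w\w+\b"


def find_tokens(pattern, text):
    """Return all matches, or an empty list if the pattern is rejected."""
    with warnings.catch_warnings():
        # Older Python versions only warn about an inline flag that is not
        # at the start of the pattern.
        warnings.simplefilter("ignore")
        try:
            return re.findall(pattern, text)
        except re.error:
            # Newer Python versions refuse such a pattern outright.
            return []


sample = "book a flight to Singapore"  # hypothetical training example
tokens_from_yaml = find_tokens(pattern_from_yaml, sample)
tokens_intended = find_tokens(pattern_intended, sample)

print(tokens_from_yaml)
print(tokens_intended)
```

If this is the cause, writing the value as a single-quoted YAML string without the `r` prefix, i.e. `token_pattern: '(?u)\b\w\w+\b'`, should pass the bare regex through, since backslashes are literal inside single-quoted YAML strings.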

Hi there, what does your NLU data look like?

@pranavinu I think we fixed this issue. What version are you running, and what does your configuration look like?


Hi @erohmensing, I am using version 1.6.2. My config file looks like this:

language: en

pipeline:
- name: SpacyNLP
- name: WhitespaceTokenizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
- name: EmbeddingIntentClassifier
  batch_strategy: sequence

policies:
- name: FallbackPolicy
  nlu_threshold: 0.3
  core_threshold: 0.4
  fallback_action_name: "action_default_fallback"
- name: FormPolicy
- name: MappingPolicy
- name: KerasPolicy
  epochs: 300
Oh okay, fair. That isn’t a warning message (it would be yellow if it were), just a debug message. The component could train a CountVectorizer for responses, but there isn’t any response data, so it skips that step. It’s nothing to worry about.
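
The behaviour described above can be pictured with a simplified sketch. This is not Rasa's actual implementation: `build_vocabulary`, `train_per_attribute`, and the message dictionaries are hypothetical stand-ins that only mimic the pattern of training one vocabulary per message attribute, skipping attributes that have no data (debug message) and reporting attributes that have data but yield no vocabulary (warning).

```python
import logging

logger = logging.getLogger("count_vectors_sketch")


def build_vocabulary(texts):
    """Toy stand-in for fitting a vectorizer: map each token to an index."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    if not tokens:
        raise ValueError("empty vocabulary")
    return {tok: i for i, tok in enumerate(tokens)}


def train_per_attribute(messages, attributes=("text", "intent", "response")):
    """Build one vocabulary per attribute, skipping attributes without data."""
    vocabularies = {}
    for attribute in attributes:
        texts = [m[attribute] for m in messages if m.get(attribute)]
        if not texts:
            # No data at all for this attribute: harmless, log at debug level.
            logger.debug("No text provided for %s attribute. Skipping.", attribute)
            continue
        try:
            vocabularies[attribute] = build_vocabulary(texts)
        except ValueError:
            # Data exists but produced no tokens: worth a warning.
            logger.warning("Unable to train for attribute %s.", attribute)
    return vocabularies


# Hypothetical training data with no "response" attribute on any message.
messages = [
    {"text": "book a flight", "intent": "book_flight"},
    {"text": "hello there", "intent": "greet"},
]
vocabularies = train_per_attribute(messages)
print(sorted(vocabularies))
```

With training data like this, only the `text` and `intent` attributes get a vocabulary, and the missing `response` data produces nothing louder than a debug line.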

Okay @erohmensing. Thanks 🙂