CountVectorizer warnings

KarthiAru · November 9, 2019, 5:29am

Hi,

I’m using Rasa 1.4.0 and I’m getting a few CountVectorizer warnings. Can you help me understand them? What impact do they have on the model?

2019-11-09 10:38:59 WARNING  rasa.nlu.featurizers.count_vectors_featurizer  - Unable to train CountVectorizer for message attribute text. Leaving an untrained CountVectorizer for it
2019-11-09 10:38:59 WARNING  rasa.nlu.featurizers.count_vectors_featurizer  - Unable to train CountVectorizer for message attribute intent. Leaving an untrained CountVectorizer for it
2019-11-09 10:38:59 DEBUG    rasa.nlu.featurizers.count_vectors_featurizer  - No text provided for response attribute in any messages of training data. Skipping training a CountVectorizer for it.

This is my config.yml

language: en
pipeline:
- name: WhitespaceTokenizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
  stop_words: {'english'}
  analyzer: word
  token_pattern: r'(?u)\b\w\w+\b'
  lowercase: true
  max_ngram: 5
  min_ngram: 1
  OOV_token: '__oov__'
  OOV_words: ['Singapore', 'Australia', '2019', '2020']
- name: CountVectorsFeaturizer
  analyzer: char_wb
  lowercase: true
  max_ngram: 5
  min_ngram: 3
- name: EmbeddingIntentClassifier
  random_seed: 12345
  intent_split_symbol: +
  intent_tokenization_flag: true
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - time
  - number
  - amount-of-money
  - distance
  locale: en_GB
  timezone: US/Eastern
  timeout: 20

policies:
- name: KerasPolicy
  rnn_size: 32
  epochs: 150
  batch_size: 32
  validation_split: 0.1
  max_history: 10
  random_seed: 12345
- name: FormPolicy
- name: AugmentedMemoizationPolicy
  max_history: 6
- name: MappingPolicy
- name: TwoStageFallbackPolicy
  core_threshold: 0.3 
  nlu_threshold: 0.9
  ambiguity_threshold: 0.1
  fallback_core_action_name: action_default_fallback
  fallback_nlu_action_name: action_default_ask_affirmation
  deny_suggestion_intent_name: out_of_scope
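
One thing that may be worth checking in the config above, as a guess that nobody in this thread confirms: the token_pattern value is written as a Python raw-string literal. Assuming YAML reads that unquoted value as a plain string, the pattern would contain the leading r and both quote characters literally. The sketch below uses only Python's re module (not Rasa or scikit-learn) to compare that literal pattern with the presumably intended one; the sample sentence and the helper name find_tokens are made up for illustration.

```python
import re
import warnings

# Assumption: YAML hands the unquoted value over as a plain string, so the
# leading r and both quote characters become part of the regex itself.
yaml_style_pattern = r"r'(?u)\b\w\w+\b'"
# The pattern that was presumably intended (scikit-learn's default).
intended_pattern = r"(?u)\b\w\w+\b"


def find_tokens(pattern, text):
    """Return all matches of pattern in text, or [] if it cannot be compiled."""
    try:
        with warnings.catch_warnings():
            # Older Python versions only warn about an inline flag that is
            # not at the start of the pattern; newer ones raise re.error.
            warnings.simplefilter("ignore", DeprecationWarning)
            return re.findall(pattern, text)
    except re.error:
        return []


sample = "book a flight to Singapore"  # hypothetical training example
tokens_yaml_style = find_tokens(yaml_style_pattern, sample)
tokens_intended = find_tokens(intended_pattern, sample)
print(tokens_yaml_style)
print(tokens_intended)
```

If the literal pattern yields no tokens on real training examples, the vectorizer would have no vocabulary to learn, which could explain an "Unable to train CountVectorizer" warning. Writing the value as a quoted YAML string without the r prefix, or dropping the line to use the default, would be the thing to try.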

erohmensing · November 11, 2019, 3:18pm

Hi there, what does your NLU data look like?

erohmensing · February 24, 2020, 8:11pm

@pranavinu I think we fixed this issue. What version are you running, and what does your configuration look like?

pranavinu · February 26, 2020, 10:18am

Hi @erohmensing, I am using version 1.6.2. My config file looks like this:

language: en

pipeline:
- name: SpacyNLP
- name: WhitespaceTokenizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
- name: EmbeddingIntentClassifier
  batch_strategy: sequence

policies:
- name: FallbackPolicy
  nlu_threshold: 0.3
  core_threshold: 0.4
  fallback_action_name: "action_default_fallback"
- name: FormPolicy
- name: MappingPolicy
- name: KerasPolicy
  epochs: 300

erohmensing · March 2, 2020, 5:39pm

Oh okay, fair. That isn’t a warning message (it would be yellow if it were), just a debug message. That component could train a CountVectorizer for responses, but there isn’t any response data, so it skips it. It’s nothing to worry about.
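
The level is also printed as a field in each log line, so it can be checked without relying on terminal colors. A minimal sketch using the three lines from the first post; the regex layout (date, time, level, logger name, dash, message) is an assumption inferred from those lines, not a documented Rasa format.

```python
import re

# The three log lines quoted in the first post of this thread.
log_lines = [
    "2019-11-09 10:38:59 WARNING  rasa.nlu.featurizers.count_vectors_featurizer  - Unable to train CountVectorizer for message attribute text. Leaving an untrained CountVectorizer for it",
    "2019-11-09 10:38:59 WARNING  rasa.nlu.featurizers.count_vectors_featurizer  - Unable to train CountVectorizer for message attribute intent. Leaving an untrained CountVectorizer for it",
    "2019-11-09 10:38:59 DEBUG    rasa.nlu.featurizers.count_vectors_featurizer  - No text provided for response attribute in any messages of training data. Skipping training a CountVectorizer for it.",
]

# Assumed layout: "<date> <time> <LEVEL> <logger> - <message>".
LINE_RE = re.compile(r"^(\S+ \S+)\s+(\w+)\s+(\S+)\s+-\s+(.*)$")

levels = []
for line in log_lines:
    match = LINE_RE.match(line)
    if match:
        levels.append(match.group(2))  # group 2 is the log level

print(levels)
```

Here the "response" line comes out as DEBUG, while the "text" and "intent" lines are WARNING.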

pranavinu · April 3, 2020, 7:44am

Okay @erohmensing, thanks.
