Exception: Failed to validate component CountVectorsFeaturizer. Missing property: 'tokens'

minnie · February 17, 2020, 9:00am

I am trying to test the Formbot example in Rasa. I edited the pipeline, which includes custom components, as follows.

pipeline:
- name: custom_components.parrot_extractor.ParrotExtractor
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
  analyzer: char
  min_ngram: 1
  max_ngram: 8
- name: EmbeddingIntentClassifier
  loss_type: margin
  batch_size: [2, 32]
  epochs: 125
  embed_dim: 30
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - number
policies:
- name: FallbackPolicy
- name: MemoizationPolicy
- name: FormPolicy
- name: MappingPolicy

I then got the error: Exception: Failed to validate component CountVectorsFeaturizer. Missing property: 'tokens'. Since the analyzer for CountVectorsFeaturizer is set to 'char', I don't think it should need tokens, right? The older version of Rasa had no such error and everything worked fine. Please help, and thank you in advance.

stephens · February 18, 2020, 2:59am

Can you re-paste your config.yml bracketed by three tick marks - ```

This will format it correctly. I don’t see any issue based on what’s shown above.

minnie · February 18, 2020, 3:26am

I have edited it, sir. Please help me.

stephens · February 18, 2020, 11:04pm

The problem is that you don’t have a tokenizer in your pipeline. The CountVectorsFeaturizer requires tokens.

Try inserting the WhitespaceTokenizer at the beginning of your pipeline:

pipeline:
- name: "WhitespaceTokenizer"
- name: custom_components.parrot_extractor.ParrotExtractor
- name: EntitySynonymMapper
...

minnie · February 19, 2020, 3:34am

Thank you for your reply, sir. But in the old version of Rasa, it worked even without a tokenizer. I set the analyzer of CountVectorsFeaturizer to ‘char’ and also set the minimum and maximum n-gram range, so why does it still need tokens?

smar10 · February 24, 2020, 6:23pm

Same problem here! I updated the rasa version for a project that is already in production, and with the new version I get an error on the pipeline - Failed to validate component CountVectorsFeaturizer. Missing property: ‘tokens’.

Also, if you check the documentation, the CountVectorsFeaturizer does not require any tokenizer.

Can you check, @stephens?

Many thanks

stephens · February 27, 2020, 1:49pm

Upon further research, there was a change in CountVectorsFeaturizer. It used to do a text.split() if tokens were not present. That was removed in a recent release and you now must explicitly include a tokenizer.
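
The change described above can be sketched in plain Python. This is a simplified illustration, not Rasa's actual source: the function names `tokens_old` and `tokens_new` and the dict-based message are hypothetical stand-ins, and only the `text.split()` fallback and the error text are taken from this thread. The sketch also performs the check per message for brevity, which is an assumption; the error in this thread is reported while the pipeline is being validated.

```python
from typing import Any, Dict, List


def tokens_old(message: Dict[str, Any]) -> List[str]:
    """Old behaviour as described in this thread: fall back to text.split()."""
    tokens = message.get("tokens")
    if tokens is None:
        # No tokenizer ran earlier in the pipeline, so split on whitespace.
        return message["text"].split()
    return list(tokens)


def tokens_new(message: Dict[str, Any]) -> List[str]:
    """New behaviour: an earlier component must already have set 'tokens'."""
    if "tokens" not in message:
        raise Exception(
            "Failed to validate component CountVectorsFeaturizer. "
            "Missing property: 'tokens'"
        )
    return list(message["tokens"])


message = {"text": "book a table for two"}
print(tokens_old(message))  # ['book', 'a', 'table', 'for', 'two']

try:
    tokens_new(message)
except Exception as err:
    print(err)  # the same error text reported in this thread

# Once a tokenizer has run (simulated here with a plain split), the check passes.
message["tokens"] = message["text"].split()
print(tokens_new(message))
```

In other words, the featurizer no longer fills in missing tokens itself, so a tokenizer such as WhitespaceTokenizer has to appear before it in the pipeline even when the analyzer is set to char.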

smar10 · February 29, 2020, 7:32pm

@stephens many thanks for the explanation!
