Exception: Failed to validate component CountVectorsFeaturizer. Missing property: 'tokens'

I am trying to test the Formbot example in Rasa. I edited the pipeline, which includes a custom component, as follows.

```yaml
pipeline:
- name: custom_components.parrot_extractor.ParrotExtractor
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
  analyzer: char
  min_ngram: 1
  max_ngram: 8
- name: EmbeddingIntentClassifier
  loss_type: margin
  batch_size: [2, 32]
  epochs: 125
  embed_dim: 30
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - number
policies:
- name: FallbackPolicy
- name: MemoizationPolicy
- name: FormPolicy
- name: MappingPolicy
```

With this config I get the error `Exception: Failed to validate component CountVectorsFeaturizer. Missing property: 'tokens'`. Since the analyzer for CountVectorsFeaturizer is set to 'char', I don't think it should need tokens, right? The older version of Rasa raised no such error and everything worked fine. Please help, and thank you in advance.

Can you re-paste your config.yml wrapped in three backticks (```)?

This will format it correctly. I don’t see any issue based on what’s shown above.

I edited, sir. Please help me.

The problem is that you don’t have a tokenizer in your pipeline. The CountVectorsFeaturizer requires tokens.

Try inserting the WhitespaceTokenizer at the beginning of your pipeline:

```yaml
pipeline:
- name: "WhitespaceTokenizer"
- name: custom_components.parrot_extractor.ParrotExtractor
- name: EntitySynonymMapper
...
```
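To show why the order matters, here is a rough sketch of how a "requires/provides" check over a pipeline could work. It is not Rasa's actual validation code: the `PROVIDES` and `REQUIRES` tables and the `first_missing_property` helper are hypothetical, and they only encode what this thread states (the featurizer needs `tokens`, and a tokenizer supplies them).

```python
# Illustrative tables only: these are assumptions for the sketch,
# not Rasa's real component metadata.
PROVIDES = {
    "WhitespaceTokenizer": {"tokens"},
}
REQUIRES = {
    "CountVectorsFeaturizer": {"tokens"},
}


def first_missing_property(pipeline):
    """Return (component, property) for the first unmet requirement, else None."""
    available = set()
    for name in pipeline:
        # A component can only use what earlier components have produced.
        for prop in sorted(REQUIRES.get(name, set())):
            if prop not in available:
                return name, prop
        available |= PROVIDES.get(name, set())
    return None


without_tokenizer = ["EntitySynonymMapper", "CountVectorsFeaturizer"]
with_tokenizer = ["WhitespaceTokenizer"] + without_tokenizer

print(first_missing_property(without_tokenizer))
print(first_missing_property(with_tokenizer))
```

Under these assumptions the first pipeline reports `CountVectorsFeaturizer` as missing `tokens`, and the second passes. The tokenizer has to come before the featurizer, which is why it goes at the beginning of the pipeline.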

Thank you for your reply, sir. But in the old version of Rasa, it worked even without a tokenizer. I set the analyzer of CountVectorsFeaturizer to 'char' and also set the minimum and maximum n-gram range. So why does it still need tokens?

Same problem here! I updated the Rasa version for a project that is already in production, and with the new version I get a pipeline error: `Failed to validate component CountVectorsFeaturizer. Missing property: 'tokens'`.

Also, if you check the documentation, the CountVectorsFeaturizer does not require a tokenizer.

Can you check, @stephens?

Many thanks

Upon further research, there was a change in CountVectorsFeaturizer. It used to do a text.split() if tokens were not present. That was removed in a recent release and you now must explicitly include a tokenizer.
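As a rough illustration of that change (not the actual Rasa source), the difference looks something like the sketch below. The plain-dict `message` and both function names are hypothetical stand-ins used only to contrast the two behaviours.

```python
def tokens_with_fallback(message):
    """Older behaviour as described above: fall back to text.split()."""
    tokens = message.get("tokens")
    if tokens is None:
        # No tokenizer ran, so split the raw text on whitespace.
        return message["text"].split()
    return tokens


def tokens_strict(message):
    """Newer behaviour as described above: tokens must already be present."""
    if "tokens" not in message:
        raise ValueError("Missing property: 'tokens'")
    return message["tokens"]


message = {"text": "book a table for two"}
print(tokens_with_fallback(message))

try:
    tokens_strict(message)
except ValueError as err:
    print(err)
```

In other words, the old code quietly tokenized on whitespace for you, which is why adding WhitespaceTokenizer explicitly should give roughly the same tokens as before.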

@stephens many thanks for the explanation!