Why are there multiple featurizers in the NLU pipeline config?

I used Rasa Open Source to train an NLU model in Farsi. The automatically generated config contains:

  • name: RegexFeaturizer
  • name: LexicalSyntacticFeaturizer
  • name: CountVectorsFeaturizer
  • name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4

Why does it need more than one featurizer, and how are they used?

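The two CountVectorsFeaturizer (CVF) entries look at the text differently. The first one uses the default settings, so it works on whole words only. The second one, with `analyzer: char_wb`, works on character n-grams of length 1 to 4 taken from inside word boundaries, so related word forms and misspelled words still share some features.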
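To see the difference between the two entries, here is a small sketch that uses scikit-learn's `CountVectorizer` directly, which CountVectorsFeaturizer builds on. It is only an approximation: Rasa feeds in tokens from its own tokenizer, so real vocabularies will differ. The example words "book" and "booking" are arbitrary; the sketch checks which features they share under each analyzer.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Comparable to the first entry: default word-level analyzer
word_analyzer = CountVectorizer(analyzer="word").build_analyzer()

# Comparable to the second entry: analyzer char_wb, min_ngram 1, max_ngram 4
char_analyzer = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4)).build_analyzer()


def shared_features(analyze, a, b):
    """Return the features that two strings have in common under an analyzer."""
    return set(analyze(a)) & set(analyze(b))


word_shared = shared_features(word_analyzer, "book", "booking")
char_shared = shared_features(char_analyzer, "book", "booking")

print(word_shared)          # set(): at word level the two are unrelated features
print(sorted(char_shared))  # overlapping n-grams such as "boo" and "book"
```

At the word level "booking" is simply a different feature from "book", while the character n-grams overlap, which can help the model with word forms it did not see during training.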

You can read more about CountVectorsFeaturizer in the Rasa docs, and about the `analyzer` options in the scikit-learn docs for `CountVectorizer`, which it is built on.

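In the pipeline, components run in the order they are listed, and later components can use what earlier ones produced (featurizers, for example, need the tokens created by the tokenizer), so the order matters. The featurizers do not replace each other's output, though: each one adds its own features to the message. RegexFeaturizer adds features for tokens that match the regexes and lookup tables in your training data, LexicalSyntacticFeaturizer adds token-level features (such as whether a token is a digit or upper-case), mainly to help entity extraction, and the classifier (for example DIETClassifier) then uses all of these features together.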
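Featurizers each add their own features rather than transforming the previous featurizer's output, so the classifier sees all of them at once. As a rough illustration (not Rasa's internal code), the sketch below assumes each featurizer produces a sparse count matrix with one row per message and stacks the word-level and character-level matrices side by side with scipy. The two example messages are made up.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer

messages = ["book flight", "booking flights to tehran"]

# One "featurizer" per analyzer: whole words, and char n-grams of length 1-4
word_vectorizer = CountVectorizer(analyzer="word").fit(messages)
char_vectorizer = CountVectorizer(analyzer="char_wb", ngram_range=(1, 4)).fit(messages)

# Each produces a sparse matrix with one row per message
word_features = word_vectorizer.transform(messages)
char_features = char_vectorizer.transform(messages)

# Combining means putting the columns side by side for the same message,
# not feeding one featurizer's output into the other
combined = hstack([word_features, char_features]).tocsr()

print(word_features.shape, char_features.shape, combined.shape)
```

Each row of `combined` still describes one message; it just has more columns, one block per featurizer.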
