Two featurizers in Rasa NLU config file

Rasa version - 1.9.3

I am experimenting with Rasa NLU using the default configuration given in the docs (the "components of nlu" page). This is the config I am using:

language: "en"

pipeline:
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

I have a few questions:

1. Why are there two featurizers, “ConveRTFeaturizer” and “CountVectorsFeaturizer”?

2. Why are there two “CountVectorsFeaturizer” entries? Shouldn't there be just one? If not, why two?

There is one for the word level, and one for the char level

Can you point me to the documentation where it is mentioned?

Second, why use “CountVectorsFeaturizer” when “ConveRTFeaturizer” is already there?

Please have a look here https://rasa.com/docs/rasa/nlu/components/#countvectorsfeaturizer

@akelad

As @n2718281 says, there are two CountVectorsFeaturizer entries: one works on the word level, the other on the character level. As for the combination of ConveRTFeaturizer and CountVectorsFeaturizer: ConveRTFeaturizer uses pre-trained embeddings, so the model already has some information about the words. CountVectorsFeaturizer can additionally complement that if you have some very domain-specific words. E.g. "balance" could mean very different things in finance vs. general English.
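
To make the word-level vs. character-level distinction concrete, here is a minimal Python sketch. It is not Rasa's code: word_tokens and char_wb_ngrams are hypothetical helpers written for illustration, and the padding behaviour only approximates what the "char_wb" analyzer does (character n-grams that stay inside word boundaries).

```python
def word_tokens(text):
    """Word-level features: one token per whitespace-separated word."""
    return text.lower().split()


def char_wb_ngrams(text, min_n=1, max_n=4):
    """Character n-grams that never cross a word boundary.

    Each word is padded with one space on both sides before slicing.
    This mirrors the idea behind analyzer "char_wb"; it is an
    illustrative approximation, not the library implementation.
    """
    ngrams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            for start in range(len(padded) - n + 1):
                ngrams.append(padded[start:start + n])
    return ngrams


# Word level: "balance" and "balances" are two unrelated tokens.
print(word_tokens("balance balances"))

# Character level: the two words share many n-grams, e.g. "bal", "lanc".
shared = set(char_wb_ngrams("balance")) & set(char_wb_ngrams("balances"))
print(sorted(shared))
```

With word-level counts alone, an unseen variant or typo of a word contributes no known feature. With character n-grams, it still overlaps with the n-grams seen in training, which is one plausible reason the default pipeline lists both.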

@akelad thanks. I think this is not mentioned in the documentation, or is it? I have already gone through the following link:

https://rasa.com/docs/rasa/nlu/component/?&_ga=2.218898713.1621572733.1586150684-338641809.1571736918#countvectorsfeaturizer

Yeah. The documentation is unclear. I too came here searching.

Hey @akelad, I have one question regarding the Rasa NLU model. When I run rasa shell nlu on the command line, it returns JSON output with all the intents and entities in the sentence passed as a query. Is there a way to get that JSON output while running Rasa X on a local server? Please answer, it would be really helpful.