Two featurizers in the Rasa NLU config file

ksk · April 7, 2020, 9:17am

Rasa version - 1.9.3

I am experimenting with Rasa NLU using the default configuration given in the docs, as shown on the "components of nlu" page. This is the config I am using:

language: "en"

pipeline:
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
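To see at a glance which entries in that pipeline are featurizers, here is a minimal sketch that writes the same pipeline as plain Python data and lists each featurizer with the kind of features it produces. The "sparse"/"dense" labels follow how the Rasa 1.x docs group these components, but the `FEATURE_KIND` table and `describe_featurizers` helper are hypothetical illustrations, not part of any Rasa API.

```python
# The config above, written as plain Python data (mirrors the YAML).
PIPELINE = [
    {"name": "ConveRTTokenizer"},
    {"name": "ConveRTFeaturizer"},
    {"name": "RegexFeaturizer"},
    {"name": "LexicalSyntacticFeaturizer"},
    {"name": "CountVectorsFeaturizer"},
    {"name": "CountVectorsFeaturizer", "analyzer": "char_wb", "min_ngram": 1, "max_ngram": 4},
    {"name": "DIETClassifier", "epochs": 100},
    {"name": "EntitySynonymMapper"},
    {"name": "ResponseSelector", "epochs": 100},
]

# Hypothetical lookup table (assumption based on the Rasa 1.x docs' grouping):
# which featurizers emit dense vs sparse features.
FEATURE_KIND = {
    "ConveRTFeaturizer": "dense",
    "RegexFeaturizer": "sparse",
    "LexicalSyntacticFeaturizer": "sparse",
    "CountVectorsFeaturizer": "sparse",
}


def describe_featurizers(pipeline):
    """Return (name, kind, analyzer) for every featurizer in the pipeline."""
    rows = []
    for component in pipeline:
        name = component["name"]
        kind = FEATURE_KIND.get(name)
        if kind is None:
            continue  # tokenizers, classifiers, mappers are not featurizers
        analyzer = None
        if name == "CountVectorsFeaturizer":
            # CountVectorsFeaturizer works on words unless told otherwise.
            analyzer = component.get("analyzer", "word")
        rows.append((name, kind, analyzer))
    return rows


for row in describe_featurizers(PIPELINE):
    print(row)
```

Read this way, the config has one dense featurizer (ConveRT) and several sparse ones, and the two CountVectorsFeaturizer entries differ only in their analyzer.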

I have a few questions:

1. Why are there two featurizers, “ConveRTFeaturizer” and “CountVectorsFeaturizer”?

2. Why are there two “CountVectorsFeaturizer” entries? I would expect only one. If two are needed, why?

n2718281 · April 7, 2020, 9:36am

2 Why two “CountVectorsFeaturizer” - it should be one. If not then why 2?

There is one for the word level and one for the character level (the second entry sets analyzer: “char_wb”).
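A minimal sketch of what the two levels mean, written in plain Python rather than Rasa code: the word level counts whole tokens, while the character level counts n-grams (here 1 to 4, matching min_ngram/max_ngram in the config) taken inside each word. The `char_wb_counts` function only approximates the idea behind analyzer “char_wb” (pad each word with spaces, never let an n-gram cross into the next word); all function names are hypothetical.

```python
from collections import Counter


def word_counts(text):
    """Word-level bag of words: one count per lowercased, whitespace-split token."""
    return Counter(text.lower().split())


def char_wb_counts(text, min_n=1, max_n=4):
    """Character n-grams taken only inside word boundaries.

    Approximates the "char_wb" idea: each word is padded with a space on
    both sides, and n-grams are never built across two words.
    """
    grams = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            for start in range(len(padded) - n + 1):
                grams[padded[start:start + n]] += 1
    return grams


def shared_keys(a, b):
    """Features two messages have in common."""
    return set(a) & set(b)


print(word_counts("check balance"))
# A misspelling shares no word-level feature with the correct spelling...
print(shared_keys(word_counts("balance"), word_counts("balanse")))
# ...but still shares many character n-grams, e.g. " bal".
print(" bal" in shared_keys(char_wb_counts("balance"), char_wb_counts("balanse")))
```

This is one common reason to keep both: the word-level features are precise, while the character-level features stay useful for typos and unseen word forms.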

ksk · April 8, 2020, 7:32am

Can you point me to the documentation where it is mentioned?

Second, why use “CountVectorsFeaturizer” when “ConveRTFeaturizer” is already in the pipeline?

ksk · April 8, 2020, 7:34am

Please have a look here https://rasa.com/docs/rasa/nlu/components/#countvectorsfeaturizer

ksk · April 8, 2020, 7:37am

@akelad

akelad · April 9, 2020, 11:59am

As @n2718281 says, there are two CountVectorsFeaturizers: one works on the word level, the other on the character level. As for combining ConveRTFeaturizer and CountVectorsFeaturizer: ConveRTFeaturizer uses pre-trained embeddings, so the model already has some information about the words. CountVectorsFeaturizer can complement that when you have very domain-specific words. For example, "balance" can mean very different things in finance than in general English.
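A toy sketch of that complementarity, assuming made-up numbers throughout: `PRETRAINED` stands in for frozen, general-purpose embeddings (the role ConveRTFeaturizer plays), while the count features use a vocabulary built from your own training examples, so a domain word such as "iban" with no pretrained vector still gets its own feature. Concatenating the two views is only a conceptual illustration; the real DIETClassifier combines dense and sparse features in its own way.

```python
from collections import Counter

# Hypothetical frozen "pretrained" word vectors (numbers are made up).
PRETRAINED = {
    "check": [0.9, 0.1],
    "my": [0.1, 0.1],
    "balance": [0.2, 0.8],
}


def dense_features(tokens):
    """Average the frozen pretrained vectors; unknown words are skipped."""
    known = [PRETRAINED[t] for t in tokens if t in PRETRAINED]
    if not known:
        return [0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def build_vocab(training_examples):
    """The count-feature vocabulary comes from your own training data."""
    return sorted({tok for ex in training_examples for tok in ex.lower().split()})


def sparse_features(tokens, vocab):
    """One count per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def featurize(text, vocab):
    tokens = text.lower().split()
    # Conceptually, the classifier sees both views of the same message.
    return dense_features(tokens) + sparse_features(tokens, vocab)


training = ["check my balance", "what is my iban", "transfer money to savings"]
vocab = build_vocab(training)
print(vocab)
print(featurize("what is my iban", vocab))
```

Here "iban" contributes nothing to the dense part but has its own count dimension, which the classifier can learn to weight for your domain.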

ksk · April 9, 2020, 6:56pm

@akelad Thanks! I don't think this is mentioned in the documentation, or is it? I have gone through the following link:

https://rasa.com/docs/rasa/nlu/component/?&_ga=2.218898713.1621572733.1586150684-338641809.1571736918#countvectorsfeaturizer

lohith.arcot · September 6, 2020, 2:15pm

Yeah, the documentation is unclear. I came here searching for this too.

eashan_27 · October 12, 2020, 5:28am

Hey @akelad, I have a question about the Rasa NLU model. When I run rasa shell nlu on the command line, it returns a JSON output with all the intents and entities found in the sentence passed as a query. Is there a way to get that JSON output while running Rasa X on a local server? Please answer, it'll be really helpful.
