I am wondering why a TF-IDF featurizer is not part of the Rasa components. It would be easy to implement and is similar to the count vectors featurizer, so I don't understand why it is not provided. Is there a reason, for example that this featurizer performs poorly or degrades the model?
There is simply no need to provide frequency weighting, because the transformer itself learns to predict the intent from the occurrences and combinations of certain words. The features are also passed through a feed-forward network first. Personally, I usually don't use featurizers based on the occurrence of whole words. In my opinion they are too sensitive to misspellings, so I prefer subword or n-gram-based methods. In the end, you can give it a shot with a custom featurizer; maybe you will find something interesting.
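To illustrate the point about misspellings, here is a minimal pure-Python sketch comparing whole-word count features with character n-gram count features for one misspelled word. The function names, the example words, and the 2-to-4 n-gram range are my own assumptions for illustration, not anything from Rasa. Padding each word with spaces is loosely modeled on the `char_wb` analyzer in scikit-learn's `CountVectorizer`.

```python
import math
from collections import Counter


def word_features(text):
    """Bag-of-words counts over whitespace-separated, lowercased tokens."""
    return Counter(text.lower().split())


def char_ngram_features(text, n_min=2, n_max=4):
    """Counts of character n-grams taken inside each space-padded word."""
    features = Counter()
    for word in text.lower().split():
        padded = f" {word} "  # mark word boundaries
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                features[padded[i:i + n]] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


clean = "restaurant"
typo = "restuarant"  # two letters swapped

word_sim = cosine(word_features(clean), word_features(typo))
char_sim = cosine(char_ngram_features(clean), char_ngram_features(typo))

print(f"word-level similarity:   {word_sim:.2f}")
print(f"char n-gram similarity:  {char_sim:.2f}")
```

At the word level the misspelled token is a completely different feature, so the two vectors share nothing. The character n-gram vectors still overlap on most of the unchanged parts of the word. TF-IDF weighting on whole words would rescale the word counts but would not change that.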