Word Embedding in RASA NLU


I would like to know in which step word embeddings are created. Are word embeddings created by text featurizers (CountVectorsFeaturizer, for example) or by intent classifiers (EmbeddingIntentClassifier, for example)? I know that CountVectorsFeaturizer transforms tokens into vectors, and that EmbeddingIntentClassifier is an ANN with 2 hidden layers that calculates the coefficients used for text classification. But a word embedding is a dense matrix that represents the similarity between terms and (to my knowledge) is used by the classifier. I hope you might be able to give me some insights on this.


1 Like

Hi Yasmine, there is no simple answer but I’ll try to give you some useful pointers.

With EmbeddingIntentClassifier, word embeddings are initialised and later trained as part of the classifier itself. It's similar with our more recent classifier DIET (see this nice video on the architecture of DIET). However, one could argue that the embeddings are not true word embeddings: the classifiers accept inputs of all kinds from various featurisers (not one-hot encodings of words), and don't train a true embedding matrix. Ultimately, the classifiers focus on training good sentence embeddings.
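To make "embeddings trained as part of the classifier" concrete, here is a deliberately minimal, hypothetical sketch (this is not Rasa's actual code, and the dimensions, learning rate, and update rule are all made up for illustration). The point is just that the word vectors start out random and only become meaningful because the classifier's training loop nudges them:

```python
import random

random.seed(0)
DIM = 4
vocab = ["hello", "goodbye", "thanks"]

# Embeddings start as random noise; they are parameters of the classifier.
emb = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

def sentence_embedding(tokens):
    # Mean of word vectors, a crude stand-in for the classifier's layers.
    vecs = [emb[t] for t in tokens if t in emb]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def train_step(tokens, intent_vec, lr=0.1):
    # Pull each word vector a little towards the target intent vector.
    for t in tokens:
        if t in emb:
            emb[t] = [w + lr * (i - w) for w, i in zip(emb[t], intent_vec)]

greet = [1.0, 0.0, 0.0, 0.0]
for _ in range(50):
    train_step(["hello"], greet)

print(sentence_embedding(["hello"]))  # close to the "greet" intent vector
```

After training, the vector for "hello" carries information about the greet intent, but it only exists inside this classifier; that is the sense in which these are not standalone, reusable word embeddings.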

If you were to see some true word embeddings, it would be in the featurisers. For instance, the ConveRTFeaturizer and SpacyFeaturizer both use pre-trained embeddings. You can also leverage other common embeddings such as FastText; see the nlu-examples repo.

Does this help? Feel free to ask more :slight_smile:

1 Like

Hi Sam, thank you for your reply!

So in Rasa, the framework doesn't really create an independent word embedding with its own parameters. Rather, the featurizers transform tokens into sparse vectors using a bag of words, and that output is then used by the classifier to maximize the similarity between the sentences. Is that correct? I have another question, if you don't mind, concerning the pretrained models: is there a HuggingFace model supported by Rasa that can be used for French data?


Hey Yasmine :slight_smile:

First, regarding your intuitions: you are right, though I should point out that there are also many dense featurisers which transform tokens and messages into dense vectors using methods other than bag-of-words (though the very stable CountVectorsFeaturizer still produces sparse bag-of-words/characters/n-grams features, which tend to perform well).
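To illustrate the sparse bag-of-words idea, here is a hand-rolled sketch of the kind of count vectors such a featuriser produces (illustration only, not Rasa's implementation; the example messages and vocabulary are made up):

```python
from collections import Counter

# Each tokenised message becomes a count vector over a fixed vocabulary.
messages = [["book", "a", "flight"], ["book", "a", "table"], ["hello", "there"]]
vocab = sorted({tok for msg in messages for tok in msg})

def featurize(tokens):
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocab]

for msg in messages:
    print(msg, featurize(msg))
```

Most entries in each vector are zero, which is why these features are stored and described as sparse; a dense featuriser would instead emit a short vector where every dimension is populated.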

Regarding this bit:

could you be more specific? Do you mean NLU classifiers (perhaps specifically DIETClassifier)?

For French, you could use as a featuriser the multilingual version of BERT (or DistilBERT); see the list of all available models. As the classifier, you would then most likely use DIET. By the way, I recommend using CountVectorsFeaturizer alongside any dense featuriser; usually it only helps (see also the recommended non-English pipeline for more details).
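As a starting point, a pipeline along these lines could work (a sketch only; double-check component names and supported `model_weights` against the Rasa docs for your version, and `bert-base-multilingual-cased` is just one multilingual option on HuggingFace):

```yaml
language: fr
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: bert-base-multilingual-cased
  - name: DIETClassifier
    epochs: 100
```

The two CountVectorsFeaturizer entries follow the pattern above: one word-level and one character-n-gram-level, both feeding DIET alongside the dense BERT features.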

1 Like

Hey Sam,

I'm sorry for the late reply; I didn't see the response. Yes, I meant the NLU classifiers, which classify the sentence by maximizing the similarity with the correct intent and minimizing the similarities with the incorrect intents. Thank you again.
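That similarity idea can be sketched in a few lines (a toy illustration, not DIET's actual loss or embedding spaces; the vectors and intent names are invented). The sentence embedding is scored against each intent embedding, and training would push the correct intent's score up and the others down:

```python
import math

def dot(u, v):
    # Similarity score between a sentence embedding and an intent embedding.
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Turn raw similarity scores into a probability distribution.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

sentence = [0.9, 0.1, 0.0]
intents = {"greet": [1.0, 0.0, 0.0], "goodbye": [0.0, 1.0, 0.0]}

sims = {name: dot(sentence, vec) for name, vec in intents.items()}
probs = softmax(list(sims.values()))
best = max(sims, key=sims.get)
print(best)  # the intent whose embedding is most similar to the sentence
```

At inference time the predicted intent is simply the one with the highest similarity; the training objective shapes the embedding spaces so that this is usually the right answer.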

1 Like