RASA Word Embeddings Confusion

varton · October 18, 2019, 7:28pm

Hi all,

First, I want to thank RASA for explaining their underlying model. I am using the TensorFlow pipeline (i.e., supervised_embeddings) in my bot implementation. I went through the docs to understand how the algorithm works, with the aim of improving my bot’s accuracy. I read the following article:

However, the embedding and intent classification steps confused me. In particular, two things were not clear to me: 1- What does the intent label count vector mean? 2- Will the user’s query be embedded using the features of the training set utterances?

Finally, I read the “StarSpace: Embed All The Things!” paper, but I did not find answers to my questions.

Thanks in advance

dakshvar22 · October 22, 2019, 11:33am

@varton Not sure if I understand your questions correctly, but I’ll answer them as I interpret them.

1- Count Vector is a Bag of Words featurization approach where the vector contains the number of times each word in the vocabulary is present in the text. Since the text here is the intent label, the count vector is the corresponding vector for the intent label.
2- If you mean the user’s query at inference time, then no, the user’s query will have its own computed vector. Only the vocabulary that was built during training is shared.

Let me know if you have any more questions.
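
To make the two answers above concrete, here is a minimal sketch of a bag-of-words count vector in plain Python. It is not Rasa's actual featurizer: the tokenizer, the helper names (`tokenize`, `build_vocabulary`, `count_vector`) and the example sentences are all made up for illustration. The vocabulary is fixed from the training utterances, and a new query gets its own vector over that vocabulary, with unseen words simply dropped.

```python
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, split on whitespace,
    # and strip basic punctuation from each token.
    return [t.strip("?!.,") for t in text.lower().split() if t.strip("?!.,")]

def build_vocabulary(texts):
    # Sorted list of the unique tokens seen in the given texts.
    return sorted({tok for text in texts for tok in tokenize(text)})

def count_vector(text, vocabulary):
    # One count per vocabulary entry; tokens outside the vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

# Hypothetical training utterances.
training_utterances = ["Hello there", "Book a table", "Book a table for two"]
vocabulary = build_vocabulary(training_utterances)
# vocabulary == ['a', 'book', 'for', 'hello', 'table', 'there', 'two']

# A query at inference time: it gets its own vector, but over the training vocabulary.
query_vector = count_vector("Book a big table, a table for four", vocabulary)
# query_vector == [2, 1, 1, 0, 2, 0, 0]  ("big" and "four" are out of vocabulary)
```

The query vector is computed from the query alone; the only thing it shares with training is the list of vocabulary words that defines the vector's positions.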

varton · October 22, 2019, 1:42pm

Thanks @dakshvar22 for your reply.

Regarding the first point, for example, if I have a “Weather” intent that includes the following utterances:

What is the weather?
Show me the weather
Display the weather forecast

Then, will the count vector of the “Weather” intent be the number of times the utterances’ words appear in the vocabulary?

To make sure I understand correctly: is the vocabulary constructed from the unique words in the entire training set?

For the second point, you have answered my question.

Thanks

dakshvar22 · October 23, 2019, 6:50am

@varton Count vectors for user utterances and intent labels can be built from a shared or an independent vocabulary. It’s configurable in CountVectorsFeaturizer. If it’s independent, then for user utterances the vocabulary is constructed from the unique words across all utterances in the training set, and for intent labels the vocabulary is constructed from the unique words across all intent labels in the training set. If the vocabulary is shared, a common vocabulary is constructed from the unique words across all user utterances and intent labels in the training set.
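
The difference between the two modes can be sketched in plain Python using the “Weather” utterances from earlier in the thread. This is an illustration under assumptions, not Rasa's implementation: the `greet` intent, its utterance, the simplified tokenizer and all helper names are hypothetical. In Rasa's config the switch is, as far as I know, the `use_shared_vocab` option of `CountVectorsFeaturizer`; check the docs for your version.

```python
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, split on whitespace,
    # and strip basic punctuation from each token.
    return [t.strip("?!.,") for t in text.lower().split() if t.strip("?!.,")]

def build_vocabulary(texts):
    return sorted({tok for text in texts for tok in tokenize(text)})

def count_vector(text, vocabulary):
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

training_data = {
    "weather": [
        "What is the weather?",
        "Show me the weather",
        "Display the weather forecast",
    ],
    "greet": ["Hello there"],  # hypothetical second intent
}
all_utterances = [u for examples in training_data.values() for u in examples]
intent_labels = list(training_data)

# Independent vocabularies: one for utterances, one for intent labels.
utterance_vocab = build_vocabulary(all_utterances)   # 10 words, no "greet"
label_vocab = build_vocabulary(intent_labels)        # ['greet', 'weather']
independent_label_vector = count_vector("weather", label_vocab)  # [0, 1]

# Shared vocabulary: unique words across utterances AND intent labels.
shared_vocab = build_vocabulary(all_utterances + intent_labels)  # 11 words
shared_label_vector = count_vector("weather", shared_vocab)
shared_utterance_vector = count_vector("Show me the weather", shared_vocab)
```

With the shared vocabulary, the label “weather” and the utterance “Show me the weather” both have a nonzero count at the same position (the word “weather”); with independent vocabularies their vectors live in unrelated spaces.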

varton · October 24, 2019, 4:35pm

Thanks @dakshvar22 for your quick reply.

I have one final question.

What is the purpose of creating a count vector for the intent label? Is it enough to create the count vector only for the user utterances associated with that intent?

dakshvar22 · October 31, 2019, 10:36am

@varton The intent label can also contain useful tokens which can assist in learning an embedding for the intent. Also, in the case of multiple intents, the count vectorizer is a good way to handle multiple tokens in the intent label.
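
A small sketch of the multi-intent point, in plain Python rather than Rasa code. It assumes that composite labels are split on a `+` symbol and uses made-up intent names; how label tokenization is actually configured depends on the Rasa version.

```python
from collections import Counter

SPLIT_SYMBOL = "+"  # assumption: the symbol that joins intents in a composite label

def tokenize_label(label):
    # Split a (possibly composite) intent label into its tokens.
    return [part for part in label.split(SPLIT_SYMBOL) if part]

# Hypothetical intent labels, one of them a multi-intent.
intent_labels = ["greet", "ask_weather", "greet+ask_weather"]
label_vocab = sorted({tok for label in intent_labels for tok in tokenize_label(label)})
# label_vocab == ['ask_weather', 'greet']

def label_count_vector(label):
    counts = Counter(tokenize_label(label))
    return [counts[tok] for tok in label_vocab]

vectors = {label: label_count_vector(label) for label in intent_labels}
# vectors == {'greet': [0, 1], 'ask_weather': [1, 0], 'greet+ask_weather': [1, 1]}
```

The composite label gets a vector that overlaps with both single intents, so the model can relate “greet+ask_weather” to “greet” and “ask_weather” instead of treating it as an unrelated class.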

varton · October 31, 2019, 3:16pm

Thanks @dakshvar22 for clarifying things. I really wish there were a solid example that describes the models in detail, to benefit other Rasa users.

setopaisen · May 8, 2020, 1:53pm

Hi, I’m trying to understand but still don’t get it. Could you help correct me? I’m afraid my misunderstanding is only getting deeper.

shared vocabulary: pretrained embedding
independent vocabulary: no pretrained embedding or word vector source used in the pipeline
user utterance = user message
“vocabulary is constructed with unique words across all utterances in training set” = the whole nlu.md file
“for intent labels the vocabulary is constructed with unique words across all intent labels in the training set” = also the whole nlu.md file?

Thanks
