Hello community! I’m running a set of comparative experiments between chatbot platforms and, for that, I need to see the word embedding representation that Rasa NLU generates for my training set. Has anyone done this before and can help?
Hi there! This will depend on your configuration. Can you share your config file?
Rasa internally has “embeddings” at different parts of the modelling pipeline. For example, you could have a pipeline that has count vectors (sparse features) as well as pre-trained word embeddings (dense features). That means the text is turned into a set of ML features.
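To make the sparse vs. dense distinction concrete, here is a small toy sketch. It does not use Rasa's API at all, and it is not how Rasa combines features internally; the utterances, the `toy_embeddings` table and its values, and the helper names are all invented for illustration. It just shows one text being turned into count-based features and dense features side by side.

```python
from collections import Counter

# Toy training utterances (made up for illustration).
utterances = ["book a flight", "cancel my flight", "book a table"]

# Sparse side: a bag-of-words count vector over the training vocabulary.
vocab = sorted({tok for u in utterances for tok in u.split()})

def count_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts.get(word, 0) for word in vocab]

# Dense side: a hypothetical 3-dimensional embedding table with invented values.
# A real pipeline would load these from a pre-trained model instead.
toy_embeddings = {
    "book": [0.9, 0.1, 0.0],
    "cancel": [-0.8, 0.2, 0.1],
    "flight": [0.1, 0.9, 0.3],
    "table": [0.0, 0.7, -0.4],
}
DIM = 3

def dense_vector(text):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    vecs = [toy_embeddings[t] for t in text.split() if t in toy_embeddings]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def featurize(text):
    """Concatenate the sparse and dense features into one flat vector."""
    return count_vector(text) + dense_vector(text)

features = featurize("book a flight")
print(vocab)
print(features)
```

The first `len(vocab)` entries are the sparse counts and the remaining `DIM` entries are the dense part, so you can slice the vector to inspect either side on its own.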
There are features per token but also a feature set for the entire utterance. Since these featurized embeddings are general-purpose, you can use Jupyter to visualise them. On behalf of Rasa I even maintain a library to make this easy.
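As a rough sketch of the token-level vs. utterance-level idea, the snippet below keeps one feature row per token and then pools them into a single utterance vector. Everything here is an assumption for illustration: the `token_features` values are invented, mean pooling is just one simple choice (each featurizer may build its utterance-level features differently), and none of these names come from Rasa or from the library mentioned above.

```python
import math

# Hypothetical per-token dense features (values invented for illustration).
token_features = {
    "hello": [0.9, 0.1],
    "hi": [0.8, 0.2],
    "there": [0.5, 0.5],
    "goodbye": [-0.7, 0.6],
    "bye": [-0.8, 0.5],
}

def tokens_to_matrix(text):
    """One feature row per token; unknown tokens map to a zero vector."""
    return [token_features.get(t, [0.0, 0.0]) for t in text.lower().split()]

def utterance_vector(text):
    """Assumed pooling: average the token rows into one utterance-level vector."""
    rows = tokens_to_matrix(text)
    return [sum(col) / len(rows) for col in zip(*rows)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

greet = utterance_vector("hi there")
also_greet = utterance_vector("hello there")
leave = utterance_vector("bye")

print(len(tokens_to_matrix("hi there")))  # one row per token
print(cosine(greet, also_greet), cosine(greet, leave))
```

In a notebook you could compute such vectors for your whole training set and then plot or cluster them to see which utterances end up close together.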
However, if you use the DIET classifier in your pipeline then, technically, you could interpret the final layer of DIET as an embedding.
We don’t expose these yet, but there is a PR for this feature. If you’d like to understand this aspect better, you may appreciate this video on the topic from our “Algorithm Whiteboard” playlist on YouTube.