I am highly interested in the fairly recent blog post about Semantic Map Embedding. It really sounds like an idea I would like to try myself. So I have been wondering about the implementation of the embedding, which unfortunately does not seem to be freely accessible for use with an entirely different dataset, right? I can only use the SemanticMapFeaturizer with an already embedded Wikipedia corpus, but I cannot train my own embedding. Please correct me if I am mistaken.
Hello @BellaBoga. Welcome to the forum and thank you for your interest!
You’re right, I didn’t publish the code to generate the map. This would require a bit of cleanup and documentation on my side. I’ll do this eventually, but it’s not a priority right now. Who else would be interested in creating their own semantic maps?
Sorry for only getting back to you now! There are still some issues with running the code. To my understanding, some files needed for the compilation process are missing: building the executable already fails because of them.
So basically, I can build my own semantic map just like with the example corpus, right? Or to ask it differently: what kind of data is taken as input? And what is the output? How can it be analysed?
@BellaBoga Inputs are plain text files like these, as well as a vocabulary list. The output is a JSON file of the embedding that you can use with the SemanticMapFeaturizer from the rasa-nlu-examples repo.
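If you want to poke at the output before wiring it into a pipeline, here is a minimal sketch for inspecting the embedding JSON. The file name and the schema assumed below (a top-level mapping from vocabulary terms to their sparse map fingerprints) are my own assumptions, not the documented format; adjust the keys to whatever your file actually contains:

```python
import json

# Load the semantic map embedding produced by the training step.
# NOTE: the file name and the schema assumed below are illustrative,
# not the documented format of the tool.
with open("semantic_map.json", encoding="utf-8") as f:
    smap = json.load(f)

# Assumption: the file maps each vocabulary term to its sparse
# "fingerprint" on the 2D map (e.g. a list of active cell indices).
print(f"Vocabulary size: {len(smap)}")

# Inspect a few entries to get a feel for how dense the fingerprints are.
for term, fingerprint in list(smap.items())[:5]:
    print(term, "->", fingerprint)
```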
To create text files of this format from Wikipedia dumps, you may use the smap branch of the forked version of the WikiExtractor on the Rasa GitHub page.
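If your corpus doesn't come from Wikipedia, you could also assemble the two inputs yourself. Below is a rough sketch of how one might derive a vocabulary list from a directory of plain text files; the tokenization, the frequency cutoff, and the file names are assumptions of mine, not part of the published tooling:

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: one document per plain text file, lowercase word tokens.
# This is NOT the official preprocessing, just an illustration of the
# kind of vocabulary list the trainer expects as its second input.
counter = Counter()
for path in Path("corpus").glob("*.txt"):
    text = path.read_text(encoding="utf-8")
    counter.update(re.findall(r"[a-z']+", text.lower()))

# Keep only the most frequent terms; the cutoff of 20,000 is arbitrary.
vocabulary = [term for term, _ in counter.most_common(20_000)]

Path("vocabulary.txt").write_text("\n".join(vocabulary), encoding="utf-8")
print(f"Wrote {len(vocabulary)} terms to vocabulary.txt")
```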