I am highly interested in the fairly recent blog post about Semantic Map Embedding. It really sounds like an idea I would like to try myself. So I have been wondering about the implementation of the embedding, which unfortunately does not seem to be freely available for use with an entirely different dataset, right? I can only use the SemanticMapFeaturizer with an already embedded Wikipedia corpus, but I cannot train my own embedding. Please correct me if I am mistaken.
Looking forward to further comments on this.
Hello @BellaBoga. Welcome to the forum and thank you for your interest!
You’re right, I didn’t publish the code to generate the map. This would require a bit of cleanup and documentation on my side. I’ll do this eventually, but it’s not a priority right now. Who else would be interested in creating their own semantic maps?
Ok, I just did it now. You can find the code you’d need to generate your own semantic map embedding here: https://github.com/RasaHQ/semantic-map-embedding
@BellaBoga Let me know about your experiences with this embedding!
Great, thank you! I will try it out and let you know about my experience. Thanks for sharing.
For anyone else following this thread, here are links to the two blog articles by @j.mosig related to this post:
Sorry for only getting back to you now! There are still some issues with running the code. To my understanding, some files needed for the compilation process are missing: when I try to compile the executable, something is already missing at that step.
I’d be glad about any sort of feedback!
What is the error message, and at what step do you get it? Maybe you don't have g++-10 installed on your system? I think I installed g++-10 like this.
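For reference, on Debian/Ubuntu the compiler can typically be installed like this (a sketch; package names and the install command will differ on other distributions or on macOS):

```shell
# Sketch for Debian/Ubuntu: install GCC 10's C++ compiler.
sudo apt-get update
sudo apt-get install -y g++-10

# Verify that the compiler is on the PATH before rebuilding.
g++-10 --version
```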
Thank you! Yes, that was it. I was able to run the code!
So basically, I can build my own semantic map just like with the example corpus, right? Or, to ask differently: what kind of data is taken as input, and what is the output? How can the output be analysed?
@BellaBoga Inputs are plain text files like these, as well as a vocabulary list. The output is a JSON file of the embedding that you can use with Rasa's SemanticMapFeaturizer from the rasa-nlu-examples repo.
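For anyone else arriving here, wiring the resulting JSON file into a Rasa pipeline might look roughly like this (a sketch only: the component path and the `pretrained_semantic_map` parameter name are my reading of the rasa-nlu-examples docs, so double-check against that repo; `semantic_map.json` is a placeholder filename):

```yaml
# config.yml (sketch) - NLU pipeline using the SemanticMapFeaturizer
# from rasa-nlu-examples with a self-trained embedding file.
pipeline:
  - name: WhitespaceTokenizer
  - name: rasa_nlu_examples.featurizers.sparse.SemanticMapFeaturizer
    # Path to the JSON embedding produced by semantic-map-embedding;
    # the parameter name may differ between versions of rasa-nlu-examples.
    pretrained_semantic_map: "semantic_map.json"
  - name: DIETClassifier
    epochs: 100
```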
To create text files of this format from Wikipedia dumps, you may use the smap branch of the forked version of the WikiExtractor on the Rasa GitHub page.
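In case it helps others, fetching and running that fork might look roughly like this (a sketch: the exact repository URL is an assumption based on "the Rasa GitHub page", the smap branch may add or change flags, and the dump filename is just an example; the `-o` output-directory flag is from the upstream WikiExtractor CLI):

```shell
# Sketch only: repository URL is an assumption; check the Rasa GitHub page.
git clone https://github.com/RasaHQ/wikiextractor.git
cd wikiextractor
git checkout smap

# Run on a Wikipedia XML dump (example filename), writing extracted
# plain-text files to the "extracted/" directory.
python WikiExtractor.py enwiki-latest-pages-articles.xml.bz2 -o extracted/
```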