Semantic Map Embedding

Hello everyone,

I am highly interested in the fairly recent blog post about Semantic Map Embedding. It really sounds like an idea I would like to try myself. So, I have been wondering about the implementation of the embedding, which unfortunately does not seem to be freely available for use with an entirely different dataset, right? I can only use the SemanticMapFeaturizer with an already embedded Wikipedia corpus, but I cannot train my own embedding. Please correct me if I am mistaken.

Looking forward to further comments on this. :slight_smile:

Best,

Bella


Hello @BellaBoga. Welcome to the forum and thank you for your interest!

You’re right, I didn’t publish the code to generate the map. This would require a bit of cleanup and documentation on my side. I’ll do this eventually, but it’s not a priority right now. Who else would be interested in creating their own semantic maps?


Ok, I just did it now. You can find the code you’d need to generate your own semantic map embedding here: https://github.com/RasaHQ/semantic-map-embedding

@BellaBoga Let me know about your experiences with this embedding!


Hi j.mosig,

Great, thank you! I will try it out and let you know how it goes. Thanks for sharing. :slight_smile: Best, Bella


For anyone else following this thread, here are links to the two blog articles by @j.mosig related to this post:


Hi there,

Sorry for only getting back to you now! There are still some issues with running the code. To my understanding, some files needed for the compilation process are missing: when I try to compile the executable, something cannot be found right away.

I’d be glad about any sort of feedback!

Hello @BellaBoga

What is the error message, and at which step do you get it? Maybe you don’t have g++-10 installed on your system? I think I installed g++-10 like this.
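In case it helps, here is a minimal setup sketch for Debian/Ubuntu, assuming your distribution ships a `g++-10` package (package names and manager differ on other systems):

```shell
# Install g++-10 via APT (Debian/Ubuntu; assumption: the g++-10 package exists
# in your configured repositories)
sudo apt-get update
sudo apt-get install -y g++-10

# Verify the compiler is on the PATH
g++-10 --version
```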


Thank you! Yes, that’s it. I could run the code!

So basically, I can build my own semantic map just like with the example corpus, right? Or, to ask differently: what kind of data is taken as input, and what is the output? How can it be analysed?

@BellaBoga Inputs are plain text files like these, as well as a vocabulary list. The output is a JSON file with the embedding, which you can use with Rasa’s SemanticMapFeaturizer from the rasa-nlu-examples repo.
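Since semantic map embeddings are sparse binary fingerprints (each word activates a small set of cells on a 2D map), one quick way to analyse the output JSON is to compare words by the overlap of their active cells. A minimal sketch, assuming a *hypothetical* file layout of word → indices of active cells — the JSON produced by the actual tool may be structured differently, so adapt the parsing to your own output file:

```python
import json

# Hypothetical example of a semantic-map embedding file: each word maps to the
# indices of its active cells on a width x height semantic map. This layout is
# an assumption for illustration only -- inspect your real output JSON.
embedding_json = """
{
  "width": 4,
  "height": 4,
  "embeddings": {
    "dog": [0, 1, 5, 6],
    "cat": [1, 5, 6, 10],
    "car": [3, 7, 11, 15]
  }
}
"""

def overlap_similarity(fp_a, fp_b):
    """Jaccard similarity of two sparse fingerprints (shared / total active cells)."""
    a, b = set(fp_a), set(fp_b)
    return len(a & b) / len(a | b)

smap = json.loads(embedding_json)
emb = smap["embeddings"]

# Semantically related words should share many active cells,
# unrelated words few or none.
print(overlap_similarity(emb["dog"], emb["cat"]))  # relatively high overlap
print(overlap_similarity(emb["dog"], emb["car"]))  # no overlap
```

With real data, you could rank a word's nearest neighbours by this similarity to eyeball whether the learned map groups related words together.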

To create text files in this format from Wikipedia dumps, you may use the smap branch of the forked WikiExtractor on the Rasa GitHub page.