I asked a question in the comment section of your CBOW and Skip-gram YouTube tutorial about implementing a similar solution in Rasa, and I believe Vincent replied suggesting that I ask the question here on the Rasa Forums, so here goes.
So, I have a general-purpose spaCy language model (for Hungarian) that contains dense word-embedding vectors. I would like to use these word2vec embeddings (`token.vector`) in my Rasa model for better accuracy, but I haven't found much info on how one might do that. I haven't checked the code in great detail yet, but I am not sure this is even possible without writing custom components for the Rasa pipeline.
Question: If it is possible to use word2vec word embeddings in Rasa Open Source, how can I do that?
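For what it's worth, Rasa does ship spaCy integration components (`SpacyNLP`, `SpacyTokenizer`, `SpacyFeaturizer`) that pass a spaCy model's dense vectors to downstream classifiers. A minimal sketch of a `config.yml` along those lines, assuming a Rasa 2.x pipeline and that the Hungarian spaCy model is installed under the name shown (the model name here is a placeholder, substitute your own):

```yaml
language: hu

pipeline:
  # Loads the spaCy model; its token.vector embeddings
  # become dense features for the components below.
  - name: SpacyNLP
    model: hu_core_news_lg   # placeholder: use your installed Hungarian model
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: DIETClassifier
    epochs: 100
```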
There are multilingual BERT embeddings that are supported via our `LanguageModelFeaturizer`. In particular, we've had good feedback on LaBSE, and I believe Hungarian is supported there as well.
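A minimal sketch of a pipeline using the `LanguageModelFeaturizer`, assuming Rasa 2.x with the transformers dependency installed (the `rasa/LaBSE` weights are the ones published on the Hugging Face hub):

```yaml
language: hu

pipeline:
  - name: WhitespaceTokenizer
  # Adds dense features from the multilingual LaBSE model.
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: rasa/LaBSE
  - name: DIETClassifier
    epochs: 100
```

Note that these heavy embeddings add noticeable compute cost at training and inference time, so it is worth benchmarking them against a lighter baseline.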
Having mentioned these tools, though, I would like to stress that the most important part of an assistant is the data, not the pipeline. I would focus on getting something demo-able first so that you can start collecting feedback from users. I don't speak Hungarian, but I can imagine that DIET with simple `CountVectorsFeaturizer` features will go a long way when you're starting out.
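Such a lightweight starting point could look like this: a sketch of a count-vector baseline, assuming Rasa 2.x defaults (the character n-gram featurizer helps with the rich morphology of a language like Hungarian):

```yaml
language: hu

pipeline:
  - name: WhitespaceTokenizer
  # Sparse word-level features.
  - name: CountVectorsFeaturizer
  # Sparse subword features; useful for morphologically rich languages.
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
```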
If there are more specific issues you're concerned about, let me know! I'm working on educational material this quarter about non-English assistants, and if there are any specific blockers for you, I'd love to understand them better. I'm also interested in hearing whether there are tools for Hungarian that I should add to Rasa-NLU-Examples.