There is no need to generate more training data, because NLU with pre-trained BERT embeddings can already understand a sentence and similar sentences.
Pre-trained BERT is good at understanding similar word contexts, but it is actually not that great at understanding similar sentences. Hence, doing sentence-level data augmentation still makes sense.
When does your team plan to release this feature for Rasa?
Currently, this feature only supports English. Could you share how to fine-tune GPT-2 for this task?
For other languages like Vietnamese, we don’t have anything like the [ParaNMT-50M dataset]. So I am thinking about creating an equivalent of ParaNMT for Vietnamese, either by running the [ParaNMT-50M dataset] through Google Translate or by using back-translation. What do you think about this idea?
Could you suggest some ways to do this in Vietnamese or other non-English languages?
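To make the back-translation idea concrete, here is a minimal sketch, not the approach used for the English model. It assumes you supply your own `translate(text, src_lang, tgt_lang)` function (hypothetical; it could wrap Google Translate or any MT model), and the name `back_translate_pairs` is made up for illustration. Each sentence is translated to a pivot language and back, and the round trip is kept only if the wording changed.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical signature: translate(text, src_lang, tgt_lang) -> translated text.
# Plug in whatever translation service or model you have access to.
Translator = Callable[[str, str, str], str]


def back_translate_pairs(
    sentences: Iterable[str],
    translate: Translator,
    src: str = "vi",
    pivot: str = "en",
) -> List[Tuple[str, str]]:
    """Build (original, paraphrase) pairs by translating src -> pivot -> src."""
    pairs = []
    for sentence in sentences:
        pivot_text = translate(sentence, src, pivot)
        round_trip = translate(pivot_text, pivot, src)
        # A round trip that returns the same wording adds no paraphrase signal.
        if round_trip.strip().lower() != sentence.strip().lower():
            pairs.append((sentence, round_trip))
    return pairs
```

The resulting pairs would still need manual spot checks, since machine translation errors end up directly in the paraphrase data.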
Thanks @dakshvar22, this is very helpful. Just a tiny bug: since June 3rd, when transformers 2.11.0 came out, initialising the model throws an error, so maybe just change the install code to transformers==2.10.0?
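In a notebook, the pin itself would be `pip install transformers==2.10.0`. As a small sketch for checking which version is actually active before loading the model, something like the following could be used. The helper name `check_pin` is hypothetical, and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported as working in this thread; adjust if that changes.
KNOWN_GOOD = "2.10.0"


def check_pin(package: str = "transformers", expected: str = KNOWN_GOOD) -> str:
    """Return a short message about whether the installed version matches the pin."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed; try: pip install {package}=={expected}"
    if installed == expected:
        return f"{package}=={installed} matches the pinned version"
    return (
        f"{package}=={installed} is installed, expected {expected}; "
        f"try: pip install {package}=={expected}"
    )


print(check_pin())
```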
The download link in the Colab doesn’t work for me. I have the same problem as @indranil180, but even the new link you posted as a reply doesn’t work. Is there any other way or link to get the model?
I am also facing the same issue. @dakshvar22, could you please help us?
We are getting this error when running the model.
Archive: model.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of model.zip or
model.zip.zip, and cannot find model.zip.ZIP, period.
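This `unzip` message usually means the downloaded file is not a real zip archive. One common cause is that the download link returned an HTML page (for example an error or confirmation page) instead of the model, though that is only a guess for this case. A minimal sketch to check what was actually downloaded is below. The helper name `diagnose_archive` is hypothetical and `model.zip` is the file name from the log above.

```python
import zipfile
from pathlib import Path


def diagnose_archive(path: str) -> str:
    """Classify a downloaded file as 'missing', 'zip', 'html', or 'unknown'."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if zipfile.is_zipfile(str(p)):
        return "zip"
    # Peek at the first bytes: an HTML page here suggests the link did not
    # serve the archive itself.
    with open(p, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


print(diagnose_archive("model.zip"))
```

If this prints `html`, opening the link in a browser and downloading the file manually may show what the server is actually returning.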