[New Feature] LLMs for Machine Translation of slot-annotated data

Expanding SLU to new languages requires a lot of manual data annotation. To significantly reduce that effort, LLMs can be used to machine-translate English slot-annotated data, e.g.

"play me <a> Dune <a> on <b> Youtube <b>" => "Spiele mir <a> Dune <a> auf <b> Youtube <b>"
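The markup above is simple enough to handle with a regex. As a minimal illustration (the tag names `<a>`/`<b>` follow the example above; the helper names are mine, not from the released code), here is how slot values can be extracted and how one can check that a translation preserved the tags so the annotation projects onto the target language:

```python
import re

# Slot spans are delimited by paired tags, e.g.
# "play me <a> Dune <a> on <b> Youtube <b>".
SLOT_RE = re.compile(r"<(\w+)>\s*(.*?)\s*<\1>")

def extract_slots(annotated: str) -> dict:
    """Return {tag: value} for every slot span in an annotated utterance."""
    return {tag: value for tag, value in SLOT_RE.findall(annotated)}

def tags_preserved(source: str, translation: str) -> bool:
    """True if the translation kept exactly the same slot tags as the
    source, so the slot annotation can be carried over."""
    src_tags = sorted(tag for tag, _ in SLOT_RE.findall(source))
    tgt_tags = sorted(tag for tag, _ in SLOT_RE.findall(translation))
    return src_tags == tgt_tags
```

For the pair above, `extract_slots` yields `{"a": "Dune", "b": "Youtube"}` on both sides and `tags_preserved` holds, so the German sentence can be used directly as annotated training data.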

In our recent work, we fine-tuned an MT LLM called BigTranslate for MT of slot-annotated NLU data, using the parallel Amazon MASSIVE dataset for fine-tuning. On the MultiATIS++ benchmark, fine-tuning brings a significant performance improvement compared to zero-shot LLM-based machine translation, zero-shot mBERT, and other state-of-the-art approaches such as FC-MTLF.

The fine-tuned BigTranslate model is available on Hugging Face: Samsung/BigTranslateSlotTranslator. The fine-tuning code and the NLU training code are on GitHub: Samsung/MT-LLM-NLU (repository for our publication "LLM-Based Machine Translation for Expansion of Spoken Language Understanding Systems to New Languages").

In short, we would like to merge our pipeline into Rasa, but I don't know where to start, since Rasa doesn't have MT pipelines as of now. I created a Jira issue for this: [OSS-765] - Jira

You could write a custom component to do this.
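For anyone exploring that route, the core translation step could be wrapped roughly as below. This is a framework-agnostic sketch, not the published implementation: `translate_fn` stands in for the fine-tuned BigTranslate model (Samsung/BigTranslateSlotTranslator) so the control flow can be shown without loading the checkpoint, and to run inside Rasa 3.x the class would still need to implement the custom graph component API and be registered in the pipeline in `config.yml`.

```python
import re
from typing import Callable, List

# Paired slot tags, e.g. "play me <a> Dune <a> on <b> Youtube <b>".
SLOT_RE = re.compile(r"<(\w+)>")

class SlotAwareTranslator:
    """Wraps an MT callable and keeps only translations whose slot-tag
    sequence matches the source, so malformed outputs never reach
    NLU training.

    `translate_fn` is injected (here a plain callable) and would be
    backed by the fine-tuned BigTranslate model in practice.
    """

    def __init__(self, translate_fn: Callable[[str], str]):
        self.translate_fn = translate_fn

    def translate_corpus(self, examples: List[str]) -> List[str]:
        kept = []
        for source in examples:
            translation = self.translate_fn(source)
            # Drop translations where slot tags were lost or reordered.
            if SLOT_RE.findall(source) == SLOT_RE.findall(translation):
                kept.append(translation)
        return kept
```

The tag-consistency filter is the important design choice: LLM-based MT occasionally drops or mangles the inline tags, and silently keeping such outputs would corrupt the slot annotation of the generated training data.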