Bootstrapping domain training

beuller · July 1, 2020, 5:08am

I’m interested in learning about any public datasets that we can use to bootstrap domain models. We’re using Rasa to power a voice-first conversational model.

Does anyone have experience distilling call recordings into NLU training data? Are there any datasets already available that I can use?

Thanks.

koaning · July 1, 2020, 7:25am

We’ve got some demos that are open source. These bots also contain datasets that you can use for general benchmarking.

But these all assume conversational situations over text, not voice.

koaning · July 1, 2020, 7:27am

It may also be worthwhile to point out that it’s tricky to benchmark your approach using somebody else’s data.

In the end, the stories/conversations that you optimise for should be the stories/conversations that your users generate. If the overlap between somebody else’s data and your own users’ data is small, you risk optimising for something that won’t help your end users.
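
One rough way to sanity-check that overlap before committing to a public dataset is to compare vocabularies. The snippet below is only a minimal sketch: it assumes you have both datasets as plain lists of utterances, the names `tokens`, `jaccard_overlap`, `public_data` and `user_data` are made up for illustration, and Jaccard similarity on word sets is just one possible proxy, not something Rasa provides or recommends.

```python
import re


def tokens(utterances):
    """Collect the set of lowercase word tokens across a list of utterances."""
    vocab = set()
    for text in utterances:
        vocab.update(re.findall(r"[a-z0-9']+", text.lower()))
    return vocab


def jaccard_overlap(a, b):
    """Jaccard similarity between the vocabularies of two utterance lists."""
    vocab_a, vocab_b = tokens(a), tokens(b)
    union = vocab_a | vocab_b
    # Two empty datasets share nothing, so report 0.0 rather than divide by zero.
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


# Hypothetical sample data: a public benchmark vs. transcripts of your own users.
public_data = ["book a table for two", "what is the weather today"]
user_data = ["book a table tonight", "cancel my booking"]

score = jaccard_overlap(public_data, user_data)
print(f"vocabulary overlap: {score:.2f}")
```

A low score suggests the public data talks about different things than your users do. A high score is weaker evidence, since shared words do not guarantee shared intents, and for voice you would also want to run this on ASR transcripts rather than clean text.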
