I’m interested to learn of any public datasets we could use to bootstrap domain models. We’re using RASA to power a voice-first conversational model.
Does anyone have experience distilling call recordings into NLU training data? Are there any datasets already available that I can use?
It may also be worthwhile to point out that it’s tricky to benchmark your approach using somebody else’s data.
In the end, the stories/conversations that you optimise for should be the stories/conversations that your users generate. If the overlap between these two datasets is small, you risk optimising for something that won’t help your end-users.
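One rough way to get a feel for that overlap is to compare the vocabulary of a public dataset against a sample of your own users' utterances. The snippet below is only a minimal sketch: the function names (`tokenize`, `vocabulary`, `jaccard_overlap`, `own_coverage`) and the example sentences are made up for illustration, and token-level overlap is a crude proxy that says nothing about intents or dialogue flow.

```python
import re


def tokenize(utterance):
    """Lowercase an utterance and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", utterance.lower()))


def vocabulary(utterances):
    """Collect every distinct token seen across a list of utterances."""
    vocab = set()
    for utterance in utterances:
        vocab |= tokenize(utterance)
    return vocab


def jaccard_overlap(public_utterances, own_utterances):
    """Shared tokens divided by all tokens seen in either dataset."""
    public_vocab = vocabulary(public_utterances)
    own_vocab = vocabulary(own_utterances)
    union = public_vocab | own_vocab
    if not union:
        return 0.0
    return len(public_vocab & own_vocab) / len(union)


def own_coverage(public_utterances, own_utterances):
    """Fraction of your users' vocabulary that the public data also contains."""
    own_vocab = vocabulary(own_utterances)
    if not own_vocab:
        return 0.0
    return len(own_vocab & vocabulary(public_utterances)) / len(own_vocab)


# Hypothetical example data, not taken from any real dataset.
public = ["book a flight to paris"]
own = ["book a table for two"]
print(jaccard_overlap(public, own))
print(own_coverage(public, own))
```

If numbers like these come out low for your real data, that is a hint (not proof) that results on the public dataset may not carry over to your own users.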