What methods are suitable for preprocessing and preparing real dialogues as training data for Rasa Core? Up to now, I have just read through them and picked out stories by eye. But maybe there are some techniques, especially when you have thousands of chat logs?
Have there been updates in Rasa that allow for this feature?
We have transcripts from customer service calls that we would like to use as training data for our dialog and NLU. Apart from the fact that phone calls might have a different tone of voice and conversation style, would this be possible at all?
Thanks!
I’m not sure what you mean by end-to-end training. I guess one option would be to somehow import the transcripts into Rasa X and annotate them. But ideally we would upload the transcripts and get new stories based on them. Is this something you are working on?
Hi @martinevs. If your conversations are between a Rasa assistant and users, you can import them into Rasa X and annotate them there.
However, it sounds like your transcripts are between humans. In that case, you cannot import them into Rasa X because it is built for conversations between a Rasa assistant and users.
There are still many open research questions in the field of Natural Language Processing (NLP) around learning from conversations, labelled as well as unlabelled, so we have a long way to go before an assistant can be built directly from transcripts of human-to-human conversations.
Transcripts are really helpful for scoping out user goals and the capabilities of an assistant. Plus, you can use customer messages for NLU training data. However, at this point, it is still a very manual process and will require you to review those transcripts in order to define your domain and annotate training data.
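To make that review a bit less painful with thousands of logs, a small script can pull out just the customer turns and rank them by frequency, so you annotate the most common messages first. This is only a rough sketch in plain Python, not a Rasa feature: it assumes each transcript line looks like `Speaker: text`, and the speaker labels (`customer`, `user`, `caller`) and function names are made up for illustration. You would still need to assign intents by hand.

```python
import re
from collections import Counter

# Assumption: every turn is on its own line in the form "<Speaker>: <text>".
SPEAKER_LINE = re.compile(r"^\s*(?P<speaker>[^:]{1,40}):\s*(?P<text>.+?)\s*$")


def normalize(text):
    """Collapse whitespace and lowercase so near-identical messages group together."""
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_customer_messages(transcript, customer_labels=("customer", "user", "caller")):
    """Return the raw text of every turn spoken by a customer.

    The labels in `customer_labels` are hypothetical; adjust them to
    whatever your transcripts actually use.
    """
    messages = []
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match.group("speaker").strip().lower() in customer_labels:
            messages.append(match.group("text"))
    return messages


def unique_messages_by_frequency(transcripts):
    """Count normalized customer messages across all transcripts, most common first."""
    counts = Counter()
    for transcript in transcripts:
        for message in extract_customer_messages(transcript):
            counts[normalize(message)] += 1
    return counts.most_common()
```

The output is a list of `(message, count)` pairs that you can dump into a spreadsheet, label with intents, and then convert into NLU training examples.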