We have created a new repository, RasaHQ/NLU-training-data, to provide basic training data for developing chatbots.
We are currently testing this initiative and need your help to build this open source dataset, which means it's now open for contributions!
How do I donate my training data?
The GitHub README includes a guide on how to donate your data. The repository is organised into categories of intents, and an FAQ section helps you decide where to put your training data.
What about training data that’s not in English?
Right now, we are unable to evaluate the quality of contributions in every language, so during the initial phase we can only accept English training data.
However, we understand that the Rasa community is a global one, and in the long term we would like to find a solution for this together with the community.
Your feedback
We created this based on suggestions from the Rasa community, and we'd love to improve it in a direction that benefits you and other developers. It would be great to hear your thoughts on the following:
Do you think that the organisation of the repository works well and is intuitive?
Do you feel this would be a valuable resource for the community?
Exactly. As @markusgl kindly mentioned, we would first like to test this in English so that we can evaluate the quality. If we are able to open this up to localised training data in future, we would retroactively adjust the repo structure to specify the language and make this much clearer.
That said, it's great to know that you're interested in donating localised training data, and letting us know really helps us understand what the community is looking for.
Just opened a pull request with a BUNCH of new intents (54) for smalltalk and a handful of new intents (5) for mood, plus some additional examples for existing out-of-the-box intents in both categories.
Looking forward to seeing what others will contribute!
1. Grab domain-specific data with the Intent Example Finder
Research Advocate Vincent (@koaning) developed a tool, the Intent Example Finder, that provides an interface for easily collecting domain-specific training data. Use the selector in the sidebar to construct NLU data as a starting point, and use the clipboard icon to quickly copy the data!
@Emma I really think Rasa should invest more in multi-language support, and this repo is an example of why. Of the 10 countries with the most internet users, only one is a native English-speaking country. There is a huge market being neglected here: