Donate your NLU training data!

Hey Rasa @community,

We have created a new repository that lives in RasaHQ/NLU-training-data with the goal of providing basic training data for developing chatbots.

We are currently testing this initiative, and we will need your help to build this open source dataset - which means it’s now open for contributions!

How do I donate my training data?
In the GitHub README, you will find a guide on how to donate your data. The repository is divided into categories of intent, and there is also an FAQ section to help you understand where to put your training data.

What about training data that’s not in English?
Right now, we are unable to evaluate the quality of contributions in every language, so during the initial phase we can only accept English training data in the repository.
However, we understand that the Rasa community is a global one, and in the long term we would like to find a solution for this in collaboration with the community.

Your feedback
We created this based on suggestions from the Rasa community, and we’d love to improve it in a direction that benefits you and other developers. It would therefore be great to have your thoughts on the following:

  • Do you think that the organisation of the repository works well and is intuitive?
  • Do you feel this would be a valuable resource for the community?

Hello @Emma, it’s just what I’m looking for.

I’ll contribute some data, but how about other languages? Maybe change the file names or the folder structure.

Thanks, great initiative!

-Best


As mentioned in the README, only English is accepted at the moment.

Hey @davi,

That’s awesome to hear! :star_struck:

Exactly, as @markusgl kindly mentioned, first we would like to test it out in English so that we can evaluate the quality. If we are able to open this up to localised training data in future, we would adjust the repo structure retroactively to specify the language and make this much clearer. :slight_smile:

Even so, it’s great to know that you’re interested in donating localised training data, and letting us know really helps us understand what the community is looking for.

You are right, sorry. Anyway, I just forked the repo; when it is ready for other languages, I’ll make a PR.

Thanks!


Just added a pull request with a BUNCH of new intents (54) for smalltalk and a handful of new intents (5) for mood, plus some additional data for some of the out-of-the-box intents in both of those categories.

Looking forward to seeing what others will contribute!


Hey @Emma

I have added 86 different intents for small talk. Please review them, and if you find them useful, do let me know.


Wow @abhishakskilrock - those are fantastic, well done


Hey @jonathanpwheat

Thanks for the compliment! By the way, you also did a fantastic job by providing 54 different intents.


Thanks, I see some overlap of intents, but you have all the context entities set up, whereas I just have basic data.

I’m glad this is an open source shared repo, because I’ll be implementing your nlu data into the small talk portion of my bot :grinning:


Sure @jonathanpwheat, after all, this is the real purpose of open source: everyone can share in the benefits of others’ contributions.


@jonathanpwheat & @abhishakskilrock,

Wow guys! Thank you so much for submitting all of this training data! :heart_eyes: We should be able to review your PRs before the end of this week.

It’s also great to see such a wholesome discussion going on here; we are very fortunate to have this supportive community. :blush:


Hi, I have added my employment bot NLU data. This is my first contribution, so please guide me if I have made any mistakes.

Thanks

Hey everyone,

I want to share a couple of updates to this repo:

1. Grab domain-specific data with the Intent Example Finder

Research Advocate Vincent @koaning developed a special tool, the Intent Example Finder, that provides an interface for easy collection of domain-specific training data. Use the selector in the sidebar to construct NLU data as a starting point and use the clipboard icon to quickly copy the data!


2. Domain files are now YAML, and Rasa 2.x ready

You can also collect this data in YAML, instead of the previous Markdown format. You can find more information on the deprecation of Markdown, and the commands to convert Markdown to YAML, in our docs here.
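
As a rough illustration of what the format change means for the data itself, here is a minimal Python sketch that rewrites simple Markdown intent blocks into the Rasa 2.x YAML layout. It is not Rasa's official converter: the function name `markdown_nlu_to_yaml` and the sample data are hypothetical, and it only handles `## intent:` sections, skipping synonyms, regexes, and lookup tables. For real projects, use the conversion commands described in the docs.

```python
import re


def markdown_nlu_to_yaml(markdown_text: str) -> str:
    """Convert simple Markdown intent blocks to Rasa 2.x-style YAML text.

    Hypothetical helper for illustration only; it ignores every
    section that is not an intent (synonym, regex, lookup).
    """
    lines = ['version: "2.0"', "nlu:"]
    in_intent = False
    for raw in markdown_text.splitlines():
        line = raw.strip()
        header = re.match(r"^##\s*intent:\s*(\S+)$", line)
        if header:
            # Start a new intent entry with a literal block of examples.
            lines.append(f"- intent: {header.group(1)}")
            lines.append("  examples: |")
            in_intent = True
        elif line.startswith("##"):
            # Any other section header ends the current intent.
            in_intent = False
        elif in_intent and line.startswith(("- ", "* ")):
            # Copy the example text unchanged, including entity annotations.
            lines.append(f"    - {line[2:].strip()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "## intent:greet\n- hey\n- hello there\n\n## intent:goodbye\n- bye\n"
    print(markdown_nlu_to_yaml(sample))
```

The examples are kept as plain text inside a YAML block scalar (`examples: |`), so the example lines themselves carry over unchanged between the two formats.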


Today, we have 1196 examples on this repo! :tada: Thank you to everyone supporting this crowdsourcing project and donating data! :purple_heart:


Hi, does it support only English, or multiple languages as well?

Hi Mohammed, Rasa supports multiple languages as well, not just English.

I mean the data hub: are intents available in different languages, or just English, @shazadmaved?

@Emma I really think Rasa should invest more in multi-language support, and this repo is an example of why. Of the 10 countries with the most internet users, only one is natively English-speaking. There is a huge market being neglected here:

[image: internet-users]