Managing Intent datasets

I had been struggling to clean up my workflow across all the test bots I’ve made so far, and one of my big problems was that I always had to re-organize my intents from scratch whenever I wrote a new bot. I found a way to recycle intent datasets for things such as “affirm”, “deny”, “greeting”, “farewell” and “out_of_scope” pretty neatly, while also organizing my workflow.

I don’t know whether it makes sense for other people to adopt a similar workflow (if there’s interest, perhaps it could go into the NLU library). I’ve attached my code below.

Essentially, you run the script in your bot’s folder with three arguments: the path to the folder where your intent datasets are located (-n), the path to your domain file (-d), and, if you want a train-test split (-t), the train fraction as a float (such as 0.9 for 90% train / 10% test).
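
For reference, here’s a minimal sketch of what that command-line interface could look like with argparse. The flags match the ones above, but the destination names and help strings are my own assumptions, not necessarily what the attached script uses:

```python
# A sketch of the CLI described above, not necessarily how the attached
# write_nlu_data.py parses its arguments; dest names and help strings
# here are assumptions for illustration.
import argparse

parser = argparse.ArgumentParser(
    description="Merge per-intent NLU files and optionally split train/test."
)
parser.add_argument("-n", dest="nlu_dir", required=True,
                    help="folder containing one .md file per intent")
parser.add_argument("-d", dest="domain", required=True,
                    help="path to the bot's domain file")
parser.add_argument("-t", dest="train_frac", type=float, default=None,
                    help="train fraction for the split, e.g. 0.9 for 90/10")
args = parser.parse_args()
```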

The script goes through your domain and picks up the list of intents, then uses those names to look up files in your intent datasets folder. Say, for example, that your intents are

intents:
  - affirm
  - deny
  - out_of_scope

then it looks in the intent datasets folder for “affirm.md”, “deny.md” and “out_of_scope.md”, merges them into one dataset, and performs the train-test split on each intent separately. You are left with a train_dataset.md and a test_dataset.md split at the proper ratio.
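
To make the merge-and-split step concrete, here’s a hedged sketch of how that logic could work. It assumes the domain is YAML with a plain list of intent names, that each per-intent file is in the Markdown NLU format (one “- example” per line), and that PyYAML is installed; the function and variable names are mine, not the attached script’s:

```python
# A sketch of the merge-and-split logic described above, not the attached
# script itself. Assumes: the domain lists intents as plain strings, each
# intent has a "<name>.md" file of "- example" lines, PyYAML is installed.
import random
from pathlib import Path

import yaml

def merge_and_split(domain_path, nlu_dir, train_frac=0.9):
    intents = yaml.safe_load(Path(domain_path).read_text())["intents"]
    train_lines, test_lines = [], []
    for intent in intents:
        # Collect the "- example" lines for this intent's .md file.
        examples = [
            line
            for line in (Path(nlu_dir) / f"{intent}.md").read_text().splitlines()
            if line.strip().startswith("- ")
        ]
        random.shuffle(examples)
        cut = int(len(examples) * train_frac)
        # Split per intent, so the ratio holds for every intent.
        train_lines += [f"## intent:{intent}", *examples[:cut], ""]
        test_lines += [f"## intent:{intent}", *examples[cut:], ""]
    Path("train_dataset.md").write_text("\n".join(train_lines))
    Path("test_dataset.md").write_text("\n".join(test_lines))
```

Splitting per intent (rather than on the merged file) is what keeps the ratio correct even for intents with very few examples.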

This is currently very painful to do without the script: you need to write all the examples for an intent by hand, then split off a test set by hand (or write a script of your own…), and if you want to make a new bot and re-use that intent, you have to go and crop the examples out of your existing files.

Any comment, feedback, question or suggestion is appreciated!

How do you manage your intent examples datasets?

write_nlu_data.py (3.5 KB)
