Managing Intent datasets

I had been struggling to clean up my workflow across all the test bots I’ve made so far, and one of my big problems was that I always had to re-organize my intents from scratch whenever I wrote a new bot. I found a way to recycle intent datasets for things such as “affirm”, “deny”, “greeting”, “farewell” and “out_of_scope” pretty neatly, while also organizing my workflow.

I don’t know whether it makes sense for other people to adopt a similar workflow (if there’s interest, perhaps it could go into the NLU library). I’ve attached my code below.

Essentially, you run the script in your bot’s folder with three arguments: the path to the folder where your intent datasets are located (-n), the path to your domain file (-d), and, if you want a train-test split (-t), the train fraction as a float (such as 0.9 for 90% train / 10% test).
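
For reference, here’s a minimal sketch of what that command-line interface could look like with argparse. The flags match the ones above, but the destination names and help strings are my own assumptions, not necessarily what the attached script uses:

```python
# A sketch of the CLI described above, not necessarily how the attached
# write_nlu_data.py parses its arguments; dest names and help strings
# here are assumptions for illustration.
import argparse

parser = argparse.ArgumentParser(
    description="Merge per-intent NLU files and optionally split train/test."
)
parser.add_argument("-n", dest="nlu_dir", required=True,
                    help="folder containing one .md file per intent")
parser.add_argument("-d", dest="domain", required=True,
                    help="path to the bot's domain file")
parser.add_argument("-t", dest="train_frac", type=float, default=None,
                    help="train fraction for the split, e.g. 0.9 for 90/10")
args = parser.parse_args()
```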

The script goes through your domain and picks up the list of intents, then uses those names to look up files in your intent datasets folder. Say, for example, that your intents are

intents:
  - affirm
  - deny
  - out_of_scope

then it looks in the intent datasets folder for “affirm.md”, “deny.md” and “out_of_scope.md”, merges them into one dataset, and performs the train-test split on each intent separately. You are left with a train_dataset.md and a test_dataset.md split at the proper ratio.
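
To make the merge-and-split step concrete, here’s a hedged sketch of how that logic could work. It assumes the domain is YAML with a plain list of intent names, that each per-intent file is in the Markdown NLU format (one “- example” per line), and that PyYAML is installed; the function and variable names are mine, not the attached script’s:

```python
# A sketch of the merge-and-split logic described above, not the attached
# script itself. Assumes: the domain lists intents as plain strings, each
# intent has a "<name>.md" file of "- example" lines, PyYAML is installed.
import random
from pathlib import Path

import yaml

def merge_and_split(domain_path, nlu_dir, train_frac=0.9):
    intents = yaml.safe_load(Path(domain_path).read_text())["intents"]
    train_lines, test_lines = [], []
    for intent in intents:
        # Collect the "- example" lines for this intent's .md file.
        examples = [
            line
            for line in (Path(nlu_dir) / f"{intent}.md").read_text().splitlines()
            if line.strip().startswith("- ")
        ]
        random.shuffle(examples)
        cut = int(len(examples) * train_frac)
        # Split per intent, so the ratio holds for every intent.
        train_lines += [f"## intent:{intent}", *examples[:cut], ""]
        test_lines += [f"## intent:{intent}", *examples[cut:], ""]
    Path("train_dataset.md").write_text("\n".join(train_lines))
    Path("test_dataset.md").write_text("\n".join(test_lines))
```

Splitting per intent (rather than on the merged file) is what keeps the ratio correct even for intents with very few examples.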

This is currently very painful to do without the script: you need to write all the examples for an intent by hand, then split off a test set by hand (or write a script of your own…), and if you want to make a new bot and re-use that intent, you have to go and crop the examples out of your existing files.

Any comment, feedback, question or suggestion is appreciated!

How do you manage your intent examples datasets?

write_nlu_data.py (3.5 KB)
