Comparing NLU Pipelines with data augmentation

Hello,

I am hoping someone can help me. I want to test different NLU pipelines using training/test sets that I have already defined, instead of the random splits at different percentages that this command produces: rasa test nlu --nlu data/nlu.yml --config config_1.yml config_2.yml --runs 4 --percentages 0 25 50 70 90

I have divided my original data into 5 folds, and for each fold I created augmented data and concatenated it to the original data. I do not want the data to be reshuffled when testing the different pipelines, because the splits need to stay independent. For this reason, I would prefer to pass the already established files.

If anyone can help me with this I would appreciate it!

A while ago I wrote a data augmentation tool for Rasa which also provides a benchmarking guide. The use-case differs slightly from yours, but the guide may still be useful.
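
For fixed folds, one possible approach is to skip the --percentages mode and run rasa train nlu and rasa test nlu once per fold and per config. Below is a minimal sketch that only builds and prints those commands. The folder layout (folds/fold_K/train.yml and folds/fold_K/test.yml), the models/ and results/ directories, and the helper name build_fold_commands are assumptions made for illustration and are not something Rasa prescribes. It also assumes that the model saved with --fixed-model-name ends up as NAME.tar.gz in the --out directory, which is worth checking against your Rasa version.

```python
import shlex
from pathlib import Path


def build_fold_commands(n_folds, configs, data_dir="folds",
                        models_dir="models", results_dir="results"):
    """Build one train and one test command per (config, fold) pair.

    The directory layout is hypothetical: each fold is assumed to live in
    <data_dir>/fold_<k>/ with a train.yml and a test.yml file.
    """
    commands = []
    for config in configs:
        config_name = Path(config).stem  # e.g. "config_1"
        for k in range(1, n_folds + 1):
            train_file = f"{data_dir}/fold_{k}/train.yml"
            test_file = f"{data_dir}/fold_{k}/test.yml"
            model_name = f"{config_name}_fold_{k}"

            # Train only on this fold's predefined training file.
            commands.append([
                "rasa", "train", "nlu",
                "--nlu", train_file,
                "--config", config,
                "--out", models_dir,
                "--fixed-model-name", model_name,
            ])
            # Evaluate that model on the fold's predefined test file.
            # Assumption: the trained model is stored as <model_name>.tar.gz.
            commands.append([
                "rasa", "test", "nlu",
                "--nlu", test_file,
                "--model", f"{models_dir}/{model_name}.tar.gz",
                "--out", f"{results_dir}/{model_name}",
            ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in build_fold_commands(5, ["config_1.yml", "config_2.yml"]):
        print(shlex.join(cmd))
```

Each printed line can be run in a shell, or passed to subprocess.run(cmd, check=True) from the same script. Every pipeline then sees exactly the same train/test files for a given fold, so the reports written to the per-fold results folders can be compared directly.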


I will take a look at it!!! Thank you!