A way to compare different NLU Pipelines with new test data

Hi all,

I am trying to find a way to evaluate two different NLU pipelines with new test data. Rasa provides a nice evaluation tool, described in the comparing-nlu-pipelines section (Testing Your Assistant), but it is based entirely on a single nlu.yml file being split into training (80%) and test (20%) data. I want to use my nlu.yml 100% as training data, since I already have separate test data prepared.

It seems I can use some command-line arguments for this (Command Line Interface), but I am not sure which ones to use for the new test data and for the pipeline comparison.

Maybe, like this?

rasa test nlu --nlu data/nlu.yml data/new_test_data.yml --config config_1.yml config_2.yml

Please let me know if there is a way to do this. Thanks.

@miner Well, taking 100% of the data for training is normally not the convention when training machine learning models. But if you need to, you can use your full nlu file, write the Python code in a Jupyter notebook, and train the model there. Hope you know the convention.
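
If you go that route, here is a minimal sketch of what a notebook cell could look like. It simply calls the Rasa CLI from Python (so it works the same from a terminal), and the file names are just the ones from your question:

# Minimal sketch: train an NLU-only model on 100% of the NLU data
# for one pipeline config by calling the Rasa CLI from Python.
import subprocess

subprocess.run(
    [
        "rasa", "train", "nlu",
        "--config", "config_1.yml",        # the pipeline you want to evaluate
        "--nlu", "data/nlu.yml",           # use the full file for training
        "--out", "models",
        "--fixed-model-name", "config_1",  # easy-to-find model file name
    ],
    check=True,
)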

@miner Yes, please try running the experiment with the command you have shown above.

2 Likes

Yes, the command is correct.

But as Nik pointed out, the standard practice is to take 80% of your data for training and 20% for testing. If your bot’s behaviour drastically changes because of 20% less data, it means you need more and/or better data in the first place.
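
If you do want to evaluate against a separate test file rather than a split, one option is to train one model per config on the full nlu.yml (for example as in the notebook sketch above) and then run rasa test nlu for each model against the new test data. A rough sketch, assuming the file names from your question and that each model was saved as models/<config>.tar.gz:

# Rough sketch: evaluate one trained model per pipeline config against the
# separate, held-out test file and keep the reports in per-config folders.
import subprocess

for config in ["config_1", "config_2"]:
    subprocess.run(
        [
            "rasa", "test", "nlu",
            "--model", f"models/{config}.tar.gz",  # model trained with that config
            "--nlu", "data/new_test_data.yml",     # the separate test data
            "--out", f"results/{config}",          # evaluation reports per config
        ],
        check=True,
    )

You can then compare the reports Rasa writes into results/config_1 and results/config_2 (for example intent_report.json and the intent confusion matrix).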


I would also recommend setting a random_seed: if you want to accurately compare two pipeline components or policies across multiple trainings, you can set a seed for DIET, ResponseSelector, and TED, for example like this:

- name: DIETClassifier
  random_seed: 1
  # other parameters

I also suggest you use TensorBoard to make comparisons and choose an optimal configuration. This is also doable for DIET, ResponseSelector, and TED, for example like this:

- name: DIETClassifier
  # other parameters
  evaluate_on_number_of_examples: 200
  evaluate_every_number_of_epochs: 5
  tensorboard_log_directory: ./tensorboard/DIET
  tensorboard_log_level: epoch

Try to set evaluate_on_number_of_examples to about 20% of your total number of examples (of course, this means these examples will not be used for training, so you will have to provide a few more examples). You can use this script I wrote to count the number of examples you have.
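
In case the link is not handy, here is a rough sketch of such a counting script (not the one linked above). It assumes the standard Rasa YAML NLU format, where each intent has an examples: | block with one "- example" per line, and it needs PyYAML installed:

# Rough sketch: count NLU training examples per intent in a Rasa YAML NLU file.
# Assumes the standard "examples: |" literal block format; requires PyYAML.
import yaml

with open("data/nlu.yml", "r", encoding="utf-8") as f:
    nlu_file = yaml.safe_load(f)

total = 0
for block in nlu_file.get("nlu", []):
    if "intent" in block and "examples" in block:
        # Each example is a line starting with "- " inside the literal block.
        count = sum(
            1
            for line in block["examples"].splitlines()
            if line.strip().startswith("- ")
        )
        print(f"{block['intent']}: {count}")
        total += count

print(f"total examples: {total}")

With the total, you can pick a value for evaluate_on_number_of_examples that is roughly 20% of it.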

1 Like

Thank you so much, all!

1 Like

@miner No worries, we are happy to help you and try to solve your issue as best we can.

1 Like