A way to compare different NLU Pipelines with new test data

Hi all,

I am trying to find a way to evaluate two different NLU pipelines with new test data. Rasa provides a nice evaluation tool, described in the comparing-nlu-pipelines section (Testing Your Assistant), but it is based entirely on a single nlu.yml file being split into training (80%) and test (20%) data. I want to use my nlu.yml 100% as training data, since I already have separate test data prepared.

It seems I can use some command-line arguments for this (Command Line Interface), but I am not sure which ones to use for the new test data and for the pipeline comparison.

Maybe, like this?

rasa test nlu --nlu data/nlu.yml data/new_test_data.yml --config config_1.yml config_2.yml

Please let me know if there is a way to do this. Thanks.

@miner Well, taking 100% of the data for training is normally not the convention when training machine learning models. But if you need to, you can use your full nlu file, write the Python code in a Jupyter notebook, and train the model there. Hope you know the convention.
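
If you go that route, here is a minimal sketch of what a notebook cell could look like. It simply calls the Rasa CLI from Python (so it works the same from a terminal), and the file names are just the ones from your question:

# Minimal sketch: train an NLU-only model on 100% of the NLU data
# for one pipeline config by calling the Rasa CLI from Python.
import subprocess

subprocess.run(
    [
        "rasa", "train", "nlu",
        "--config", "config_1.yml",        # the pipeline you want to evaluate
        "--nlu", "data/nlu.yml",           # use the full file for training
        "--out", "models",
        "--fixed-model-name", "config_1",  # easy-to-find model file name
    ],
    check=True,
)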

@miner Yes, please try running the experiment with the command you have shown above.

2 Likes

Yes, the command is correct.

But as Nik pointed out, the standard practice is to take 80% of your data for training and 20% for testing. If your bot’s behaviour drastically changes because of 20% less data, it means you need more and/or better data in the first place.
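
If you do want to evaluate against a separate test file rather than a split, one option is to train one model per config on the full nlu.yml (for example as in the notebook sketch above) and then run rasa test nlu for each model against the new test data. A rough sketch, assuming the file names from your question and that each model was saved as models/<config>.tar.gz:

# Rough sketch: evaluate one trained model per pipeline config against the
# separate, held-out test file and keep the reports in per-config folders.
import subprocess

for config in ["config_1", "config_2"]:
    subprocess.run(
        [
            "rasa", "test", "nlu",
            "--model", f"models/{config}.tar.gz",  # model trained with that config
            "--nlu", "data/new_test_data.yml",     # the separate test data
            "--out", f"results/{config}",          # evaluation reports per config
        ],
        check=True,
    )

You can then compare the reports Rasa writes into results/config_1 and results/config_2 (for example intent_report.json and the intent confusion matrix).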


I would also recommend setting a random_seed: if you want to accurately compare two pipeline components or policies across multiple trainings, you can set a seed for DIET, ResponseSelector, and TED, for example like this:

- name: DIETClassifier
  random_seed: 1
  # other parameters

I also suggest you use TensorBoard to make comparisons and choose an optimal configuration. This is also doable for DIET, ResponseSelector, and TED, for example like this:

- name: DIETClassifier
  # other parameters
  evaluate_on_number_of_examples: 200
  evaluate_every_number_of_epochs: 5
  tensorboard_log_directory: ./tensorboard/DIET
  tensorboard_log_level: epoch

Try to set evaluate_on_number_of_examples to about 20% of your total number of examples (of course, this means these examples will not be used for training, so you will have to provide a few more examples). You can use this script I wrote to count the number of examples you have.
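
In case the link is not handy, here is a rough sketch of such a counting script (not the one linked above). It assumes the standard Rasa YAML NLU format, where each intent has an examples: | block with one "- example" per line, and it needs PyYAML installed:

# Rough sketch: count NLU training examples per intent in a Rasa YAML NLU file.
# Assumes the standard "examples: |" literal block format; requires PyYAML.
import yaml

with open("data/nlu.yml", "r", encoding="utf-8") as f:
    nlu_file = yaml.safe_load(f)

total = 0
for block in nlu_file.get("nlu", []):
    if "intent" in block and "examples" in block:
        # Each example is a line starting with "- " inside the literal block.
        count = sum(
            1
            for line in block["examples"].splitlines()
            if line.strip().startswith("- ")
        )
        print(f"{block['intent']}: {count}")
        total += count

print(f"total examples: {total}")

With the total, you can pick a value for evaluate_on_number_of_examples that is roughly 20% of it.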

1 Like

Thank you so much, all!

1 Like

@miner No worries, we are happy to help you and try to solve your issue as best we can.

1 Like