I am trying to find a way to evaluate two different NLU pipelines with new test data. Basically, Rasa provides us with a cool evaluation tool described in the comparing-nlu-pipelines section (Testing Your Assistant), but it is based solely on an nlu.yml file split into training (80%) and test (20%) sets. I want to use 100% of my nlu.yml for training, since I already have prepared test data.
It seems I can use some arguments for what I want (Command Line Interface), but I am not sure which ones to use for the new test data and for the pipeline comparison.
Maybe, like this?
rasa test nlu --nlu data/nlu.yml data/new_test_data.yml --config config_1.yml config_2.yml
Please let me know if there is a way to do this.
Thanks.
@miner Well, using 100% of your data for training is normally not the convention when training machine learning models. But if you need to, you can use your nlu file, write the Python code in a Jupyter notebook, and train the model there. Hope you are aware of the convention.
@miner Yes, please try the experiment with the command you showed above.
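If that combined command doesn't behave the way you expect (as far as I know, comparison mode does its own train/test splits), a two-step alternative is to train each config on the full nlu.yml and then evaluate each trained model against your separate test file. A rough sketch, assuming the Rasa 2.x/3.x CLI and the file names from your post (the .tar.gz names come from --fixed-model-name):

# train each pipeline on 100% of nlu.yml
rasa train nlu --config config_1.yml --nlu data/nlu.yml --out models --fixed-model-name nlu_config_1
rasa train nlu --config config_2.yml --nlu data/nlu.yml --out models --fixed-model-name nlu_config_2

# evaluate each trained model on the separate test file
rasa test nlu --model models/nlu_config_1.tar.gz --nlu data/new_test_data.yml --out results/config_1
rasa test nlu --model models/nlu_config_2.tar.gz --nlu data/new_test_data.yml --out results/config_2

Then you can compare the intent reports and confusion matrices written to results/config_1 and results/config_2.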
But as Nik pointed out, the standard practice is to take 80% of your data for training and 20% for testing.
If your bot’s behaviour drastically changes because of 20% less data, it means you need more and/or better data in the first place.
I would also recommend setting a random_seed: if you want to accurately compare two pipeline components or policies across multiple trainings, you can set a seed for DIET, ResponseSelector, and TED, for example like so:
- name: DIETClassifier
  random_seed: 1
  # other parameters
I also suggest you use TensorBoard to make comparisons and choose an optimal configuration. This is also doable on DIET, ResponseSelector, and TED.
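For example, something along these lines (just a sketch; the log directory and the 200 are placeholders you should adapt, and the same keys also work for ResponseSelector and TED):

- name: DIETClassifier
  evaluate_every_number_of_epochs: 5
  evaluate_on_number_of_examples: 200  # roughly 20% of your examples, see below
  tensorboard_log_directory: "./tensorboard"
  tensorboard_log_level: "epoch"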
Try to set evaluate_on_number_of_examples to about 20% of your total number of examples (of course, this means these examples will not be used for training, so you will have to provide a few more examples overall). You can use this script I wrote to count the number of examples you have.
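For reference, a minimal counting sketch (not necessarily the same as that script), assuming the standard Rasa 2.x/3.x nlu.yml layout where each intent example is a "- ..." line inside the examples block:

# count_examples.py - minimal sketch, assumes the standard nlu.yml layout
import sys
import yaml  # pip install pyyaml

path = sys.argv[1] if len(sys.argv) > 1 else "data/nlu.yml"
with open(path, encoding="utf-8") as f:
    data = yaml.safe_load(f)

total = 0
for item in data.get("nlu", []):
    if "intent" not in item:
        continue  # skip synonyms, regexes, lookup tables
    examples = item.get("examples") or ""
    # each example is a line starting with "- " inside the block scalar
    total += sum(1 for line in examples.splitlines() if line.strip().startswith("- "))

print(f"{total} intent examples in {path}")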