Rasa version: 2.0.0rc2
We are trying to compare two different pipelines on the same training data. (Currently, we are using the default training data and test cases that Rasa generates when running rasa init.)
The command used to test each pipeline is:
rasa test nlu --nlu data/nlu.yml --config config_hfp.yml --cross-validation
The final command-line output for each configuration looks like this:
2020-10-20 13:11:56 INFO rasa.test - CV evaluation (n=5)
2020-10-20 13:11:56 INFO rasa.test - Intent evaluation results
2020-10-20 13:11:56 INFO rasa.nlu.test - train Accuracy: 0.977 (0.019)
2020-10-20 13:11:56 INFO rasa.nlu.test - train F1-score: 0.988 (0.010)
2020-10-20 13:11:56 INFO rasa.nlu.test - train Precision: 1.000 (0.000)
2020-10-20 13:11:56 INFO rasa.nlu.test - test Accuracy: 0.719 (0.133)
2020-10-20 13:11:56 INFO rasa.nlu.test - test F1-score: 0.716 (0.127)
2020-10-20 13:11:56 INFO rasa.nlu.test - test Precision: 0.762 (0.122)
Given this output, can we use the F1-score to compare the two pipelines?
Can we conclude that the pipeline with the higher test F1-score is the one we should use for better chatbot performance?
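For context, this is roughly how we would script the comparison. It is only a minimal sketch: it assumes the number in parentheses is the standard deviation across the 5 folds, and the helper names parse_cv_summary and f1_gap_exceeds_spread are our own, not part of Rasa. The log text is the output pasted above; the second pipeline's summary would be parsed the same way.

```python
import re

# Matches summary lines such as "... - test F1-score: 0.716 (0.127)".
METRIC_LINE = re.compile(
    r"(?P<split>train|test) (?P<metric>[\w-]+):\s+"
    r"(?P<mean>\d+\.\d+) \((?P<std>\d+\.\d+)\)"
)


def parse_cv_summary(log_text):
    """Return {(split, metric): (mean, spread)} from the CV summary lines."""
    results = {}
    for match in METRIC_LINE.finditer(log_text):
        key = (match["split"], match["metric"])
        results[key] = (float(match["mean"]), float(match["std"]))
    return results


def f1_gap_exceeds_spread(summary_a, summary_b):
    """True if the test F1 means differ by more than the larger spread.

    This is a rough sanity check (assumption: the value in parentheses
    is a per-fold standard deviation), not a significance test.
    """
    mean_a, std_a = summary_a[("test", "F1-score")]
    mean_b, std_b = summary_b[("test", "F1-score")]
    return abs(mean_a - mean_b) > max(std_a, std_b)


# Output of the config_hfp.yml run, copied from the log above.
LOG_HFP = """
2020-10-20 13:11:56 INFO rasa.nlu.test - train Accuracy: 0.977 (0.019)
2020-10-20 13:11:56 INFO rasa.nlu.test - train F1-score: 0.988 (0.010)
2020-10-20 13:11:56 INFO rasa.nlu.test - train Precision: 1.000 (0.000)
2020-10-20 13:11:56 INFO rasa.nlu.test - test Accuracy: 0.719 (0.133)
2020-10-20 13:11:56 INFO rasa.nlu.test - test F1-score: 0.716 (0.127)
2020-10-20 13:11:56 INFO rasa.nlu.test - test Precision: 0.762 (0.122)
"""

summary_hfp = parse_cv_summary(LOG_HFP)
print(summary_hfp[("test", "F1-score")])
```

The idea behind f1_gap_exceeds_spread is that, with a spread of about 0.13 on the test F1-score, a small difference between two pipelines could simply be fold-to-fold noise, which is part of why we are unsure whether the higher F1-score alone is enough to decide.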