I have a dataset with several intents. When using Rasa, I split the data into training and testing sets, allowing me to evaluate the intent classification capabilities of the NLU model on the test set.
Now I want to evaluate intent classification performance on this dataset using Rasa Pro. My current plan is to build conversation flows, train Rasa Pro on the training data, and then write end-to-end test cases like the following:
```yaml
test_cases:
  - test_case: test for intent A (sample 1)
    steps:
      - user: data_test_sample_1
      - utter: utter_response_for_intent_A
  - test_case: test for intent A (sample 2)
    steps:
      - user: data_test_sample_2
      - utter: utter_response_for_intent_A
  # ...
  - test_case: test for intent Z (sample 1)
    steps:
      - user: data_test_sample_1
      - utter: utter_response_for_intent_Z
  - test_case: test for intent Z (sample 2)
    steps:
      - user: data_test_sample_2
      - utter: utter_response_for_intent_Z
```
Based on the pass/fail output of these test cases, I would then judge whether the model's intent predictions are accurate.
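To make the scoring step concrete, here is a minimal sketch of how I'd aggregate the pass/fail outcomes into per-intent accuracy. The `(expected_intent, passed)` pairs are illustrative assumptions about what I'd extract from the test run, not actual Rasa Pro output:

```python
# Sketch: aggregate hypothetical per-test-case pass/fail results into
# per-intent accuracy. The input format is an assumption for illustration.
from collections import defaultdict

def intent_accuracy(results):
    """results: iterable of (expected_intent, passed) pairs."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for intent, passed in results:
        totals[intent] += 1
        if passed:
            passes[intent] += 1
    return {intent: passes[intent] / totals[intent] for intent in totals}

# Made-up outcomes for two intents, two test samples each:
results = [
    ("intent_A", True),
    ("intent_A", False),
    ("intent_Z", True),
    ("intent_Z", True),
]
print(intent_accuracy(results))  # → {'intent_A': 0.5, 'intent_Z': 1.0}
```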
My question is: Is this a reasonable approach to assess intent classification in Rasa Pro? Or does Rasa Pro have built-in tools or recommended methods for testing intent classification that could simplify or improve this process?
Any guidance or best practices would be greatly appreciated!
Thank you in advance for your help!