Do I need to train the model again with the 80% of the data that the split command produced?
Yes! The model should not be trained on the test data.
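To make the idea concrete, here is a minimal sketch of what a train/test split does, in plain Python (the data and the 80/20 ratio here are illustrative; in Rasa the split itself is handled for you by the CLI split command):

```python
import random

# Toy labeled NLU examples as (text, intent) pairs -- hypothetical data for illustration.
examples = [
    ("hi", "greet"), ("hello there", "greet"), ("hey", "greet"),
    ("bye", "goodbye"), ("see you later", "goodbye"),
    ("no", "deny"), ("not really", "deny"),
    ("I'm sad", "mood_unhappy"), ("awful day", "mood_unhappy"), ("so unhappy", "mood_unhappy"),
]

random.seed(0)
random.shuffle(examples)

cut = int(0.8 * len(examples))          # 80% for training
train_set, test_set = examples[:cut], examples[cut:]

# The model is trained ONLY on train_set; test_set is held out for evaluation,
# so the evaluation measures performance on examples the model has never seen.
```

The key point is that the two sets are disjoint: evaluating on training data would overstate how well the model generalizes.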
During testing, the trained model is given each of the test examples and classifies it as a certain intent; the predicted intent is then compared with the actual intent from the test labels. If the predicted and actual intents match, the example was classified accurately.

If your NLU data contains labeled entities, and your pipeline includes trainable entity extractors (like CRFEntityExtractor), the trained model will also perform entity extraction on the test examples. If the model extracts the same entities that are labeled in the test set, the entity extraction was performed accurately.

If you have response selectors in your pipeline, they are evaluated in the same way as the intent classifiers.
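The intent-evaluation loop described above can be sketched in a few lines. Here `predict_intent` is a hypothetical stand-in for the trained classifier, and the test examples are made up for illustration:

```python
def predict_intent(text):
    # Hypothetical stub standing in for the trained model's intent classifier.
    return "greet" if "hello" in text else "deny"

# Hypothetical held-out test examples as (text, actual_intent) pairs.
test_examples = [
    ("hello there", "greet"),
    ("no way", "deny"),
    ("hello!", "greet"),
    ("good morning", "greet"),   # the stub will get this one wrong
]

# Compare the predicted intent with the actual label for every test example.
correct = sum(1 for text, actual in test_examples
              if predict_intent(text) == actual)
accuracy = correct / len(test_examples)
print(f"intent accuracy: {accuracy:.2f}")  # 3 of 4 correct -> 0.75
```

The real evaluation also breaks these comparisons down per intent (precision, recall, F1), but the core logic is exactly this predicted-versus-actual check.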
Reports and images will be output as well; if you have questions about any specific one of these, please ask.
You can read the names of the confused intents off the confusion matrix by looking at the colored squares that don't fall on the diagonal. In the image below, the circled cell (the intersection of predicted: deny and actual: mood_unhappy) means that the intent mood_unhappy is sometimes misclassified as deny.