Trying to understand model training and testing

Hi everyone, I was trying some things out and have a couple of questions. I am working on my bot and training it periodically. When the model finishes training, I usually test it with the command ‘rasa test’. Then I look at the results folder, specifically the intent_errors.json and response_selection_errors.json files, and there I find errors for examples that are not actually in data/nlu.yml but may have been (I cannot recall) in the training data of older versions of the same model.
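
One way to confirm which reported examples are really missing from the current training data is to compare the two files directly. The following is only a minimal sketch: it assumes that each entry in intent_errors.json has a "text" field, that the file lives in results/, and that examples in data/nlu.yml are written as plain "- example" list items. The helper names are hypothetical, and examples with entity annotations such as [Berlin](city) will not match their plain text.

```python
import json
from pathlib import Path


def example_lines(nlu_text):
    """Collect every '- item' line from the raw nlu.yml text.

    This is a plain text scan, not a YAML parse, so other list items
    (synonyms, regexes, lookups) are collected too. That only makes
    the check more lenient.
    """
    examples = set()
    for line in nlu_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- ") and not stripped.startswith("- intent:"):
            examples.add(stripped[2:].strip())
    return examples


def missing_texts(errors, nlu_text):
    """Return the error texts that do not appear as an example line."""
    known = example_lines(nlu_text)
    # Assumption: every error entry is a dict with a "text" key.
    return [entry["text"] for entry in errors if entry["text"] not in known]


if __name__ == "__main__":
    errors_path = Path("results/intent_errors.json")  # assumed location
    nlu_path = Path("data/nlu.yml")
    if errors_path.exists() and nlu_path.exists():
        errors = json.loads(errors_path.read_text(encoding="utf-8"))
        nlu_text = nlu_path.read_text(encoding="utf-8")
        for text in missing_texts(errors, nlu_text):
            print(text)
```

Run from the project root, this prints each reported text that has no matching example line in the current data/nlu.yml.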

I came to two possible conclusions. Either the model is trained using these older versions (possibly just the last one), or the rasa test command creates some random intent example texts, although that seems less likely to me than the first explanation.

If my first conclusion is true, I would like to know what the best approach is: should I delete the older models before training, since these old examples are no longer good, or should I keep them for better performance? Thank you very much for your time.