Why use multiple runs when testing a model?

When comparing NLU pipelines, I wonder why I should use multiple runs. According to the docs, the train/test split is done only once, so I see no benefit in multiple runs: the only thing that might change is the random initialization of the model weights.

Am I missing something?

I am talking about this command:

rasa test nlu --config pretrained_embeddings_spacy.yml supervised_embeddings.yml --nlu data/nlu.md --runs 3 --percentages 0 25 50 70 90

It comes from the "Testing Your Assistant" page of the docs.

We create a different train/test split for every run. The percentages indicate how much of the training data to exclude. With the command you provided, each of the two model configs is trained and tested 15 times (3 runs × 5 percentages), and each time there will be a new train/test split. Randomly sampling the training and test data each time gives you a more realistic indication of how well your model performs. If you ran the model only once, it might simply have been lucky, for example because all the “easy” examples ended up in the test data, so it scored very well. To avoid such scenarios, we run the model multiple times on multiple train/test splits. Hope that helps.
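To make the idea concrete, here is a minimal Python sketch of repeated random train/test splitting, not Rasa's actual implementation. It assumes one random split per run, with each percentage removing that share of the run's training data before scoring. `compare_runs`, `summarize` and `majority_intent_scorer` are hypothetical names; the scorer is a toy majority-class baseline standing in for "train a pipeline, then evaluate it". Reporting the mean and spread over runs is what protects you from a single lucky split.

```python
import random
import statistics
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (text, intent)


def majority_intent_scorer(train: Sequence[Example], test: Sequence[Example]) -> float:
    """Hypothetical stand-in for training and evaluating a pipeline:
    predict the most frequent training intent and return test accuracy."""
    if not train or not test:
        return 0.0
    majority = Counter(intent for _, intent in train).most_common(1)[0][0]
    return sum(intent == majority for _, intent in test) / len(test)


def compare_runs(
    data: Sequence[Example],
    scorer: Callable[[Sequence[Example], Sequence[Example]], float],
    runs: int = 3,
    percentages: Sequence[int] = (0, 25, 50, 70, 90),
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Dict[int, List[float]]:
    """Assumed scheme: a fresh random split per run, then for each
    percentage, exclude that share of the run's training data."""
    rng = random.Random(seed)
    scores: Dict[int, List[float]] = {p: [] for p in percentages}
    for _ in range(runs):
        shuffled = list(data)
        rng.shuffle(shuffled)  # new random train/test split for this run
        n_test = max(1, int(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        for p in percentages:
            n_keep = len(train) - int(len(train) * p / 100)
            scores[p].append(scorer(train[:n_keep], test))
    return scores


def summarize(scores: Dict[int, List[float]]) -> Dict[int, Tuple[float, float]]:
    """Mean and sample standard deviation of the scores per percentage."""
    return {
        p: (statistics.mean(s), statistics.stdev(s) if len(s) > 1 else 0.0)
        for p, s in scores.items()
    }


# Toy data: 30 "greet" and 20 "goodbye" examples.
data = [(f"hi {i}", "greet") for i in range(30)] + [
    (f"bye {i}", "goodbye") for i in range(20)
]
scores = compare_runs(data, majority_intent_scorer)
for p, (mean, sd) in summarize(scores).items():
    print(f"exclude {p:>2}%: accuracy {mean:.2f} +/- {sd:.2f} over {len(scores[p])} runs")
```

With the defaults mirroring `--runs 3 --percentages 0 25 50 70 90`, each config yields 15 scores. A large standard deviation at a given percentage is a sign that a single run could have been misleading.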