I want to learn how the train/test split works in Rasa. Is it 60% training data and 40% test data? Is it always 60/40, or does it change for each model?
Is the data resampled every time we train the model?
Is cross-validation done by default?
I know Rachael mentioned the TensorFlow settings, but I don't know where to find this specific information.
@Tobias_Wochinger, can you help me find some answers, please?
These docs include the parameters for CLI testing: Command Line Interface.
The rasa data split nlu command has a parameter for the split ratio:
"--training-fraction TRAINING_FRACTION
Percentage of the data which should be in the training data. (default: 0.8)"
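So no, it isn't 60/40: by default the data is split 0.8 to 0.2 (80% training, 20% test), and the ratio only changes if you set it yourself.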
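To make the 0.8/0.2 default concrete, here is a minimal Python sketch of what an 80/20 split means. It assumes a per-intent (stratified) split so that small intents still show up in the test set; the function stratified_split, the (text, intent) tuple format, and the toy data are hypothetical illustrations, not Rasa's API or its internal implementation.

```python
import random
from collections import defaultdict


def stratified_split(examples, train_fraction=0.8, seed=None):
    """Split (text, intent) pairs so that each intent keeps roughly
    train_fraction of its examples in the training set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # seeded RNG -> repeatable shuffles
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append((text, intent))

    train, test = [], []
    for intent in sorted(by_intent):  # iterate intents in a fixed order
        group = by_intent[intent]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])  # first 80% of this intent -> training
        test.extend(group[cut:])   # remaining 20% -> test
    return train, test


# Hypothetical toy data: 10 "greet" and 5 "goodbye" examples.
examples = [(f"hello {i}", "greet") for i in range(10)] + [
    (f"bye {i}", "goodbye") for i in range(5)
]
train, test = stratified_split(examples, train_fraction=0.8, seed=42)
print(len(train), len(test))  # 12 3
```

Each intent contributes 80% of its own examples to training (8 of 10 "greet", 4 of 5 "goodbye"), so the overall ratio stays at 12:3, i.e. 80:20.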
There's no resampling, to my knowledge; each split is a separate step you run yourself. If you've changed your NLU data, you should obviously split again. If not, and you only want to test different configs, reuse the split you created earlier so your results are reproducible. (You can pass that directory to rasa test nlu.)
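To see why reusing one split matters when comparing configs, here is a minimal Python sketch (not Rasa's code): a split made with a fixed random seed is identical every time, so score differences come from the config rather than from which examples landed in the test set. The helper seeded_split, its seed value, and the toy data are hypothetical.

```python
import random


def seeded_split(examples, train_fraction=0.8, seed=42):
    """Shuffle a copy of the data with a fixed seed and cut it at train_fraction."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


examples = [f"example {i}" for i in range(50)]

# Split once, then evaluate every config you compare on this same test set.
train, test = seeded_split(examples, seed=42)
train_again, test_again = seeded_split(examples, seed=42)
print(test == test_again)  # True: same data + same seed -> identical split

# A fresh, unseeded split would produce a different test set each time,
# so a change in scores could come from the data rather than the config.
```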
I'm not sure what you mean by "per default".
Hope this helps.
An 80:20 ratio is a common, recommended starting point for splitting data into training and test sets.
In Rasa, the default is set to 0.8, as mentioned above:
--training-fraction TRAINING_FRACTION
Percentage of the data which should be in the training data. (default: 0.8)
So, if you have enough data, you don't need to worry about changing the ratio: 80:20 is a standard choice, and it keeps most of the data for training, which deep learning models need.
As for cross-validation: it isn't run by default; you enable it with the --cross-validation flag of rasa test nlu. A cross-validation test specifies the number of folds (k) used to evaluate the model, and by default Rasa sets the number of folds to 5. For further reading, see this detailed blog post by Karen White: Write Tests! How to Make Automated Testing Part of Your Rasa Dev Workflow.
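Here is a minimal Python sketch of what 5-fold cross-validation does with the data, assuming a plain shuffled (non-stratified) split. k_fold_splits and the toy data are hypothetical; Rasa handles the folds itself, so this only illustrates the idea, not Rasa's implementation.

```python
import random


def k_fold_splits(examples, k=5, seed=0):
    """Yield (train, test) pairs; each example lands in exactly one test fold."""
    if k < 2 or k > len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # deal examples into k folds
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the one held out for testing.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


examples = [f"example {i}" for i in range(23)]
for fold_number, (train, test) in enumerate(k_fold_splits(examples, k=5), start=1):
    print(fold_number, len(train), len(test))
# 1 18 5
# 2 18 5
# 3 18 5
# 4 19 4
# 5 19 4
```

With k = 5, each round trains on roughly 80% of the data and tests on the other 20%, so five folds line up with the 80:20 ratio; unlike a single split, every example is used for testing exactly once.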
I hope this helps. I saw your related questions on YouTube today. Happy learning!
If you have any further questions, please let me know!