How the test/train split works in rasa

AminaDerouiche · May 24, 2021, 10:05am

I want to learn how the test/train split works in Rasa. Is it 60% training data and 40% test data? Is it always 60-40, or does it change for each model?
Is there resampling every time we train the model?
Is cross-validation used by default?

I know Rachael mentioned the TensorFlow settings, but I don’t know where to find this specific information. @Tobias_Wochinger, can you help me find some answers, please?

merveenoyan · May 24, 2021, 8:46pm

The docs list the parameters for CLI testing: Command Line Interface. There’s a parameter for the split:

" --training-fraction TRAINING_FRACTION Percentage of the data which should be in the training data. (default: 0.8)"
So no, by default it splits 80% training data to 20% test data.
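
To make the 0.8 default concrete, here is a minimal Python sketch of what a training fraction means. It is not Rasa's implementation: `split_examples`, its `seed` argument and the sample utterances are made up for illustration, and the real command may split differently (for example per intent).

```python
import random


def split_examples(examples, training_fraction=0.8, seed=42):
    """Shuffle examples and cut them into (train, test) by training_fraction.

    Illustrative only: the function name and the seed argument are
    assumptions, not part of Rasa's API.
    """
    if not 0.0 < training_fraction < 1.0:
        raise ValueError("training_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A fixed seed produces the same split on every run.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


examples = [f"utterance {i}" for i in range(10)]
train, test = split_examples(examples, training_fraction=0.8)
print(len(train), len(test))  # 8 2
```

With 10 examples and a fraction of 0.8, 8 land in the training set and 2 in the test set. Keeping the seed fixed is one way to get the same split back when you only want to compare configs.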

There’s no resampling to my knowledge, so you have to run the split yourself each time. If you’ve changed your NLU data, you should obviously split again. If you haven’t, and you only want to test different configs, use the same data you split earlier for reproducibility. (You can pass the directory to rasa test nlu.)
I don’t get what “per default” means. Hope this helps.

nik202 · June 2, 2021, 8:52pm

@AminaDerouiche

An 80:20 ratio is a common, recommended starting point for splitting data into training and test sets.

Rasa sets the default to 0.8, as mentioned above: --training-fraction TRAINING_FRACTION Percentage of the data which should be in the training data. (default: 0.8)

Reference 1: Testing Your Assistant

Reference 2: https://rasa.com/docs/rasa/command-line-interface#rasa-data-split

So if you have enough data, you don’t need to worry about changing the ratio. It’s a standard choice and works with the large amounts of data that deep learning requires.

If you want to investigate further how it works with the TensorFlow pipeline, I’d suggest contacting a Rasa core developer, but you can also read this post step by step: https://aspiresoftware.in/blog/rasa-nlu-intent-classification-using-different-pipeline/

As far as I know, Rasa does not implement resampling, though I may be wrong. If you want to learn about sampling in the context of TensorFlow, see this detailed post: Sampling Methods Within TensorFlow Input Functions | Datatonic
A cross-validation test takes a number (k) of folds that are used to evaluate the model. By default, Rasa sets the number of folds to 5. For further reading, see this detailed blog post by Karen White: Write Tests! How to Make Automated Testing Part of Your Rasa Dev Workflow
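
To illustrate the idea of k folds, here is a minimal sketch in plain Python. It is not how Rasa implements cross-validation: `k_fold_indices` is a hypothetical helper, and the folds here are cut in order with no shuffling and no balancing across intents.

```python
def k_fold_indices(n_examples, k=5):
    """Yield (train_idx, test_idx) index lists for each of k folds.

    Hypothetical helper for illustration; every example is used for
    testing exactly once across the k folds.
    """
    if not 2 <= k <= n_examples:
        raise ValueError("k must be between 2 and n_examples")
    indices = list(range(n_examples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [
        n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, k=5))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)}")
```

With 5 folds, each round trains on 80% of the data and tests on the remaining 20%, and the scores are averaged over the rounds. Unlike a single split, every example ends up in a test set once.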

I hope this helps. I saw your related questions on YouTube today. Happy learning! If you have any further questions, please let me know.

AminaDerouiche · June 4, 2021, 4:05pm

@nik202 Thank you, this really helps.

I will have a closer look on Monday and will keep you posted if I have further questions.

Again, I really appreciate your help.

nik202 · October 20, 2021, 2:09pm

@AminaDerouiche could you please mark a solution and close this thread, for your reference and for others?
