How the test/train split works in Rasa

  • How does the test/train split work in Rasa? Is it 60% training data and 40% test data? Is it always 60-40, or does it change for each model?
  • Is there resampling every time we train the model?
  • Is cross-validation used by default?

I know Rachael mentioned the TensorFlow settings, but I don’t know where to find this specific information. @Tobias_Wochinger, can you help me find some answers, please?

The docs list the parameters for CLI testing: Command Line Interface. There’s a parameter for the split:

  • " --training-fraction TRAINING_FRACTION Percentage of the data which should be in the training data. (default: 0.8)"
  • So no, by default it splits 0.8 to 0.2, not 60-40.
  • To my knowledge there’s no resampling; the data is only split when you run the split yourself. If you’ve changed your NLU data, you should split again. If you haven’t, and you only want to test different configs, reuse the split you made earlier for reproducibility. (You can pass the directory to rasa test nlu.)
  • I don’t understand what you mean by "per default" in the cross-validation question. Hope this helps.
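
To make the training fraction and the "reuse the same split" point concrete, here is a minimal Python sketch. It is not Rasa's implementation, just an illustration of the idea; `split_examples`, its `seed` parameter, and the sample utterances are hypothetical names made up for this example.

```python
import random


def split_examples(examples, training_fraction=0.8, seed=42):
    """Shuffle a copy of `examples` and cut it at `training_fraction`.

    Hypothetical helper for illustration only; not part of Rasa.
    """
    rng = random.Random(seed)      # fixed seed -> same shuffle on every call
    shuffled = list(examples)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Made-up NLU examples.
examples = [f"utterance {i}" for i in range(10)]

train, test = split_examples(examples)
train_again, test_again = split_examples(examples)

print(len(train), len(test))   # 8 training examples, 2 test examples
print(train == train_again)    # the same seed reproduces the same split
```

Keeping the split fixed while you change the config means any difference in the test scores comes from the config, not from a different draw of test examples.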

@AminaDerouiche

  1. An 80:20 ratio is a common, recommended starting point for splitting data into training and test sets.

In Rasa, the default is set to 0.8, as mentioned above: --training-fraction TRAINING_FRACTION Percentage of the data which should be in the training data. (default: 0.8)

Reference 1: Testing Your Assistant

Reference 2: https://rasa.com/docs/rasa/command-line-interface#rasa-data-split

So, if you have enough data, you don’t need to worry about changing the ratio: it’s a standard choice and works with the large amounts of data that deep learning requires.

If you want to investigate further how it works with the TensorFlow pipeline, I’d suggest contacting a Rasa Core developer (hope that will help), but you can also read this link step by step: https://aspiresoftware.in/blog/rasa-nlu-intent-classification-using-different-pipeline/ Hope it helps :slight_smile:

  2. As far as I know, Rasa does not implement resampling, though I may be wrong. If you want to know about sampling in the context of TensorFlow, please follow this detailed link: Sampling Methods Within TensorFlow Input Functions | Datatonic

  3. A cross-validation test takes a number of folds (k) that are used to evaluate the model. By default, Rasa sets the number of folds to 5. For further reading, see this detailed blog post by Karen White: Write Tests! How to Make Automated Testing Part of Your Rasa Dev Workflow
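
For anyone unsure what "k folds" means in practice, here is a rough Python sketch of how k-fold cross-validation partitions the data with k = 5. This is a generic illustration, not how Rasa builds its folds internally; `k_fold_indices` is a hypothetical helper written for this post.

```python
def k_fold_indices(n_examples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper for illustration only; not part of Rasa.
    """
    indices = list(range(n_examples))
    # Deal the indices into k folds, like dealing cards round-robin.
    folds = [indices[i::k] for i in range(k)]
    for held_out, test_idx in enumerate(folds):
        # Train on every fold except the one currently held out for testing.
        train_idx = [
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != held_out
            for idx in fold
        ]
        yield train_idx, test_idx


splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Each example lands in the test set exactly once, so with 5 folds every round trains on about 80% of the data and tests on the remaining 20%.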

I hope this helps. I saw your related questions today on YouTube. Happy learning :slight_smile: If you have any further questions, please let me know!

@nik202 Thank you, this really helps :blush:

I will have a closer look on Monday and will keep you posted if I have further questions.

Again, I really appreciate your help :grin:

@AminaDerouiche could you please mark this thread as solved, for your reference and for others? :slight_smile:
