Where can I find more information about the training process of Rasa Core?
Unfortunately, I find this whole process very opaque at the moment and suspect that it might be hurting our bot's performance. Here are the specific issues and questions I ran into:
- What exactly happens during the data preparation (augmentation?) phase?
- Is the graph created with the visualization facility the same one that is used for training, or is it completely unrelated?
- Do examples that are more common in the stories file appear more frequently in the training data, or is the training data sampled uniformly from the constructed graph?
- Why does the data preparation/augmentation phase run twice? The phase starting with “Creating states and action examples from collected trackers (by MaxHistoryTrackerFeaturizer)…” appears twice per training run and takes a long time to finish, which makes training very slow. Also, is 81.29it/s a realistic rate, or are my stories being processed unusually slowly?
- I use the sklearn policy with grid search, and the CV scores are always above 98% accuracy, but in practice the bot performs quite badly (though NLU is mostly correct). I suspect that data leaks from the training set into the validation set (due to repeated samples?). What KPI can I use to get a realistic estimate of how well the Rasa Core model predicts? Or is interacting with the bot and seeing what happens the only way?
- Is there an easy way to replace the training process with a custom training process?
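
To make the leakage suspicion from the grid-search question concrete, here is a minimal sketch on synthetic data. It does not use Rasa Core's featurization at all: `X_unique`, `groups`, and the idea of treating each original sample (for example, one story) as one group are my own assumptions for illustration. The labels are random, so there is nothing real to learn, yet plain shuffled k-fold still reports a near-perfect score once every sample is repeated, while a group-aware split does not.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical stand-in for featurized dialogue states: 40 unique samples
# with random binary labels, so there is no real signal to learn.
n_unique, n_features, n_copies = 40, 20, 10
X_unique = rng.normal(size=(n_unique, n_features))
y_unique = rng.integers(0, 2, size=n_unique)

# Repeat every sample, mimicking duplicated examples in the training data.
X = np.repeat(X_unique, n_copies, axis=0)
y = np.repeat(y_unique, n_copies)
# Assumption: all copies of one original sample share a group id.
groups = np.repeat(np.arange(n_unique), n_copies)

clf = DecisionTreeClassifier(random_state=0)

# Shuffled k-fold: copies of the same sample land in both train and test folds.
naive = cross_val_score(
    clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0)
).mean()

# Group-aware k-fold: all copies of a sample stay on one side of the split.
grouped = cross_val_score(
    clf, X, y, cv=GroupKFold(n_splits=5), groups=groups
).mean()

print(f"shuffled KFold accuracy: {naive:.2f}")
print(f"GroupKFold accuracy:     {grouped:.2f}")
```

If something similar happens in my setup, a group-aware split or a held-out set of complete stories might give a more honest number than the CV score I currently see, but I do not know whether the sklearn policy can be configured that way.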