I want to evaluate the dialog policy as it was done in the TED paper, so I'm looking for a way to train different models on varying numbers of dialogs without having to split the data manually.
When I run `rasa train core` with the `--percentages` flag as described here, I still end up with a single model instead of one per percentage value.
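For context, the command I ran was roughly the following (paths and percentage values are just placeholders from my setup):

```
rasa train core -d domain.yml -s data/stories.md -c config.yml \
  --percentages 25 50 75
```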
Is there any built-in support for evaluating Core? (I also haven't found an option to use cross-validation.)
I did not specify different config files as in the example because I only want to use one config, just with varying amounts of data, so I'm not sure whether this is even possible.
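For reference, the example from the docs looks roughly like this, with two configs and a follow-up evaluation run (file names are placeholders; flags as I understand them from the docs):

```
# Trains one model per config x exclusion percentage x run;
# --percentages is the range of training data to exclude in each run
rasa train core -c config_1.yml config_2.yml \
  --out comparison_models --runs 3 --percentages 0 25 50 75

# Evaluates all trained models in that directory against the test stories
rasa test core -m comparison_models --stories stories --evaluate-model-directory
```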
To me this seems like a bug. It's a bit tricky to fix, though, as we need some marker other than the current one (multiple config files) to decide whether the user explicitly wants to do comparison training.
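Just to sketch the idea (purely hypothetical, this flag does not exist in the current CLI): an explicit switch could serve as that marker, so that a single config combined with `--percentages` would still trigger comparison training:

```
# Hypothetical --comparison flag, NOT part of the current CLI
rasa train core -c config.yml --comparison --runs 3 --percentages 0 25 50 75
```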