Not sure if this is already available and I’m just not looking in the right place.
I currently have dedicated test sets defined to evaluate F1 score, precision and recall for my NLU-only bot. I also run cross-validation and a random-split test to see how my model performs across a number of iterations.
One of the key business metrics I’m tracking is how many intents the bot gets right, specifically with a confidence above the threshold I’ve tentatively set as the fallback. I currently get this information from the histogram PNG that is generated; however, the image leaves me guessing at the exact number of samples the bot got right.
Is it possible to also output a JSON with the values from which that histogram is generated?
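For context, this is roughly the calculation I’d like to do once the raw values are available. It’s a minimal sketch assuming a hypothetical list of per-example predictions with `intent`, `predicted` and `confidence` fields (the field names are illustrative, not an official export format):

```python
THRESHOLD = 0.7  # my tentatively defined fallback threshold

# Hypothetical per-example predictions (field names are illustrative):
predictions = [
    {"intent": "greet",   "predicted": "greet",   "confidence": 0.92},
    {"intent": "goodbye", "predicted": "goodbye", "confidence": 0.55},
    {"intent": "goodbye", "predicted": "greet",   "confidence": 0.81},
]

# Count examples that were classified correctly AND landed above the
# fallback threshold -- the exact number the histogram only hints at.
correct_above = sum(
    1 for p in predictions
    if p["predicted"] == p["intent"] and p["confidence"] >= THRESHOLD
)
print(correct_above)  # 1 of the 3 examples above qualifies
```

With the histogram values exposed as JSON, this would replace eyeballing the PNG.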
Another question related to bot evaluation: when I run a split test, the support metric shown alongside the F1 score, precision and recall (with weighted and macro averages at the end) appears to be capped at 999. Is this by design? I significantly increased the total number of examples before the split, but the support value remains unchanged. Am I misreading it?
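In case it helps reproduce what I’m seeing: a quick, stdlib-only way to cross-check the support values is to count the per-intent labels in the test split directly and compare against what the report prints. The labels below are stand-ins for my data:

```python
from collections import Counter

# Stand-in for the intent labels in my test split (well over 999 examples)
test_labels = ["greet"] * 1200 + ["goodbye"] * 300

# Per-intent support is just the label count in the evaluated split,
# so it should grow with the test set rather than stop at 999.
support = Counter(test_labels)
print(support["greet"])   # 1200
print(sum(support.values()))  # 1500 total examples
```

If the counted values exceed 999 but the report still shows 999, that would suggest a display issue rather than anything about my data.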
Many thanks for your help in advance and have a great weekend ahead!