Hi, I'm trying to use the `linear_norm` setting for confidence score calculation, but I've found it unreliable even though it's the recommended setting. To highlight the issue, I trained an NLU model with a few generic intents (e.g., yes, no, greeting) and evaluated it on its own training paraphrases. To my surprise, the confidence scores vary drastically across intents: for example, paraphrases in the "yes" intent score around 0.55, while the "goodbye" intent scores 1.0.
In such a scenario, what I'm struggling with is how to set a confidence threshold. Has anyone else faced a similar situation? Am I doing something wrong, or is there a way to fix this?
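For context, the setting I'm talking about lives on the `DIETClassifier`, and the threshold would normally go on a `FallbackClassifier`. Here's a simplified sketch of that kind of pipeline (illustrative only, not my exact config.yml, which is attached below):

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
    # linear_norm is the recommended alternative to softmax confidences
    model_confidence: linear_norm
    constrain_similarities: true
  - name: FallbackClassifier
    # with scores ranging from ~0.55 to 1.0 depending on the intent,
    # no single threshold value works well here
    threshold: 0.4
```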
I believe I need some way to normalize the confidence scores across intents, but I don't know whether there is a pipeline parameter that could help with that. I would really appreciate any help.
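In case it helps clarify what I mean by normalizing: below is a rough sketch of the kind of post-hoc rescaling I have in mind, applied to the attached evaluation report. I'm assuming each record carries an `intent` and a `confidence` field; the exact keys would need adjusting to the report's real structure.

```python
import json
from collections import defaultdict

# Load predictions; assumes a JSON list of records, each with an
# "intent" and a "confidence" field -- adjust keys to your report.
with open("evaluation_report.json") as f:
    records = json.load(f)

# Collect the observed confidence scores per predicted intent.
scores = defaultdict(list)
for rec in records:
    scores[rec["intent"]].append(rec["confidence"])

# Per-intent scaling factor: the maximum confidence seen on the
# training paraphrases. Dividing by it maps every intent's scores
# onto a comparable 0..1 range, so one global threshold is meaningful.
scale = {intent: max(vals) for intent, vals in scores.items()}

def normalized_confidence(intent: str, confidence: float) -> float:
    """Rescale a raw confidence by the per-intent max seen in training."""
    return confidence / scale.get(intent, 1.0)

for rec in records:
    print(rec["intent"], rec["confidence"],
          normalized_confidence(rec["intent"], rec["confidence"]))
```

This is just a workaround sketch, though; I'd much rather have the pipeline produce comparable scores in the first place.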
Below are my nlu.yml and config.yml files for reference. I've also included an evaluation report containing the predictions and confidence scores I get from the NLU model.
config.yml (376 Bytes)
nlu.yml (18.1 KB)
evaluation_report.json (98.1 KB)