Environment Confidence Discrepancies

Hello, I have trained a model, and when I deploy it on two identical environments I get slight discrepancies in the intent confidences. If I run an NLU test on both environments, most confidence scores are exactly the same up to 10 digits! But for 2-3 examples (out of 300) the scores differ.

I understand that, due to random number generators and the like, even identical setup files can produce trained models with slightly different intent confidence scores. BUT this is the same trained model loaded on a cloned environment, so what could be the reason behind this?

Digging into one of the “problematic examples”, I noticed the following, in case it helps:

“word1 word2 word3” environment 1 confidence ≠ “word1 word2 word3” environment 2 confidence

Now, if I start randomly removing one or two of the words, all the remaining combinations have exactly the same confidence score, down to the last digit:

“word1 word3” env1 conf = “word1 word3” env2 conf
“word2 word3” env1 conf = “word2 word3” env2 conf
“word1” env1 conf = “word1” env2 conf
“word2” env1 conf = “word2” env2 conf
“word3” env1 conf = “word3” env2 conf
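For reference, here is a minimal sketch of how this comparison can be scripted against the two servers. It assumes both environments expose Rasa's standard REST endpoint `POST /model/parse`; the host names and the example list are placeholders for your own setup.

```python
import requests

# Hypothetical endpoints for the two supposedly identical environments.
ENV1 = "http://env1.example.com:5005"
ENV2 = "http://env2.example.com:5005"

EXAMPLES = [
    "word1 word2 word3",
    "word1 word3",
    "word2 word3",
    "word1",
    "word2",
    "word3",
]

def confidence(base_url: str, text: str) -> float:
    """Parse `text` on a Rasa server and return the top intent confidence."""
    resp = requests.post(f"{base_url}/model/parse", json={"text": text})
    resp.raise_for_status()
    return resp.json()["intent"]["confidence"]

for text in EXAMPLES:
    c1 = confidence(ENV1, text)
    c2 = confidence(ENV2, text)
    marker = "OK  " if c1 == c2 else "DIFF"
    print(f"{marker} {text!r}: env1={c1:.10f} env2={c2:.10f}")
```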

Thank you all in advance, Konstantinos

Sounds like it could be a hardware difference. Are these running on exactly the same hardware? Also, if you’re using CUDA, cuDNN in particular has known problems with producing reproducible results.
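If TensorFlow is the backend (as it is for Rasa's DIET classifier), a minimal sketch of forcing deterministic kernels and pinning the seeds might look like this; `enable_op_determinism` is available from TensorFlow 2.8 onward, and the seed value here is arbitrary:

```python
import random

import numpy as np
import tensorflow as tf

# Ask TensorFlow (and cuDNN underneath) to select deterministic kernels.
# Available since TF 2.8; ops without a deterministic implementation will
# raise an error instead of silently varying from run to run.
tf.config.experimental.enable_op_determinism()

# Pin every random number generator involved so repeated runs match.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)
```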

It turned out the tokenizer/lemmatizer was creating the discrepancies. I disabled the lemmatizer and used a different approach, and I also set random_seed. Not sure which of the two solved the problem, though!
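For anyone hitting the same issue: a quick way to check whether lemmatization itself differs between environments is to diff its output directly. This is a rough sketch assuming the pipeline lemmatizes through spaCy (e.g. via the SpacyNLP/SpacyTokenizer components), with "en_core_web_md" standing in for whatever model your config references; run it on both machines and compare the printed versions and lemmas:

```python
import spacy

# Load the same spaCy model the NLU pipeline uses ("en_core_web_md" is
# a placeholder; substitute the model named in your config).
nlp = spacy.load("en_core_web_md")

doc = nlp("word1 word2 word3")

# Print the library and model versions plus each token's lemma; any
# mismatch between environments implicates the tokenizer/lemmatizer
# rather than the trained classifier.
print(spacy.__version__, nlp.meta["version"])
print([(token.text, token.lemma_) for token in doc])
```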

Glad you got it sorted! :smiley: