Confidence Score Computations

rasa-nlu

(Piyush Makhija) #1

Hi Everyone.

I am trying to understand what the confidence scores output by Rasa NLU actually represent and how they are computed.

I have been working on an intent classification task with the tensorflow embedding pipeline. Once my model is trained and I parse new/test data, I receive a confidence score along with each probable intent, but I have little to no idea of what this confidence score actually represents.

As mentioned in the docs, it does not represent a probability. After some observation of the results, it seems to be a one-vs-many type of evaluation, i.e. for a single text input I can get multiple intents with high confidence scores.

After a quick look at the code, I think it is computed in the “_tf_sim” function in “embedding_intent_classifier.py” (relevant code segment below).

Can somebody please confirm/clarify what the confidence score means here?

def _tf_sim(self, a, b):
    """Define similarity between sentence embeddings `a`
    and intent embeddings `b`."""

    if self.similarity_type == 'cosine':
        # normalize so the dot product below becomes cosine similarity
        a = tf.nn.l2_normalize(a, -1)
        b = tf.nn.l2_normalize(b, -1)

    if self.similarity_type == 'cosine' or self.similarity_type == 'inner':
        # similarity between each sentence and every candidate intent
        sim = tf.reduce_sum(tf.expand_dims(a, 1) * b, -1)

        # similarity between intent embeddings
        # (the correct intent vs. the negative samples, used in the loss)
        sim_emb = tf.reduce_sum(b[:, 0:1, :] * b[:, 1:, :], -1)

        return sim, sim_emb
    else:
        raise ValueError("Wrong similarity type {}, "
                         "should be 'cosine' or 'inner'"
                         "".format(self.similarity_type))
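For anyone trying to follow the shapes here, this is a minimal NumPy sketch of what the cosine branch of `_tf_sim` computes (the array values are made up; the real `a` and `b` come from the trained embedding layers):

```python
import numpy as np

def cosine_sim(a, b):
    """NumPy mirror of the cosine branch in _tf_sim.

    a: (batch, dim) sentence embeddings
    b: (batch, num_intents, dim) intent embeddings
    Returns sim of shape (batch, num_intents): one score per intent.
    """
    # L2-normalize along the last axis, as tf.nn.l2_normalize does
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    # tf.reduce_sum(tf.expand_dims(a, 1) * b, -1): dot product per intent
    return np.sum(a[:, None, :] * b, axis=-1)

# toy example: one sentence, two candidate intents
a = np.array([[1.0, 0.0]])                # sentence embedding
b = np.array([[[1.0, 0.0], [0.0, 1.0]]])  # two intent embeddings
print(cosine_sim(a, b))  # -> [[1. 0.]]: identical intent scores 1, orthogonal 0
```

So each "confidence" is a cosine similarity in [-1, 1] between the sentence embedding and one intent embedding, not a probability.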

(Souvik Ghosh) #2

Are you aware of what the pipeline for the embedding classifier actually does?

In short, and maybe I am not 100% correct, but:

Step 1 - Tokenisation of your training data

Step 2 - Featurization - the pipeline makes a word embedding in a high-dimensional space, or simply assigns a vector to each word using the bag-of-words approach, so that words that are similar end up closer to each other. This distance is measured using the cosine distance between the two vectors. The same approach is taken for the intents as well.

Step 3 - Fit - The embeddings are then fed into a non-linear classifier to find the best possible classes. When a new sentence or your test sentence is given to the model, it tries to find the similarity (cosine distance) between the embedding of the test sentence and the embeddings of the intents.
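The three steps above can be sketched in plain Python. This is only a toy illustration (whitespace tokenisation, raw count vectors, no trained embeddings), not what the Rasa pipeline literally does:

```python
from collections import Counter
import math

def bow_vector(tokens, vocab):
    """Step 2: bag-of-words featurization over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Step 1: tokenisation (a simple whitespace split here)
train = {"greet": "hello hi hey", "bye": "goodbye bye see you"}
vocab = sorted({w for text in train.values() for w in text.split()})

# Step 3 (simplified): score a test sentence against each intent
test = "hi hello".split()
scores = {intent: cosine(bow_vector(test, vocab),
                         bow_vector(text.split(), vocab))
          for intent, text in train.items()}
best = max(scores, key=scores.get)  # "greet"
```

In the real classifier the vectors are learned embeddings rather than raw counts, but the scoring idea is the same: closest intent embedding wins.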


(Piyush Makhija) #3

Hey @souvikg10, Thanks for your response.

It was quite helpful. Based on your description, I gather that the confidence score is basically a similarity score/metric: it gives the similarity of the input text (‘a’ in the code above) with the embedding for a certain class (‘b’ in the code above), or more explicitly, the embedding mapping from the utterance to some class.
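If that reading is right, it would also explain the one-vs-many behaviour I mentioned earlier: similarities, unlike probabilities, are not normalised across intents, so several intents can score high at once. A toy sketch with made-up vectors:

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sentence = np.array([1.0, 1.0, 0.0])
intent_a = np.array([1.0, 0.9, 0.0])   # two intents that both point
intent_b = np.array([0.9, 1.0, 0.1])   # roughly the same way as the sentence

scores = [cos(sentence, intent_a), cos(sentence, intent_b)]
# both scores are close to 1 and their sum exceeds 1: these are
# similarities, not a probability distribution over intents
```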

Could you please confirm my understanding of this?


(Souvik Ghosh) #4

Indeed, but I would also say that you should run some evaluation using cross-validation to get an F1 score on your training set to check for overfitting; this pipeline can also overfit.
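For concreteness, the metric being suggested is macro-averaged F1 over the intents (in practice you would get this from Rasa's evaluation tooling or scikit-learn rather than computing it by hand). A plain-Python sketch on made-up labels:

```python
def f1_per_class(y_true, y_pred, label):
    """F1 score for a single intent label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# hypothetical true intents vs. predictions from one cross-validation fold
y_true = ["greet", "greet", "bye", "bye", "bye"]
y_pred = ["greet", "bye",   "bye", "bye", "greet"]

labels = sorted(set(y_true))
macro_f1 = sum(f1_per_class(y_true, y_pred, l) for l in labels) / len(labels)
```

If the cross-validated macro F1 is much lower than the F1 on the training data itself, the model is overfitting.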


(Piyush Makhija) #5

Thanks !! Will take your advice into account for sure :slight_smile: