Rasa[tensorflow_embedding] returns high confidence level if query text has the same number of digits as one of the training examples

mateeuswagner · April 5, 2019, 7:11pm

Rasa version: Docker image rasa/rasa_nlu:0.14.4-full

Content of configuration file (config.yml):

language: "pt"

pipeline: "tensorflow_embedding"

Hey guys, I need some help figuring out why tensorflow_embedding returns a high confidence level when the query text has the same number of digits as one of the training examples. (The examples are in Portuguese, but what really matters are the digits.)

So, here is what happens:

One of my training examples is “Vocês tem vaga para nefrologia 2019?”

When I then query the text “um casal e 2 crianças de 6 anos e 15”, which is not even slightly similar to the training example, the returned confidence is above 0.85 (our minimum confidence score).

So the only similarity I can see with the training example is the number of digits: both texts contain 4.
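To double-check the digit observation, here is a minimal Python sketch that counts digit characters and separate numeric tokens in the two texts. The helper names `count_digits` and `count_number_tokens` are made up for illustration, and the script says nothing about how the featurizer actually treats numbers; it only verifies the counts.

```python
import re


def count_digits(text: str) -> int:
    """Count the individual digit characters in a string."""
    return sum(ch.isdigit() for ch in text)


def count_number_tokens(text: str) -> int:
    """Count runs of consecutive digits, i.e. separate numbers."""
    return len(re.findall(r"\d+", text))


training_example = "Vocês tem vaga para nefrologia 2019?"
query = "um casal e 2 crianças de 6 anos e 15"

for label, text in (("training", training_example), ("query", query)):
    print(label, count_digits(text), count_number_tokens(text))
```

Both texts contain 4 digit characters, but the training example holds a single number (2019) while the query holds three (2, 6 and 15).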

When I remove this training example, the confidence for the same query drops to 0.65.
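A minimal sketch of what that difference means against a 0.85 cutoff, assuming the parse result is a dict with an `intent` entry holding `name` and `confidence`. The helper `accept_intent`, the intent name `ask_vacancy`, and the 0.86 value (standing in for "above 0.85") are hypothetical, and whether the cutoff is inclusive is also an assumption.

```python
MIN_CONFIDENCE = 0.85  # minimum confidence score mentioned in the post


def accept_intent(parse_result, threshold=MIN_CONFIDENCE):
    """Return the intent name if its confidence reaches the threshold, else None."""
    intent = parse_result.get("intent") or {}
    confidence = intent.get("confidence", 0.0)
    # Assumption: a score exactly at the threshold is accepted.
    return intent.get("name") if confidence >= threshold else None


# Illustrative values only: 0.86 stands in for "above 0.85", 0.65 is from the post.
with_example = {"intent": {"name": "ask_vacancy", "confidence": 0.86}}
without_example = {"intent": {"name": "ask_vacancy", "confidence": 0.65}}

print(accept_intent(with_example))     # accepted
print(accept_intent(without_example))  # rejected (None)
```

With the nefrologia example in the training data the unrelated query passes the cutoff; without it the query is rejected.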

Here’s my training data if you want to reproduce and analyze it: santissimo-resort___model_20190404-182917.tar.gz (522.7 KB)

That's it. I hope you guys can help me.

Thanks in advance!
