Rasa version: Docker image rasa/rasa_nlu:0.14.4-full
Content of configuration file (config.yml):
language: "pt"
pipeline: "tensorflow_embedding"
Hey guys, I need some help figuring out why TensorFlow is returning a high confidence when the query text has the same number of digits as one of the training examples. (The examples are in Portuguese, but what really matters is the digits.)
So, here is what happens:
One of my training examples is “Vocês tem vaga para nefrologia 2019?” (“Do you have openings for nephrology 2019?”)
When I then query the text “um casal e 2 crianças de 6 anos e 15” (“a couple and 2 children aged 6 and 15”), which is not similar to the training example at all, the returned confidence is above 0.85 (our minimum confidence threshold).
So the only similarity I can see with the training example is the number of digits: both sentences contain 4.
When I remove this training example, the confidence drops to 0.65.
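To double-check the digit count, here is a minimal Python sketch. The helper names `count_digits` and `number_tokens` are made up for illustration and are not part of Rasa. It only inspects the two sentences and does not touch the model, so it confirms the observation but does not explain the confidence:

```python
import re


def count_digits(text: str) -> int:
    """Count the individual digit characters in a sentence (hypothetical helper)."""
    return len(re.findall(r"\d", text))


def number_tokens(text: str) -> list:
    """Return each run of consecutive digits as one token (hypothetical helper)."""
    return re.findall(r"\d+", text)


training_example = "Vocês tem vaga para nefrologia 2019?"
query = "um casal e 2 crianças de 6 anos e 15"

for sentence in (training_example, query):
    # Print the digit count and the number tokens for each sentence.
    print(count_digits(sentence), number_tokens(sentence), sentence)
```

Both sentences contain 4 digit characters, although the training example has a single number token (`2019`) and the query has three (`2`, `6`, `15`).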
Here’s my training data if you want to reproduce and analyze it: santissimo-resort___model_20190404-182917.tar.gz (522.7 KB)
That's it. I hope you guys can help me.
Thanks in advance!