Confidence Mismatch


(Abhishak Varshney) #1

Hi everyone, when I train my model on one system it shows "UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.", but when I train it on another system it does not show this warning. Also, when I checked it by sending some chat messages through a curl command, there is a huge difference in confidence between the two systems. Does anyone know why this happens and how I can fix it?

The commands given on both systems are the same.


(Akela Drissner) #2

Are you sure you’ve trained on the same data? This warning usually means you don’t have enough training data.
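A minimal sketch of where that warning comes from, assuming it originates in scikit-learn's metric code (which Rasa NLU relies on for intent classification); the intent names below are made up. The warning fires whenever some label never appears among the predictions, so with a sparsely represented intent it may show up on one training run and not on another:

from sklearn.metrics import f1_score

# "goodbye" appears in the true labels but is never predicted, e.g. because
# it has too few training examples to be picked in some evaluation fold.
y_true = ["greet", "book_ticket", "goodbye", "greet"]
y_pred = ["greet", "book_ticket", "greet", "greet"]

# Emits: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0
# in labels with no predicted samples.
print(f1_score(y_true, y_pred, average="weighted"))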


(Abhishak Varshney) #3

Yes, I know it means the training data is insufficient, but I trained on the same data, so the second system should also show the same warning. Apart from that, after training, the accuracy of the two models is different.


(Akela Drissner) #4

What kind of confidence difference is there? Maybe post your NLU data here; I’d guess you probably just don’t have enough examples for consistent predictions.


(Abhishak Varshney) #5

I am uploading the number of training examples in each intent. The Rasa NLU documentation mentions that approximately 20 training examples per intent should be sufficient, but I have used many more than that. After training this data on a system with an i3 processor and 8 GB RAM I got 19.xx% confidence, and on a system with an i7 processor, 8 GB RAM, and an Intel Xeon graphics card I got 37.xx% confidence on the same test example.


(Akela Drissner) #6

OK, how many different intents is that? That looks like a very large number; the 20-examples-per-intent recommendation is just a starting point for a smaller number of intents. Also, if there’s overlap in your training data, the tensorflow_embedding pipeline might work better.
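For reference, a rough sketch of what switching to that pipeline looked like with the rasa_nlu Python API of that era (roughly 0.12/0.13, i.e. pre-Rasa-1.0); the training-data path is a placeholder, not the poster's actual file:

from rasa_nlu.config import RasaNLUModelConfig
from rasa_nlu.model import Trainer
from rasa_nlu.training_data import load_data

# Placeholder training-data path; the config dict below is equivalent to a
# config file containing:
#   language: en
#   pipeline: tensorflow_embedding
training_data = load_data("data/nlu_data.json")
nlu_config = RasaNLUModelConfig({"language": "en", "pipeline": "tensorflow_embedding"})

trainer = Trainer(nlu_config)
trainer.train(training_data)
model_directory = trainer.persist("models/")  # directory to point the server at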


(Abhishak Varshney) #7

There are 86 intents in total.


(Akela Drissner) #8

Yeah, I would say you need more examples for this. Also, these confidence values, are you getting them from running the evaluation script, or how?


(Abhishak Varshney) #9

I first ran this rasa_nlu command on both servers:

python -m rasa_nlu.server -c config_spacy.json --path models/ -P 5050

and then I ran the command:

curl -XPOST localhost:5050/parse -d '{"q":"I want to purchase ticket"}'

which gives me a different confidence on each server.
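One way to compare the two machines without the HTTP server in between (a sketch assuming the same-era rasa_nlu Python API; the model directory name is a placeholder) is to load each trained model directly and print the full intent ranking rather than just the top confidence:

from rasa_nlu.model import Interpreter

# Placeholder model directory: point this at the model produced by training
# on each machine, then run the identical script on both.
interpreter = Interpreter.load("models/default/model_20180101-000000")

result = interpreter.parse("I want to purchase ticket")
print(result["intent"])                          # top intent and its confidence
for ranked in result.get("intent_ranking", []):  # full ranking across intents
    print(ranked["name"], ranked["confidence"])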


(Akela Drissner) #10

OK, since you’ve trained this with spaCy, that is to be expected. I assume there’s a lot of overlap in your intents, which spaCy can’t handle very well. I’d suggest switching to the tensorflow_embedding pipeline, and then running the evaluation script on the model to see how well it does.


(Abhishak Varshney) #11

Yeah, thanks @akelad, it helped.