Confidence Mismatch

Hi everyone. When I train my model on one system, it shows "UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.", but when I train it on another system it does not show this warning. Also, when I test it by sending some chat messages through a curl command, there is a huge difference in confidence between the two systems. Does anyone know why this happens and how I can fix it?

Note: the commands given on both systems are the same.

Are you sure you’ve trained on the same data? That warning usually means you don’t have enough training data.

Yes, I know it means the training data is insufficient, but I trained on the same data, so the second system should show the same warning too. Apart from that, after training, the model accuracy is different on the two systems.

What kind of confidence difference is there? Maybe post your NLU data here; I’d guess you just don’t have enough examples to get consistent predictions.

I am attaching the number of training examples in each intent. The Rasa NLU documentation mentions that approximately 20 training examples per intent should be sufficient, but I have used far more than that. After training this data on a system with an i3 processor and 8 GB RAM, I got 19.xx% confidence; on the system with an i7 processor, 8 GB RAM, and an Intel Xeon graphics card, I got 37.xx% confidence on the same test example.

OK, how many different intents is that? That looks like a very large amount; the 20-examples-per-intent recommendation is just a starting point for a smaller number of intents. Also, if there’s overlap in your training data, the tensorflow_embedding pipeline might work better.

There are 86 intents in total.

Yeah, I would say you need more examples for this. Also, these confidence values: are you getting them by running the evaluation script, or some other way?
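
For reference, the evaluation script is run along these lines with this version of rasa_nlu; the data file and model directory below are placeholders, so check the flags against your installed version:

python -m rasa_nlu.evaluate --data nlu_data.json --model models/default/model_20180101-000000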

First, I ran the rasa_nlu server command on both servers:

python -m rasa_nlu.server -c config_spacy.json --path models/ -P 5050

and then I ran the command:

curl -XPOST localhost:5050/parse -d '{"q":"I want to purchase ticket"}'

which gives me different confidence values on the two servers.
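
For reference, the /parse endpoint returns JSON of roughly this shape; the intent names and numbers below are made up for illustration:

{
  "intent": {"name": "purchase_ticket", "confidence": 0.37},
  "entities": [],
  "intent_ranking": [
    {"name": "purchase_ticket", "confidence": 0.37},
    {"name": "cancel_ticket", "confidence": 0.21}
  ],
  "text": "I want to purchase ticket"
}

The values I am comparing between the two servers are the intent.confidence fields.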

OK, since you’ve trained this with spaCy, that is to be expected. I assume there’s a lot of overlap in your intents, which the spaCy pipeline can’t handle very well. I’d suggest switching to the tensorflow_embedding pipeline and then running the evaluation script on the model to see how well it does.
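
A minimal config for that pipeline, in the same JSON style as your config_spacy.json, would look something like this (newer rasa_nlu versions use a YAML config instead, and the file names below are placeholders):

{
  "language": "en",
  "pipeline": "tensorflow_embedding"
}

Retraining with it would then be along these lines:

python -m rasa_nlu.train -c config_tensorflow.json --data nlu_data.json --path models/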

Yeah, thanks @akelad, it helped.