Hi! I’m using the tensorflow pipeline, and it works great for detecting intents, but sometimes it predicts a totally irrelevant intent.
To put it simply, it tries to find an intent for every sentence, even when that makes no sense.
For example, for the French word “on”, it predicts an intent with 0.8 confidence, even though that intent does not contain a single example shorter than 4 words. Yes, the word “on” appears in some of those examples, but it still shouldn’t predict this intent.
Do you have recommendations to avoid this kind of error? I can share the training data for this intent, but I don’t think it would be useful here.
My training log looks like this:
2018-11-27 14:01:45 INFO rasa_nlu.utils.spacy_utils - Trying to load spacy model with name 'fr'
2018-11-27 14:01:59 INFO rasa_nlu.components - Added 'nlp_spacy' to component cache. Key 'nlp_spacy-fr'.
2018-11-27 14:01:59 INFO rasa_nlu.training_data.loading - Training data format of ./data/nlu_data.json is rasa_nlu
2018-11-27 14:01:59 INFO rasa_nlu.training_data.training_data - Training data stats:
- intent examples: 82 (10 distinct intents)
- Found intents: 'demande_envoie_carte_TP', 'demander_telecharger_carte_TP', 'aurevoir', 'saluer', 'fin_requete', 'demande_carte_TP', 'envoie_carte_TP', 'confirmation', 'demande_carte_TP_ayant_droit', 'telecharger_carte_TP'
- entity examples: 9 (1 distinct entities)
- found entities: 'famille'
2018-11-27 14:01:59 DEBUG rasa_nlu.training_data.training_data - Validating training data...
2018-11-27 14:01:59 INFO rasa_nlu.model - Starting to train component SpellChecking
2018-11-27 14:01:59 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:01:59 INFO rasa_nlu.model - Starting to train component nlp_spacy
2018-11-27 14:02:01 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:01 INFO rasa_nlu.model - Starting to train component tokenizer_spacy
2018-11-27 14:02:01 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:01 INFO rasa_nlu.model - Starting to train component ner_crf
2018-11-27 14:02:01 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:01 INFO rasa_nlu.model - Starting to train component ner_synonyms
2018-11-27 14:02:01 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:01 INFO rasa_nlu.model - Starting to train component tokenizer_whitespace
2018-11-27 14:02:01 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:01 INFO rasa_nlu.model - Starting to train component intent_entity_featurizer_regex
2018-11-27 14:02:01 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:01 INFO rasa_nlu.model - Starting to train component intent_featurizer_count_vectors
2018-11-27 14:02:01 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:01 INFO rasa_nlu.model - Starting to train component intent_classifier_tensorflow_embedding
2018-11-27 14:02:01 DEBUG rasa_nlu.classifiers.embedding_intent_classifier - Check if num_neg 20 is smaller than number of intents 10, else set num_neg to the number of intents - 1
2018-11-27 14:02:09.483379: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-27 14:02:09 INFO rasa_nlu.classifiers.embedding_intent_classifier - Accuracy is updated every 10 epochs
Epochs: 100%|██████████| 300/300 [00:04<00:00, 65.73it/s, loss=0.152, acc=1.000]
2018-11-27 14:02:14 INFO rasa_nlu.classifiers.embedding_intent_classifier - Finished training embedding policy, loss=0.152, train accuracy=1.000
2018-11-27 14:02:14 INFO rasa_nlu.model - Finished training component.
2018-11-27 14:02:14 INFO rasa_nlu.model - Successfully saved model into '/app/models/chatbot-masante/nlu'
2018-11-27 14:02:14 INFO __main__ - Finished training
There is the following statement in the Rasa docs: “One common misconception is that if your model reports high confidence on your training examples, it is a “better” model. In fact, this usually means that your model is overfitting.”
I don’t know if that is the reason for my bug, but what is the common cause of overfitting? Too many examples that look alike? I don’t see how my training data would overfit on the word “on”.
You have a training accuracy of 1.00. Usually, an evaluation on a held-out test dataset, or cross-validation, gives you a good indication of whether your model is overfitting or underfitting.
Try the cross-validation evaluation on the same dataset with at least 5 folds. You will get a much better picture of what is going wrong when you do that.
We use cross-validation as a gate to decide whether a model is good enough.
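If it helps, on the rasa_nlu version in your log the cross-validation evaluation is run from the command line roughly like this (paths, config file and fold count are just placeholders for your setup, so double-check the flags against your version):

    python -m rasa_nlu.evaluate --data ./data/nlu_data.json --config config.yml --mode crossvalidation --folds 5

It reports train and test accuracy, F1 and precision, which you can compare against the 1.000 training accuracy you see now.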
It seems like your model is overfitting: the train f1 score is really high, and if the test f1 score comes out much lower, that significant gap means the model was not able to generalize to new data (the test data).
Did you take a look at the confusion matrix?
Do you see confusion between some intents?
A common cause of overfitting here is lack of training data: you have an average of 8 examples per intent, and I suppose the distribution is uneven, with some intents having many more examples than others. Also, the tensorflow classifier learns its embeddings from scratch, so you will need at least 10-15 examples per intent, and a reasonably balanced dataset.
If you don’t have enough examples, the French spaCy model is not bad and comes with a decent amount of word vectors. You could also use the fastText vectors, which are quite large, but then you would need to convert them into a spaCy model first.
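If you go the fastText route, the conversion is done with spaCy’s init-model command, followed by a link so the pipeline can load the model by name. Treat this as a sketch (the model and link names are just examples, and you should double-check the CLI flags against your spaCy version):

    python -m spacy init-model fr ./fr_fasttext_model --vectors-loc cc.fr.300.vec.gz
    python -m spacy link ./fr_fasttext_model fr_fasttext

cc.fr.300.vec.gz is the pre-trained French vectors download from the fastText website; you would then point the nlp_spacy component of your pipeline at fr_fasttext.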
If you stick with tensorflow, add some more examples and verify the confusion matrix using cross-validation; it should be okay.
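As far as I know, the confusion matrix picture itself is produced by the plain evaluation mode run against an already trained model, roughly like this (using the paths from your log; again, double-check the exact flags for your version):

    python -m rasa_nlu.evaluate --data ./data/nlu_data.json --model /app/models/chatbot-masante/nlu --mode evaluation --confmat confmat.png

That should write a confmat.png showing which intents get mixed up with each other.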
I’ll try the confusion matrix and come back to you.
Indeed, most of them have 10-15 examples and others only 2-3 (like greet or goodbye). I’ll try to balance them and see if it helps.
I know spaCy is better for small datasets. It’s the beginning of the project and I know I’ll have more than 1000 examples in the end, but maybe I should start with spaCy anyway. The problem with spaCy is that I don’t know how to lower the threshold, because right now it doesn’t understand some of the sentences that are in the dataset. I use the command line rasa_core.train, but I get the following error: “error: unrecognized arguments: --nlu_threshold 0,3”
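From what I can tell rereading the Fallback docs, the threshold is not a command-line argument at all but part of the FallbackPolicy entry in the Core policy configuration file (and the value probably needs a dot, 0.3, not 0,3). Something like this, though I haven’t verified it yet:

    policies:
      - name: "FallbackPolicy"
        nlu_threshold: 0.3
        core_threshold: 0.3
        fallback_action_name: "action_default_fallback"

and then that file would be passed to rasa_core.train with its --config option, if I’m reading the docs right.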
Anyway, thank you for your help. I’ll try your recommendations and come back to you if it doesn’t help.
Edit: Are you sure the confusion matrix works in cross-validation? I tried with the argument “--confmat ./confmat/” but nothing is created. Maybe I’ll try with a dataset later.