Rasa NLU Tensorflow predicts something instead of None

Hi! I’m using the tensorflow pipeline, and it works great for detecting intents, but sometimes it predicts a totally irrelevant intent. To put it simply, it tries to find an intent for every sentence, even when it makes no sense. For example, for the French word “on”, it predicts an intent with 0.8 confidence, even though that intent does not contain a single example with fewer than 4 words. Yes, the word “on” appears in some of those examples, but it shouldn’t predict this intent.

Do you have any recommendations to avoid this kind of error? I can share the training data for this intent, but I don’t think it’s useful here.

My training log looks like this:

2018-11-27 14:01:45 INFO     rasa_nlu.utils.spacy_utils  - Trying to load spacy model with name 'fr'
2018-11-27 14:01:59 INFO     rasa_nlu.components  - Added 'nlp_spacy' to component cache. Key 'nlp_spacy-fr'.
2018-11-27 14:01:59 INFO     rasa_nlu.training_data.loading  - Training data format of ./data/nlu_data.json is rasa_nlu
2018-11-27 14:01:59 INFO     rasa_nlu.training_data.training_data  - Training data stats:
    - intent examples: 82 (10 distinct intents)
    - Found intents: 'demande_envoie_carte_TP', 'demander_telecharger_carte_TP', 'aurevoir', 'saluer', 'fin_requete', 'demande_carte_TP', 'envoie_carte_TP', 'confirmation', 'demande_carte_TP_ayant_droit', 'telecharger_carte_TP'
    - entity examples: 9 (1 distinct entities)
    - found entities: 'famille'
2018-11-27 14:01:59 DEBUG    rasa_nlu.training_data.training_data  - Validating training data...
2018-11-27 14:01:59 INFO     rasa_nlu.model  - Starting to train component SpellChecking
2018-11-27 14:01:59 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:01:59 INFO     rasa_nlu.model  - Starting to train component nlp_spacy
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Starting to train component tokenizer_spacy
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Starting to train component ner_crf
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Starting to train component ner_synonyms
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Starting to train component tokenizer_whitespace
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Starting to train component intent_entity_featurizer_regex
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Starting to train component intent_featurizer_count_vectors
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:01 INFO     rasa_nlu.model  - Starting to train component intent_classifier_tensorflow_embedding
2018-11-27 14:02:01 DEBUG    rasa_nlu.classifiers.embedding_intent_classifier  - Check if num_neg 20 is smaller than number of intents 10, else set num_neg to the number of intents - 1
2018-11-27 14:02:09.483379: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-27 14:02:09 INFO     rasa_nlu.classifiers.embedding_intent_classifier  - Accuracy is updated every 10 epochs
Epochs: 100%|██████████| 300/300 [00:04<00:00, 65.73it/s, loss=0.152, acc=1.000]
2018-11-27 14:02:14 INFO     rasa_nlu.classifiers.embedding_intent_classifier  - Finished training embedding policy, loss=0.152, train accuracy=1.000
2018-11-27 14:02:14 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 14:02:14 INFO     rasa_nlu.model  - Successfully saved model into '/app/models/chatbot-masante/nlu'
2018-11-27 14:02:14 INFO     __main__  - Finished training

There is the following statement in the Rasa docs: “One common misconception is that if your model reports high confidence on your training examples, it is a “better” model. In fact, this usually means that your model is overfitting.” I don’t know if that is the reason for my bug, but what is the common cause of overfitting? Too many examples that look alike? I don’t see how my training data would overfit for the word “on”.

You have a training accuracy of 1.00. Usually a proper evaluation on a test dataset, or cross-validation, can give you an indication of whether your model is overfitting or underfitting.

Try the cross-validation evaluation on the same dataset with at least 5 folds. You will get a better picture of what is going wrong when you do that.
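On the version you seem to be running (a 0.13-era Rasa NLU, judging by the log), the cross-validation run would look roughly like the following; I’m assuming the flag names and that your pipeline config lives in config.yml, so double-check against python -m rasa_nlu.evaluate --help:

    python -m rasa_nlu.evaluate \
      --data ./data/nlu_data.json \
      --config config.yml \
      --mode crossvalidation \
      --folds 5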

We use cross-validation as a gate to produce a good model.

I tried the cross-validation but I didn’t really understand the output. Maybe you can help me (with 6 folds):

2018-11-27 18:35:47 INFO     rasa_nlu.classifiers.embedding_intent_classifier  - Finished training embedding policy, loss=0.080, train accuracy=1.000
2018-11-27 18:35:47 INFO     rasa_nlu.model  - Finished training component.
2018-11-27 18:35:49 INFO     __main__  - CV evaluation (n=6)
2018-11-27 18:35:49 INFO     __main__  - Intent evaluation results
2018-11-27 18:35:49 INFO     __main__  - train Accuracy: 1.000 (0.000)
2018-11-27 18:35:49 INFO     __main__  - train F1-score: 1.000 (0.000)
2018-11-27 18:35:49 INFO     __main__  - train Precision: 1.000 (0.000)
2018-11-27 18:35:49 INFO     __main__  - test Accuracy: 0.560 (0.115)
2018-11-27 18:35:49 INFO     __main__  - test F1-score: 0.510 (0.108)
2018-11-27 18:35:49 INFO     __main__  - test Precision: 0.497 (0.103)
2018-11-27 18:35:49 INFO     __main__  - Entity evaluation results
2018-11-27 18:35:49 INFO     __main__  - Entity extractor: ner_crf
2018-11-27 18:35:49 INFO     __main__  - train Accuracy: 0.999 (0.001)
2018-11-27 18:35:49 INFO     __main__  - train F1-score: 0.999 (0.001)
2018-11-27 18:35:49 INFO     __main__  - train Precision: 0.999 (0.001)
2018-11-27 18:35:49 INFO     __main__  - Entity extractor: ner_crf
2018-11-27 18:35:49 INFO     __main__  - test Accuracy: 0.984 (0.005)
2018-11-27 18:35:49 INFO     __main__  - test F1-score: 0.980 (0.006)
2018-11-27 18:35:49 INFO     __main__  - test Precision: 0.977 (0.010)
2018-11-27 18:35:49 INFO     __main__  - Finished evaluation

It seems like your model is overfitting: the train F1-score is really high while the test F1-score is low, and the gap is quite significant, meaning the model was not able to generalize to new data (the test data).

Did you take a look at the confusion matrix? Do you see confusion between some intents?

The cause could be a lack of training data: you have an average of 8 examples per intent, and I suppose that average is misleading, since some intents probably have many more examples than others. Also, the tensorflow classifier learns embeddings from scratch, so you will need at least 10-15 examples per intent, and a reasonably balanced dataset.

If you don’t have enough examples, the French spaCy model is not bad and comes with a decent set of word vectors. You could also use fastText vectors, which are quite large, but then you would need to convert them into a spaCy model first.
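If you go the fastText route, recent spaCy 2.x releases have an init-model command that can build a model directly from a vectors file, as far as I know; roughly like this (the output directory, shortcut name and vectors file are just examples, the French vectors can be downloaded from the fastText site):

    # build a blank French spaCy model containing only the fastText vectors
    python -m spacy init-model fr ./fr_fasttext --vectors-loc cc.fr.300.vec.gz
    # register it under a shortcut name so it can be loaded like a normal model
    python -m spacy link ./fr_fasttext fr_fasttext

You would then reference that shortcut name as the spaCy model in your NLU config; check the spaCy and Rasa docs for your exact versions.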

If you stick with tensorflow, add some more examples and check the confusion matrix using cross-validation; it should be okay.


I’ll try the confusion matrix and come back to you. Indeed, most of my intents have 10-15 examples and a few others only 2-3 (like greet or goodbye); I’ll try to balance them and see if it helps.

I know spaCy is better for small datasets, but it’s the beginning of the project and I know I’ll have more than 1000 examples in the end; still, maybe I should start with spaCy anyway. The problem with spaCy is that I don’t know how to lower the threshold, because right now it doesn’t understand some of the sentences that are in the dataset. I used the rasa_core.train command line but I got the following error: “error: unrecognized arguments: --nlu_threshold 0,3”.
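I suspect --nlu_threshold is not actually a rasa_core.train argument and that the threshold is meant to be set on the FallbackPolicy instead (on Rasa Core 0.12+ that would go in a policy configuration file, and with a decimal point: 0.3 rather than 0,3). Something like this sketch, if I understand the docs correctly (file names are just examples):

    # policies.yml
    policies:
      - name: "FallbackPolicy"
        nlu_threshold: 0.3
        core_threshold: 0.3
        fallback_action_name: "action_default_fallback"

and then pass it to training; on Rasa Core 0.12 I think it’s the -c/--config argument:

    python -m rasa_core.train -d domain.yml -s data/stories.md -o models/dialogue -c policies.yml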

Anyway, thank you for your help. I’ll try your recommendations and come back to you if it doesn’t help.

Edit: Are you sure the confusion matrix works with cross-validation? I tried with the argument “--confmat ./conf mat/” but nothing is created. Maybe I’ll try with a dataset later.

No, in cross-validation mode the confusion matrix will not be generated. You need to evaluate a model on a test set to generate the confusion matrix.

It’s stated in the docs here.
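For reference, a test-set evaluation that produces the confusion matrix would look roughly like this on a 0.13-era CLI (I’m assuming the exact flag names and a held-out test file; also note that --confmat expects an output image file name, not a directory):

    python -m rasa_nlu.evaluate \
      --data ./data/test_data.json \
      --model <path-to-your-trained-model> \
      --mode evaluation \
      --confmat confmat.png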
