Bad accuracy in rasa shell

valdu02100 · January 27, 2020, 7:21pm

It’s weird: during training and evaluation, Rasa predicts the intents and entities correctly. However, when I talk to the bot in shell mode, the accuracy is bad.

My configuration:

language: fr
pipeline:
- name: WhitespaceTokenizer
  case_sensitive: false
- name: CRFEntityExtractor
  BILOU_flag: true
  features:
  - - low
    - title
    - upper
  - - bias
    - low
    - prefix5
    - prefix2
    - suffix5
    - suffix3
    - suffix2
    - upper
    - title
    - digit
    - pattern
  - - low
    - title
    - upper
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
  intent_tokenization_flag: true
  intent_split_symbol: +
- name: EmbeddingIntentClassifier
- name: RegexFeaturizer
- name: "DucklingHTTPExtractor"
  url: "http://localhost:8000"
  dimensions: ["time", "number", "amount-of-money", "distance"]
  locale: "fr_FR"
  timezone: "Europe/Paris"
  timeout: 3
policies:
- name: KerasPolicy
  epochs: 700
  batch_size: 100
  featurizer:
  - name: MaxHistoryTrackerFeaturizer
    max_history: 5
    state_featurizer:
    - name: BinarySingleStateFeaturizer
- name: MemoizationPolicy
  max_history: 5
- name: FallbackPolicy
  nlu_threshold: 0.7
  core_threshold: 0.4
  fallback_action_name: utter_oupsomethingfailed
- name: FormPolicy

rasa test result:

valentin@mbp-de-valentin archelot % rasa test
2020-01-27 20:09:25 INFO     absl  - Entry Point [tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv] registered with id [T2TEnv-TicTacToeEnv-v0]
2020-01-27 20:09:25 INFO     rasa.core.policies.ensemble  - MappingPolicy not included in policy ensemble. Default intents 'restart and back will not trigger actions 'action_restart' and 'action_back'.
Processed Story Blocks:   0%|                                                                 | 0/29 [00:00<?, ?it/s, # trackers=1]/usr/local/lib/python3.7/site-packages/rasa/core/slots.py:217: UserWarning: Categorical slot 'sexe' is set to a value ('femmme') that is not specified in the domain. Value will be ignored and the slot will behave as if no value is set. Make sure to add all values a categorical slot should store to the domain.
  f"Categorical slot '{self.name}' is set to a value "
Processed Story Blocks: 100%|███████████████████████████████████████████████████████| 29/29 [00:00<00:00, 629.86it/s, # trackers=1]
2020-01-27 20:09:25 INFO     rasa.core.test  - Evaluating 14 stories
Progress:
100%|██████████████████████████████████████████████████████████████████████████████████████████████| 14/14 [00:01<00:00, 10.88it/s]
2020-01-27 20:09:26 INFO     rasa.core.test  - Finished collecting predictions.
2020-01-27 20:09:26 INFO     rasa.core.test  - Evaluation Results on CONVERSATION level:
2020-01-27 20:09:26 INFO     rasa.core.test  - 	Correct:          12 / 14
2020-01-27 20:09:26 INFO     rasa.core.test  - 	F1-Score:         0.923
2020-01-27 20:09:26 INFO     rasa.core.test  - 	Precision:        1.000
2020-01-27 20:09:26 INFO     rasa.core.test  - 	Accuracy:         0.857
2020-01-27 20:09:26 INFO     rasa.core.test  - 	In-data fraction: 0.976
2020-01-27 20:09:26 INFO     rasa.core.test  - Evaluation Results on ACTION level:
2020-01-27 20:09:26 INFO     rasa.core.test  - 	Correct:          244 / 246
2020-01-27 20:09:26 INFO     rasa.core.test  - 	F1-Score:         0.992
2020-01-27 20:09:26 INFO     rasa.core.test  - 	Precision:        0.994
2020-01-27 20:09:26 INFO     rasa.core.test  - 	Accuracy:         0.992
2020-01-27 20:09:26 INFO     rasa.core.test  - 	In-data fraction: 0.976
2020-01-27 20:09:26 INFO     rasa.core.test  - 	Classification report: 
                                   precision    recall  f1-score   support

                     utter_thanks       1.00      1.00      1.00         6
            utter_ask_precision_s       1.00      1.00      1.00         6
          utter_favoris_ask_train       1.00      1.00      1.00        14
           utter_onboarding_crush       1.00      1.00      1.00        14
         utter_onboarding_mission       1.00      1.00      1.00        14
                    utter_goodbye       1.00      1.00      1.00         3
                form_find_someone       1.00      1.00      1.00         6
   action_reset_slot_find_someone       1.00      1.00      1.00         6
           utter_onboarding_limit       1.00      1.00      1.00        14
                  utter_show_menu       1.00      0.86      0.92        14
                      utter_greet       1.00      1.00      1.00        14
utter_interest_find_someone_false       1.00      1.00      1.00         6
 utter_interest_find_someone_true       1.00      1.00      1.00         8
            utter_onboarding_goal       1.00      1.00      1.00        14
          action_ask_favoris_city       1.00      1.00      1.00        14
                    utter_iamabot       1.00      1.00      1.00         1
                    action_listen       1.00      1.00      1.00        72
         utter_resume_favoris_all       0.75      1.00      0.86         6
          action_check_itineraire       1.00      1.00      1.00         8
        utter_resume_favoris_city       1.00      1.00      1.00         6

                        micro avg       0.99      0.99      0.99       246
                        macro avg       0.99      0.99      0.99       246
                     weighted avg       0.99      0.99      0.99       246

2020-01-27 20:09:27 INFO     rasa.nlu.test  - Confusion matrix, without normalization: 
[[14  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 72  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  6  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 14  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 14  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  6  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 14  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 14  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  2  0 12  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  6]]
2020-01-27 20:09:31 INFO     rasa.nlu.test  - Running model for predictions:
100%|████████████████████████████████████████████████████████████████████████████████████████████| 295/295 [00:03<00:00, 75.70it/s]
2020-01-27 20:09:35 INFO     rasa.nlu.test  - Intent evaluation results:
2020-01-27 20:09:35 INFO     rasa.nlu.test  - Intent Evaluation: Only considering those 295 examples that have a defined intent out of 295 examples
2020-01-27 20:09:35 INFO     rasa.nlu.test  - Classification report saved to results/intent_report.json.
2020-01-27 20:09:35 INFO     rasa.nlu.test  - Incorrect intent predictions saved to results/intent_errors.json.
2020-01-27 20:09:35 INFO     rasa.nlu.test  - Confusion matrix, without normalization: 
[[16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0 26  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0 20  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0 19  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0 67  0  0  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0 12  0  1  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 15  0  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0 15  0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0 13  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  1  0  0  0  0  4  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  6  0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8  0  0]
 [ 0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0 10  0]
 [ 0  0  0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0 19]]
2020-01-27 20:09:38 INFO     rasa.nlu.test  - Entity evaluation results:
2020-01-27 20:09:38 INFO     rasa.nlu.test  - Evaluation for entity extractor: CRFEntityExtractor 
2020-01-27 20:09:38 INFO     rasa.nlu.test  - Classification report for 'CRFEntityExtractor' saved to 'results/CRFEntityExtractor_report.json'.
2020-01-27 20:09:38 INFO     rasa.nlu.test  - Incorrect entity predictions saved to results/CRFEntityExtractor_errors.json.

If the predictions look this good in the evaluation, why are they not as good when I chat with the bot? Is my setup bad? Would spaCy be better? What similar configuration would you recommend?

Thanks for any tips.

dakshvar22 · January 28, 2020, 12:12pm

Did you create a train/test split first? You can take a look at Evaluating Models to understand how to evaluate your models correctly.
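
To illustrate why the split matters: if `rasa test` is run on the same examples the model was trained on, the scores mostly measure memorization, which could explain near-perfect reports alongside poor behavior in the shell. The sketch below is not Rasa code. It is a minimal, standard-library illustration of a stratified hold-out split, assuming the NLU data can be represented as (text, intent) pairs. The function name `stratified_split`, the example utterances, and the intent names `greet` and `goodbye` are hypothetical. Rasa's CLI also provides `rasa data split nlu` for this purpose, which is likely the more convenient route in practice.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=42):
    """Split (text, intent) pairs into train and test sets per intent.

    Each intent with at least two examples contributes at least one
    example to the test set and keeps at least one for training.
    """
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append(text)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for intent, texts in sorted(by_intent.items()):
        texts = list(texts)
        rng.shuffle(texts)
        if len(texts) < 2:
            n_test = 0  # too few examples to hold any out
        else:
            n_test = min(len(texts) - 1, max(1, round(len(texts) * test_fraction)))
        test.extend((t, intent) for t in texts[:n_test])
        train.extend((t, intent) for t in texts[n_test:])
    return train, test


# Hypothetical data, for illustration only.
examples = [
    ("bonjour", "greet"),
    ("salut", "greet"),
    ("coucou", "greet"),
    ("hello", "greet"),
    ("bonsoir", "greet"),
    ("au revoir", "goodbye"),
    ("à bientôt", "goodbye"),
    ("bye", "goodbye"),
]

train, test = stratified_split(examples)
print(len(train), "training examples,", len(test), "held-out examples")
```

The model would then be trained only on the training portion and evaluated only on the held-out portion, so the reported accuracy is closer to what the bot shows on unseen messages.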
