Hello Rasa Community!
This is my first post on this forum, and I hope to get some much-needed help from this community!
I trained a question/answer chatbot that works extremely well in validation: on a large dataset of 14K questions it reaches approx. 98% accuracy on entity extraction, and only a few errors are reported in the DIETClassifier_errors.json file.
For validation, I use the command
> rasa test nlu --nlu testdata.yml
However, when I try the same model in rasa shell nlu with a question taken from the same testdata.yml file, the bot extracts incorrect entities for a large number of questions! For example, instead of extracting one tri-gram, the bot extracts a bi-gram and a uni-gram, or three uni-grams.
Shouldn’t extracting a uni-gram instead of a bi-gram or tri-gram be counted as an error by the “rasa test nlu” command as well? If this is the expected behavior, is there a configuration option or setting I can change so that such errors are also reported by “rasa test nlu”?
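To make clear what I mean by a split entity, here is a minimal Python sketch of the strict span comparison I would expect to flag these cases. It is only an illustration, not how Rasa evaluates entities: the sentence, the entity name "restaurant" and the helper functions are made up, and the entity dictionaries are assumed to carry the "entity", "start", "end" and "value" keys that the parse output shows.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, object]


def make_entity(text: str, value: str, entity: str) -> Entity:
    """Build an entity dict for the first occurrence of `value` in `text` (hypothetical helper)."""
    start = text.index(value)
    return {"entity": entity, "start": start, "end": start + len(value), "value": value}


def find_split_entities(
    gold: List[Entity], predicted: List[Entity]
) -> List[Tuple[Entity, List[Entity]]]:
    """Return gold entities that have no exact match but are covered by smaller predicted spans."""
    exact = {(p["start"], p["end"], p["entity"]) for p in predicted}
    splits = []
    for g in gold:
        if (g["start"], g["end"], g["entity"]) in exact:
            continue  # predicted with identical boundaries and type
        # Predicted spans that lie completely inside the gold span.
        pieces = [
            p for p in predicted
            if p["start"] >= g["start"] and p["end"] <= g["end"]
        ]
        if pieces:
            splits.append((g, pieces))
    return splits


# Made-up example: the annotation is one tri-gram, the prediction is a bi-gram plus a uni-gram.
text = "book a table at the blue olive garden"
gold = [make_entity(text, "blue olive garden", "restaurant")]
predicted = [
    make_entity(text, "blue olive", "restaurant"),
    make_entity(text, "garden", "restaurant"),
]

splits = find_split_entities(gold, predicted)
for gold_entity, pieces in splits:
    print(gold_entity["value"], "was split into", [p["value"] for p in pieces])
```

Under this strict comparison, the bi-gram plus uni-gram prediction counts as one error for the tri-gram annotation, which is the behavior I expected from the test command.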
Question: Can some expert shed light on what could possibly be the reason for this inconsistency?
What I have Checked/Validated:
I have tested the yml file using yml-validator tools to ensure that the input file is not faulty.
The failure cases are randomly distributed, i.e. some of the failing questions are in the middle of the file, some at the end, and so on, so there is no fixed pattern to the failures.
rasa test nlu does report some failure cases, so it’s not the case that “rasa test nlu” reports no errors at all!
The contents of my config file are as follows:
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: RegexFeaturizer
    case_sensitive: False
    use_word_boundaries: True
  - name: LexicalSyntacticFeaturizer
  - name: SpacyFeaturizer
  - name: DIETClassifier
    entity_recognition: True
    epochs: 100
    constrain_similarities: true
  # - name: filterEmail.FilterEmail
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    constrain_similarities: true
  - name: RulePolicy

importers:
#- name: "importers.dataimporter.DataImporter"
- name: "RasaFileImporter"