Hello Rasa Community!
This is my first post on this forum. I hope I will get the desperately needed help from this community!!
I trained a question/answer chatbot that performs extremely well when I validate it on a large dataset of 14K questions, with approx. 98% accuracy on entity extraction. Only a few errors are reported in the DIETClassifier_errors.json file.
For validation, I use the command
> rasa test nlu --nlu testdata.yml
However, when I try the same model in rasa shell nlu with a question taken from the same testdata.yml file, the bot extracts incorrect entities for a large number of questions! For example, instead of extracting one tri-gram, the bot extracts a bi-gram plus a uni-gram, or three separate uni-grams.
Shouldn’t extracting a uni-gram instead of a bi- or tri-gram be counted as an error by the “rasa test nlu” command as well? If this is the expected behavior, can I change some configuration/setting so that such errors are also reported by “rasa test nlu”?
Question: Can some expert shed light on what could possibly be the reason for this inconsistency?
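To make the mismatch concrete, here is a minimal sketch of how I could compare the annotated entities with what the model returns. It rests on assumptions I have not confirmed: that a server started with `rasa run --enable-api` is listening on localhost:5005, and that the evaluation might score entities per token rather than per span. The helper names (`parse_message`, `token_labels`, `span_set`, `compare`), the whitespace tokenization (my pipeline actually uses SpacyTokenizer), and the example sentence are all hypothetical.

```python
import json
import urllib.request


def parse_message(text, url="http://localhost:5005/model/parse"):
    """Send one message to a running Rasa server and return its parse result.

    Assumes a local server with the HTTP API enabled; not called below.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def token_labels(text, entities):
    """Label each whitespace token with the entity type covering it, else 'O'."""
    labels = []
    position = 0
    for token in text.split():
        start = text.index(token, position)
        end = start + len(token)
        position = end
        label = "O"
        for entity in entities:
            if entity["start"] <= start and end <= entity["end"]:
                label = entity["entity"]
                break
        labels.append(label)
    return labels


def span_set(entities):
    """Reduce entities to (type, start, end) tuples for exact-span comparison."""
    return {(e["entity"], e["start"], e["end"]) for e in entities}


def compare(text, expected, predicted):
    """Report whether two entity lists agree per token and per exact span."""
    return {
        "token_labels_match": token_labels(text, expected)
        == token_labels(text, predicted),
        "spans_match": span_set(expected) == span_set(predicted),
    }


# Hypothetical example: one annotated tri-gram vs. a bi-gram plus a uni-gram.
text = "flights to new york city"
expected = [{"entity": "city", "start": 11, "end": 24}]
predicted = [
    {"entity": "city", "start": 11, "end": 19},
    {"entity": "city", "start": 20, "end": 24},
]
result = compare(text, expected, predicted)
print(result)  # {'token_labels_match': True, 'spans_match': False}
```

With a live server, `predicted` could instead come from `parse_message(text)["entities"]`. In this made-up case the per-token labels agree even though the spans differ, which would be one possible explanation for a split entity not showing up as an error in the test report.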
What I have Checked/Validated:
- I have tested the yml file using yml-validator tools to ensure that the input file is not faulty.
- The failure cases are randomly distributed, i.e. some failing questions are in the middle of the file, some at the end, etc., so there is no fixed pattern in the failures.
- rasa test nlu does report some failure cases. So, it’s not the case that “rasa test nlu” is not reporting any errors at all!
- The contents of my config file are as follows:
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: RegexFeaturizer
    case_sensitive: False
    use_word_boundaries: True
  - name: LexicalSyntacticFeaturizer
  - name: SpacyFeaturizer
  - name: DIETClassifier
    entity_recognition: True
    epochs: 100
    constrain_similarities: true
  # - name: filterEmail.FilterEmail
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    constrain_similarities: true
  - name: RulePolicy
importers:
  #- name: "importers.dataimporter.DataImporter"
  - name: "RasaFileImporter"