Why are entity extraction results in "rasa test nlu" different from "rasa shell nlu"?

Hello Rasa Community!

This is my first post on this forum. I hope to get some much-needed help from this community!

I trained a question-answering chatbot that performs very well when I validate it on a large dataset of 14K questions: entity extraction accuracy is approx. 98%, and only a few errors are reported in the DIETClassifier_errors.json file.

For validation, I use the command

> rasa test nlu --nlu testdata.yml

However, when I evaluate the same model in rasa shell nlu with questions taken from the same testdata.yml file, the bot extracts incorrect entities for a large number of them. For example, instead of extracting one tri-gram entity, it extracts a bi-gram and a uni-gram, or three separate uni-grams.
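
To make the comparison concrete, this is roughly the strict check I apply by hand to the rasa shell nlu output. It is a minimal sketch: it assumes each item in the printed "entities" list carries "entity", "start", "end" and "value" keys, and the example sentence, the product entity name, and the helper names to_spans and span_diff are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)


def to_spans(entities: List[Dict]) -> Set[Span]:
    """Reduce entity dicts to comparable (start, end, type) triples."""
    return {(e["start"], e["end"], e["entity"]) for e in entities}


def span_diff(gold: List[Dict], predicted: List[Dict]) -> Dict[str, Set[Span]]:
    """Return gold spans that were not predicted exactly, and predictions with no gold match."""
    gold_spans, pred_spans = to_spans(gold), to_spans(predicted)
    return {
        "missed": gold_spans - pred_spans,
        "spurious": pred_spans - gold_spans,
    }


# Hypothetical example: the annotation is one tri-gram entity ...
text = "price of apple mac pro"
gold = [{"entity": "product", "start": 9, "end": 22, "value": "apple mac pro"}]

# ... but the shell output splits it into a bi-gram and a uni-gram.
predicted = [
    {"entity": "product", "start": 9, "end": 18, "value": "apple mac"},
    {"entity": "product", "start": 19, "end": 22, "value": "pro"},
]

diff = span_diff(gold, predicted)
print(diff)
```

For the split prediction above, the gold tri-gram shows up as missed and both fragments show up as spurious, which is the kind of error I expected the test command to report.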

Shouldn’t extracting a uni-gram instead of a bi-gram or tri-gram count as an error in “rasa test nlu” as well? If this is the expected behavior, is there a configuration option or setting that makes “rasa test nlu” report such errors?
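
My guess, which I would like someone to confirm, is that the two views score entities differently: one label per token versus one whole span. The following sketch is my own illustration of that difference, not Rasa's evaluation code; the whitespace tokenization, the product label, and the function names are all assumptions.

```python
from typing import List, Tuple

# (first token index, last token index inclusive, entity label)
TokenSpan = Tuple[int, int, str]


def spans_to_tags(num_tokens: int, spans: List[TokenSpan]) -> List[str]:
    """Expand spans into one label per token; 'O' marks tokens outside any entity."""
    tags = ["O"] * num_tokens
    for first, last, label in spans:
        for i in range(first, last + 1):
            tags[i] = label
    return tags


def token_accuracy(gold: List[TokenSpan], pred: List[TokenSpan], num_tokens: int) -> float:
    """Fraction of tokens whose predicted label equals the gold label."""
    gold_tags = spans_to_tags(num_tokens, gold)
    pred_tags = spans_to_tags(num_tokens, pred)
    return sum(g == p for g, p in zip(gold_tags, pred_tags)) / num_tokens


def span_exact_match(gold: List[TokenSpan], pred: List[TokenSpan]) -> bool:
    """True only if the predicted spans equal the gold spans, boundaries included."""
    return set(gold) == set(pred)


tokens = "price of apple mac pro".split()  # hypothetical example sentence
gold = [(2, 4, "product")]                 # one tri-gram entity
pred = [(2, 3, "product"), (4, 4, "product")]  # split into bi-gram + uni-gram

token_score = token_accuracy(gold, pred, len(tokens))
strict = span_exact_match(gold, pred)
print(token_score, strict)
```

With per-token scoring the split prediction looks perfect, while an exact span comparison flags it. If the test command scores per token, that alone could explain why it reports so few errors.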

Question: Can anyone shed light on what might be causing this inconsistency?

What I have checked/validated:

  • I have checked the yml file with YAML validator tools to make sure the input file is well-formed.

  • The failure cases are randomly distributed, i.e. some failing questions are in the middle of the file, some at the end, etc., so there is no fixed pattern to the failures.

  • rasa test nlu does report some failure cases, so it is not the case that “rasa test nlu” reports no errors at all.

  • The contents of my config file are as follows:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
   - name: SpacyNLP
   - name: SpacyTokenizer
   - name: RegexFeaturizer
     case_sensitive: False
     use_word_boundaries: True
   - name: LexicalSyntacticFeaturizer
   - name: SpacyFeaturizer
   - name: DIETClassifier
     entity_recognition: True
     epochs: 100
     constrain_similarities: true
#   - name: filterEmail.FilterEmail
   - name: ResponseSelector
     epochs: 100
     constrain_similarities: true
   - name: FallbackClassifier
     threshold: 0.4
     ambiguity_threshold: 0.1

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
   - name: MemoizationPolicy
   - name: TEDPolicy
     max_history: 5
     epochs: 100
     constrain_similarities: true
   - name: RulePolicy

importers:
#- name: "importers.dataimporter.DataImporter"
- name: "RasaFileImporter"