I am building an FAQ response selector model with about 200 intents. The bot answers questions regarding medication usage, general information about the medication, and the general abilities of the chatbot.
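For context, all of the FAQ content sits under a single retrieval intent (faq) that is answered by the ResponseSelector plus a rule, roughly along these lines (the response keys and texts here are illustrative placeholders, not my real data):

# domain.yml (responses for the faq retrieval intent)
responses:
  utter_faq/side_effects:
  - text: "Common side effects include ..."
  utter_faq/chatbot_abilities:
  - text: "I can answer questions about your medication and how to take it."

# rules.yml (route any faq question to the response selector's answer)
rules:
- rule: Respond to FAQ questions
  steps:
  - intent: faq
  - action: utter_faq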
I developed two pipelines:
Spacy
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
- name: SpacyNLP
  model: en_core_web_md
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
# - name: RegexEntityExtractor   # uncomment when model done -- very slow while training
# - name: LanguageModelFeaturizer
#   model_name: "xlnet"
#   model_weights: "xlnet-base-cased"
#   cache_dir: null
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: "DucklingEntityExtractor"
  # url of the running duckling server
  url: "http://localhost:8000"
  # dimensions to extract
  dimensions: ["time", "email", "number"]
  # allows you to configure the locale, by default the language is used
  # locale: "de_DE"
  # if not set the default timezone of Duckling is going to be used
  # needed to calculate dates from relative expressions like "tomorrow"
  timezone: "America/New_York"
  # timeout for receiving response from the running duckling server
  # if not set the default timeout is 3 seconds
  timeout: 20
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  retrieval_intent: faq
- name: FallbackClassifier
  threshold: 0.7
  ambiguity_threshold: 0.1
policies:
- name: AugmentedMemoizationPolicy
- name: TEDPolicy
  epochs: 40
- name: RulePolicy
No Spacy - baseline
recipe: default.v1
language: en
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
- name: WhitespaceTokenizer
- name: RegexFeaturizer
# - name: RegexEntityExtractor   # uncomment when model done -- very slow while training
# - name: LanguageModelFeaturizer
#   model_name: "xlnet"
#   model_weights: "xlnet-base-cased"
#   cache_dir: null
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: "DucklingEntityExtractor"
  # url of the running duckling server
  url: "http://localhost:8000"
  # dimensions to extract
  dimensions: ["time", "email", "number"]
  # allows you to configure the locale, by default the language is used
  # locale: "de_DE"
  # if not set the default timezone of Duckling is going to be used
  # needed to calculate dates from relative expressions like "tomorrow"
  timezone: "America/New_York"
  # timeout for receiving response from the running duckling server
  # if not set the default timeout is 3 seconds
  timeout: 20
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  retrieval_intent: faq
- name: FallbackClassifier
  threshold: 0.7
  ambiguity_threshold: 0.1
policies:
- name: AugmentedMemoizationPolicy
- name: TEDPolicy
  epochs: 40
- name: RulePolicy
The only difference between the two is that the first pipeline adds spaCy's pre-trained word embeddings as dense features. However, I noticed that the spaCy model does a noticeably worse job of classifying the correct intent.
For example, I have an intent faq/googling where all the training examples relate to Google. When I ask "what is the difference between you and googling", the bot gives the wrong response. When I remove spaCy, the response is correct.
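Roughly, that intent looks like this (the utterances here are paraphrased placeholders, not the actual training examples):

nlu:
- intent: faq/googling
  examples: |
    - why should I use you instead of just googling it
    - is asking you better than searching google
    - can't I just google my medication questions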
Why does the model perform worse with spaCy added? It doesn't make sense to me. I only have one intent about googling, and the training examples under that intent vary enough to help the model generalize while still clearly being about googling.
I also looked at the confidence ranking output, and the confidence for every intent is near zero.
What does this all mean?
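For scale (assuming all ~200 intents live under the faq retrieval intent): a uniform spread over 200 candidates would give 1/200 = 0.005 per intent, so even the top-ranked entries in the snippet below are only about four to five times that baseline.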
"ranking": [
{
"confidence": 0.02295936644077301,
"intent_response_key": "faq/main_ingredient"
},
{
"confidence": 0.021990343928337097,
"intent_response_key": "faq/will_cravings_return"
},
{
"confidence": 0.02169930748641491,
"intent_response_key": "faq/side_effects"