Hi,
I’m currently on Rasa 1.10.20 and will be upgrading to 2.0 soon.
I made the switch over to the TED policy, and I’m noticing that the predictions don’t follow the stories very well. Many of my utterances use very similar vocabulary, so my hunch is that this could be causing the incorrect predictions.
This is my current config:
config.yml
language: en
importers:
  - name: MultiProjectImporter
imports:
  - projects/Alternative Medicine
  - projects/Appt Followup
  - projects/Appt Prep
  - projects/Basic
  - projects/Best Treatment Quiz
  - projects/Conception
  - projects/Conditions
  - projects/Donor Eggs or Sperm
  - projects/Embryo Grading
  - projects/Endometriosis
  - projects/Entryway - Actively Trying
  - projects/Entryway - Exploring Treatment
  - projects/Entryway - Fertility Preservation
  - projects/Entryway - Intro Utterances
  - projects/Entryway - Preconception
  - projects/Entryway - Undergoing Treatment
  - projects/Exercise
  - projects/Fertility Preservation
  - projects/General Fertility Treatment
  - projects/General Infertility Info
  - projects/Genetic Testing
  - projects/IUI
  - projects/IVF
  - projects/Male Fertility
  - projects/Medications
  - projects/Mental Health
  - projects/Nutrition
  - projects/OI
  - projects/Other or Outside Current Scope
  - projects/PCOS
  - projects/Symptoms
  - projects/Testing
  - projects/Treatment Cost & Insurance
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 2
    max_ngram: 5
  - name: DIETClassifier
    epochs: 300
    BILOU_flag: true
    use_masked_language_model: true
    number_of_transformer_layers: 4
    embedding_dimension: 50
    hidden_layers_sizes:
      text: [256, 128]
      label: [256, 128]
    tensorboard_log_directory: logs
    tensorboard_log_level: epoch
  - name: EntitySynonymMapper
  - name: DucklingHTTPExtractor
    url: http://duckling:8000
    dimensions:
      - distance
      - number
      - time
    locale: en_US
    timezone: America/New_York
policies:
  - name: TwoStageFallbackPolicy
    nlu_threshold: 0.5
    ambiguity_threshold: 0.01
    core_threshold: 0.1
    fallback_core_action_name: action_default_fallback
    fallback_nlu_action_name: flag_conversation_for_review
    deny_suggestion_intent_name: incorrect
  - name: AugmentedMemoizationPolicy
    max_history: 5
  - name: MappingPolicy
  - name: FormPolicy
  - name: TEDPolicy
    epochs: 150
    max_history: 5
    batch_size: [64, 128]
    random_seed: 6586
    tensorboard_log_directory: logs
    tensorboard_log_level: epoch
My guess is that the culprit is the LabelTokenizerSingleStateFeaturizer and the way it featurizes the action labels and action text.
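To make the hunch concrete, here is a toy sketch of what I think is going on. It is not Rasa's code: it only assumes that the featurizer splits each label on "_" and builds a bag-of-words vector from the tokens. The action names and the helper functions (build_vocab, featurize_label, overlap) are made up for illustration.

```python
# Toy illustration only; this is NOT Rasa's implementation.
# Assumption: labels are split on "_" and turned into bag-of-words vectors.

def build_vocab(labels, split_symbol="_"):
    """Collect every distinct token that appears in any label."""
    return sorted({tok for label in labels for tok in label.split(split_symbol)})


def featurize_label(label, vocab, split_symbol="_"):
    """Return a binary bag-of-words vector for one label."""
    tokens = set(label.split(split_symbol))
    return [1 if tok in tokens else 0 for tok in vocab]


def overlap(a, b):
    """Jaccard similarity between two binary vectors."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x, y in zip(a, b) if x or y)
    return shared / total if total else 0.0


# Hypothetical action names, similar in spirit to mine.
labels = ["utter_ivf_cost", "utter_iui_cost", "utter_ivf_success_rate"]
vocab = build_vocab(labels)
vectors = {label: featurize_label(label, vocab) for label in labels}

for first, second in [(labels[0], labels[1]), (labels[0], labels[2])]:
    score = overlap(vectors[first], vectors[second])
    print(f"{first} vs {second}: {score:.2f}")
```

If the featurizer really works roughly like this, two different actions that share most of their name tokens end up with heavily overlapping feature vectors, which might explain why TED confuses them.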
Is there a way to improve prediction and reduce the amount of featurization that happens on the action label/action text side?
I saw in the new docs that there’s an option for dense_dimension where I might be able to tune down (or disable) action_text and label_action_text, but I’m unsure whether the 1.10.20 TEDPolicy would accept it, since it isn’t covered in the docs for that version.