Inquiry on Story writing

Hello,

I’m building a bot which worked quite well at the start; however, once the stories grew to about 165, I started getting wrong predictions at random places. I wrote tests, and sometimes all of them pass (18/18) while other runs give 13/18, etc. Basically, the test outcomes are not stable. My story-writing strategy is shown below:

  - story: start story upto ask for help branch
    steps:
    - or:
      - intent: greet
      - intent: start
    - slot_was_set: 
      - completion_status: in_complete
    - action: name_form
    - active_loop: name_form
    - active_loop: null
    - action: utter_pls_to_meet
    - action: utter_therapy
    - action: utter_privacy_and_confidentiality
    - action: utter_i_will_always_be_here_to_help

  - story: user selects sounds great and does not accept privacy policy
    steps:
    - action: utter_i_will_always_be_here_to_help
    - intent: answer
      entities:
      - confirm_option
    - slot_was_set:
      - confirm_option: positive-sel
    - action: priv_policy_form
    - active_loop: priv_policy_form
    - active_loop: null
    - slot_was_set:
      - priv_policy: not-accepted
    - action: utter_did_not_accept

  - story: lets carry on option
    steps:
    - action: utter_i_will_always_be_here_to_help
    - intent: answer
      entities:
      - confirm_option
    - slot_was_set:
      - confirm_option: negative-sel
    - action: utter_end_conversation

Basically, because of the large number of turns/paths, I use an action to join stories at the points where they split based on button selection. I then use a categorical slot with influence_conversation set to true to differentiate the branches. I have replicated the use of the same intent and slot across the entire set of stories. Could this be the issue causing the tests to be inconsistent?
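For context, the categorical slot is declared along these lines in my domain.yml (trimmed to the one slot relevant to the stories above):

slots:
  confirm_option:
    type: categorical
    influence_conversation: true
    values:
      - positive-sel
      - negative-sel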

Below is my config.yml

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7

policies:
  - name: AugmentedMemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    evaluate_on_number_of_examples: 0
    evaluate_every_number_of_epochs: 2
    tensorboard_log_directory: "./tensorboard"
    tensorboard_log_level: "epoch"
  - name: RulePolicy

Rasa version:

Rasa Version     : 2.2.8
Rasa SDK Version : 2.2.0
Rasa X Version   : None
Python Version   : 3.8.0
Operating System : macOS-10.15.1-x86_64-i386-64bit
Python Path      : /Users/user/Data/bnbrproject/venv/bin/python3

@akelad kindly assist.

Or should I use checkpoints to join stories rather than actions?
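For example, joining the same branch with a checkpoint would look roughly like this (a sketch only; the checkpoint name is made up):

  - story: start story upto ask for help branch
    steps:
    - or:
      - intent: greet
      - intent: start
    - action: utter_i_will_always_be_here_to_help
    - checkpoint: asked_for_help

  - story: user selects sounds great and does not accept privacy policy
    steps:
    - checkpoint: asked_for_help
    - intent: answer
      entities:
      - confirm_option
    - slot_was_set:
      - confirm_option: positive-sel
    - action: priv_policy_form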

hi @Ian - when you train your model, Rasa should output training diagnostics on the command line. What training accuracy does your model achieve at the end of 100 epochs? (I see you also have tensorboard logging set up, so you can also check tensorboard for this number.)

Regarding the test stories, are these identical to things you have covered in your training data? If so, you should at least be able to train your model to 100% training data accuracy and also consistently get 18/18 test stories right.
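By identical I mean a test story replays the same turns as a training story, with the actual user message attached to each intent. Something like this in tests/test_stories.yml (the user text here is just an illustration):

stories:
- story: test the greet branch
  steps:
  - user: |
      hello
    intent: greet
  - slot_was_set:
    - completion_status: in_complete
  - action: name_form
  - active_loop: name_form
  - active_loop: null
  - action: utter_pls_to_meet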

hey @amn41

[14:53<00:00, 8.93s/it, t_loss=0.584, loss=0.068, acc=0.999]

Above are my results after training. However, I don’t get 18/18 tests passing consistently, even though the tests are covered in my training data.

looks as though there is some behaviour in those stories which is extremely tricky to learn.

  • the 5 test cases which sometimes fail: are these consistently the same ones? Do they all fail at the same ‘branch’ point?

  • what happens if you increase max_history? (see the snippet after this list)

  • do you have augmentation enabled? (the default is yes; you can switch it off with --augmentation 0, also shown below)
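To make the last two suggestions concrete, they would look something like this in your config.yml (max_history: 8 is just a value to experiment with, not a recommendation):

policies:
  - name: AugmentedMemoizationPolicy
    max_history: 8
  - name: TEDPolicy
    max_history: 8
    epochs: 100
  - name: RulePolicy

and then retrain with rasa train --augmentation 0 to rule out data augmentation as the source of the instability.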

Hey @amn41

Here are my findings on joining stories using actions, where the two stories are differentiated using a slot_was_set step as shown above: it seems Rasa uses TEDPolicy to predict the next action. When using TEDPolicy, Rasa sometimes makes wrong predictions at random places, as observed in the test results. Is it OK for Rasa to use TEDPolicy even though the stories are already defined? Should it use MemoizationPolicy or AugmentedMemoizationPolicy in this case?

If I join the same stories using checkpoints, AugmentedMemoizationPolicy is used to make the next prediction, which is quite accurate. However, using checkpoints increases the time needed to process story blocks, which is quite costly.

you are totally right that if the stories are already defined, Rasa should be using the memoization policy, not TED. If you are able to share a minimal reproducible example, we can hunt down the root cause.