Rasa test is predicting differently than shell?

I’m having a problem writing tests for Rasa 3.4.1. This is a new test; it didn’t exist before I upgraded.

When I train and run rasa shell I can run through a scenario and it works as I expect.

When I run rasa test, my test story for that scenario fails and the fallback is predicted instead. Am I doing something wrong? All my code is below, with a comment afterwards.

Has anyone else had an issue with testing?

Here’s the original story with the checkpoint stories:

  - story: Get New ticket counts
    steps:
      - intent: crm_get_all_by_status
      - action: get_corporate_id_form
      - active_loop: get_corporate_id_form
      - action: action_get_all_tickets_by_status
      - action: utter_ask_to_view_tickets
      - checkpoint: show_or_not_show_tickets

Here’s the checkpoint story it continues into:

  - story: No, do not show Tickets
    steps:
      - checkpoint: show_or_not_show_tickets
      - intent: deny
      - action: utter_ask_anything_else
      - checkpoint: ask_anything_else

Here’s the checkpoint story that one continues into:

  - story: Ask if there is anything else - No
    steps:
      - checkpoint: ask_anything_else
      - intent: deny
      - action: action_save_event_chat_initiated
      - action: utter_thanks_goodbye

My Test Story:


- story: TEST Get and display number of tickets User says No
  steps:
  - user: |
      How many tickets do I have?
  - intent: crm_get_all_by_status
  - action: get_corporate_id_form
  - active_loop: get_corporate_id_form
  - active_loop: null
  - action: action_get_all_tickets_by_status
  - action: utter_tell_number_of_tickets
  - action: utter_ask_to_view_tickets
  - user: |
      No
  - checkpoint: show_or_not_show_tickets
  - intent: deny
  - action: utter_ask_anything_else
  - checkpoint: ask_anything_else
  - user: |
      No thanks
  - intent: deny
  - action: utter_thanks_goodbye

And finally, here is failed_test_stories.yml:

version: "3.1"
stories:
- story: TEST Get and display number of tickets User says No (/home/jwheat/Code/NearlyHuman/rasa/rasa-demo/tests/test_stories.yml)
  steps:
  - user: |-
      How many tickets do I have?
  - action: action_listen  # predicted: action_default_fallback
  - intent: crm_get_all_by_status
  - action: get_corporate_id_form
  - active_loop: get_corporate_id_form
  - active_loop: null
  - action: action_get_all_tickets_by_status  # predicted: utter_tell_inform_thank_you
  - action: utter_tell_number_of_tickets  # predicted: utter_ask_to_view_tickets
  - action: utter_ask_to_view_tickets  # predicted: action_save_event_live_agent_chat_failed
  - user: |-
      No thanks
  - action: action_listen  # predicted: action_default_fallback
  - intent: deny
  - action: utter_ask_anything_else  # predicted: action_save_event_chat_initiated
  - user: |-
      No thanks
  - action: action_listen  # predicted: action_default_fallback
  - intent: deny
  - action: utter_thanks_goodbye  # predicted: action_save_event_chat_initiated

You can see that right after each user line it has added an extra - action: action_listen, yet it annotates that line with # predicted: action_default_fallback

Rasa test and shell prediction differences could be due to several factors, including differences in the training data, models, or configurations used. It is important to check the input and context in both Rasa test and shell to ensure that the same information is being used for prediction. Additionally, checking the versions of Rasa and other dependencies being used could also be helpful in identifying the source of the discrepancy. If the issue persists, it may be necessary to further debug and fine-tune the models to improve their accuracy.
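One concrete thing to check in the test story above: in the Rasa 3.x test story format, the user text and its intent are keys of the same step (`user:` followed by `intent:` without a new `- `), whereas the posted test lists them as two separate steps. That would be consistent with the failed output, where an `action_listen` appears between each `user` line and its `intent`. The following is only a sketch of how the same test might look with the steps merged and the `checkpoint` lines dropped, since a test story describes one complete conversation. It keeps the action sequence from the posted test, which is an assumption; the actions still have to match what the training stories produce.

```yaml
version: "3.1"
stories:
- story: TEST Get and display number of tickets User says No
  steps:
  - user: |
      How many tickets do I have?
    intent: crm_get_all_by_status  # same step as the user text, no leading "- "
  - action: get_corporate_id_form
  - active_loop: get_corporate_id_form
  - active_loop: null
  - action: action_get_all_tickets_by_status
  - action: utter_tell_number_of_tickets  # kept from the posted test; not in the training story shown
  - action: utter_ask_to_view_tickets
  - user: |
      No
    intent: deny
  - action: utter_ask_anything_else
  - user: |
      No thanks
    intent: deny
  - action: utter_thanks_goodbye
```

Note that the training story shown earlier runs action_save_event_chat_initiated before utter_thanks_goodbye and has no utter_tell_number_of_tickets step, so the test may still report mismatches on those actions until the two sequences agree.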

Hello there,

Facing the same problem here. Tested on both Rasa 3.1.0 and 3.6.14. Exactly the same training data, domain and config files, if that matters somehow. First I ran rasa test

rasa test -m models/my_model.tar.gz

and several stories failed at the action prediction level. Then I tried running the shell (and the action server, although this doesn’t affect the stories in question)

rasa shell -m models/my_model.tar.gz

I can see that while the same model is loaded, the predictions are different!

Then I tried running rasa interactive

rasa interactive -m models/my_model.tar.gz

and to my surprise, rasa interactive predictions were aligned with the test results. What could be going wrong here?

I’ll create a minimal working example repo and post it here next week.

Happy New Year celebrations, everyone!