TED predicts the wrong action, and Rasa tests do not detect this error

Hi, I have a problem: Rasa tests do not detect these TED prediction errors. How do I set up tests to catch them?

Example story:

- story: give_me_stock
  steps:
    - or:
        - intent: data_car
        - intent: give_me_stock
    - action: action_send_greet
    - action: utter_give_me_stock
    - action: action_find_car
    - action: action_send_car_status
    - action: action_get_car_link
    - action: action_text_follow_up/like_obj
    - action: utter_question_like_obj
    - intent: confirm
    - action: action_switcher

Test case:

stories:
- story: check_stock 1
  steps:
  - user: |
      looking for [toyota](brand) [rav4](model)
    intent: give_me_stock
  - action: action_send_greet
  - action: utter_give_me_stock
  - action: action_find_car
  - action: action_send_car_status
  - action: action_get_car_link
  - action: action_text_follow_up/like_obj
  - action: utter_question_like_obj
  - user: |
      Yes
    intent: confirm
  - action: action_switcher
  - action: action_listen

I run the tests with the command: rasa test -m 20240527-183301-approximate-fail.tar.gz

Result: everything passes, no mistakes. The output contains "# None of the test stories failed - all good!" and "# No warnings for test stories".

But in rasa shell or rasa run we get a wrong prediction:

[state 1] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: action_listen | slots: {'search_finish': (1.0, 0.0)}
[state 2] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: action_send_greet | slots: {'search_finish': (1.0, 0.0)}
[state 3] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: utter_give_me_stock | slots: {'search_finish': (1.0, 0.0)}
[state 4] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: action_find_car | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}
[state 5] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: action_send_car_status | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}
[state 6] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: action_get_car_link | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}
[state 7] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: action_text_follow_up/like_obj | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}
[state 8] user intent: give_me_stock | user entities: ('brand', 'model') | previous action name: utter_question_like_obj | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}
[state 9] user intent: confirm | previous action name: action_listen | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}
[state 10] user intent: confirm | previous action name: utter_question_deal_fi_method-credit_convert_call | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}
[state 11] user intent: confirm | previous action name: action_text_follow_up/deal_convert_call | slots: {'system_car_id': (1.0,), 'search_finish': (1.0, 0.0)}

This model behaves incorrectly. I assumed that tests would help me catch models with mistakes like this, but the tests did not detect it.

Wrong actions:

utter_question_deal_fi_method-credit_convert_call
action_text_follow_up/deal_convert_call

Right action:

action_switcher
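
To make the divergence explicit, the previous action names in the tracker states above can be compared with the actions the test story expects. This is only a rough sketch in plain Python, not a Rasa API: the helper names `actions_from_states` and `first_mismatch` are made up for illustration, and the state lines are trimmed to the fields that matter here.

```python
import re

# Captures the value after "previous action name:" up to the next "|" (or end of line).
ACTION_RE = re.compile(r"previous action name: ([^|]+?)\s*(?:\||$)")


def actions_from_states(state_lines):
    """Collect previous action names from state lines, skipping action_listen."""
    actions = []
    for line in state_lines:
        match = ACTION_RE.search(line)
        if match and match.group(1) != "action_listen":
            actions.append(match.group(1))
    return actions


def first_mismatch(expected, actual):
    """Return (index, expected, actual) for the first differing action, else None.

    Only the overlapping prefix is compared; a length difference alone is not reported.
    """
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index, want, got
    return None


# States 8 to 10 from the dump above, with the entity and slot fields trimmed.
states = [
    "[state 8] user intent: give_me_stock | previous action name: utter_question_like_obj | slots: {}",
    "[state 9] user intent: confirm | previous action name: action_listen | slots: {}",
    "[state 10] user intent: confirm | previous action name: utter_question_deal_fi_method-credit_convert_call | slots: {}",
]
# Tail of the expected action sequence from the test story.
expected = ["utter_question_like_obj", "action_switcher"]

print(first_mismatch(expected, actions_from_states(states)))
```

For these three states the first mismatch is at index 1, where `action_switcher` was expected but `utter_question_deal_fi_method-credit_convert_call` was predicted.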

Here are my policy settings:

   - name: MemoizationPolicy
   - name: RulePolicy
     core_fallback_threshold: 0.3
     core_fallback_action_name: "action_default_fallback"
     enable_fallback_prediction: false
   - name: UnexpecTEDIntentPolicy
     max_history: 20
     epochs: 100
   - name: TEDPolicy
     max_history: 20
     epochs: 250
     constrain_similarities: true

I noticed this in the logs:

rasa.core.policies.ted_policy  - TED predicted 'utter_question_deal_fi_method-credit_convert_call' based on user intent.
rasa.core.processor  - Predicted next action 'utter_question_deal_fi_method-credit_convert_call' with confidence 0.06.

The average confidence is 0.85.
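
Since the wrong prediction stands out by its very low confidence, one possible workaround outside of rasa test is to scan the logs for such lines. The sketch below is plain Python, not a Rasa feature: the function name `low_confidence_predictions` is hypothetical, the regex assumes the exact `rasa.core.processor` line format quoted above, and the 0.3 threshold is just borrowed from `core_fallback_threshold` in my config.

```python
import re

# Assumes the log format "Predicted next action '<name>' with confidence <float>."
PREDICTION_RE = re.compile(
    r"Predicted next action '(?P<action>[^']+)' with confidence (?P<conf>\d+(?:\.\d+)?)"
)


def low_confidence_predictions(log_lines, threshold=0.3):
    """Return (action, confidence) pairs whose confidence is below the threshold."""
    flagged = []
    for line in log_lines:
        match = PREDICTION_RE.search(line)
        if match is None:
            continue  # not a prediction line
        confidence = float(match.group("conf"))
        if confidence < threshold:
            flagged.append((match.group("action"), confidence))
    return flagged


# The two log lines quoted above.
sample_log = [
    "rasa.core.policies.ted_policy  - TED predicted 'utter_question_deal_fi_method-credit_convert_call' based on user intent.",
    "rasa.core.processor  - Predicted next action 'utter_question_deal_fi_method-credit_convert_call' with confidence 0.06.",
]

for action, confidence in low_confidence_predictions(sample_log):
    print(f"low confidence {confidence:.2f}: {action}")
```

This only flags suspicious predictions after the fact; it does not make rasa test itself fail.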

How do I set up tests to catch these mistakes?

Thanks :slight_smile: