Rules in rasa test


So I have two rules, for chitchat and FAQ. I wanted to ask whether rules are taken into account in Rasa end-to-end tests (‘rasa test’). For some reason, even a rule like this:

- rule: Greet user
  steps:
  - intent: greet
  - action: utter_greet

and test like this:

- story: say hello
  steps:
  - user: |
      Dzień dobry
    intent: greet
  - action: utter_greet

is not working: as the test output I get ‘Correct: 0/1’.

Furthermore, in ‘results/failed_test_stories.yml’ I get:

version: "2.0"
stories:
- story: ask faq question (/tmp/tmpg1nxmxuf/46c3c6b036a6448591cdd63f6308316e_test_conversations.yml)
  steps:
  - intent: greet  # predicted: greet: [Dzień]{"entity": "duration", "value": "1"} dobry
  - action: utter_greet

What is wrong here?

Hey @BarMin,

Rules are taken into account. In the example you posted, the lines in failed_test_stories.yml are telling you that utter_greet was predicted correctly (in other words, rules get used as they should), but there was an NLU mistake for the user utterance: the NLU pipeline wrongly extracted an entity from the text. That’s why overall there are 0/1 stories predicted correctly, even though all the actions were predicted correctly.

You likely need to provide more or better NLU training data; then the story would hopefully be handled correctly as a whole, without any NLU mistakes.
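For reference, adding more greeting examples to the NLU training data looks roughly like this (a sketch; apart from “Dzień dobry” from your test, the extra Polish greetings here are illustrative assumptions):

```yaml
nlu:
- intent: greet
  examples: |
    - Dzień dobry
    - cześć        # illustrative extra example ("hi")
    - witam        # illustrative extra example ("welcome")
    - hej          # illustrative extra example ("hey")
```

More varied examples per intent generally make intent classification more robust, though they won’t by themselves change what a rule-based extractor like Duckling does.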

Well… I get it. But that’s kind of funny, because Duckling extracts “Dzień” (“day” in Polish) as a “duration” entity. I didn’t really take this into account in my NLU data, but I need to use the Duckling HTTP service in my bot.

Ah, I see. Well, there’s nothing we can do about Duckling’s behaviour here… You can disable the duration dimension for Duckling in your config, but if you actually want to extract durations elsewhere in your NLU data, then I guess your best bet is to ignore the mentioned NLU mistake. rasa test also gives you action-level scores (not just story-level ones), so despite these NLU mistakes you can still get reliable numbers for your action predictions.
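Restricting Duckling to specific dimensions would look roughly like this in config.yml (a sketch for Rasa 2.x; the URL and the exact set of dimensions listed are assumptions for illustration):

```yaml
pipeline:
  # ... your other NLU components ...
  - name: DucklingEntityExtractor
    url: "http://localhost:8000"  # assumed address of your Duckling HTTP service
    # Only the listed dimensions get extracted. Leaving "duration" out of the
    # list stops Duckling from tagging words like "Dzień" as a duration.
    dimensions: ["time", "number"]
```

The trade-off the reply above mentions: if you omit "duration" here, Duckling won’t extract durations anywhere in your bot, so this only helps if you don’t need that dimension at all.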


Yes, that is correct, and it works well (Evaluation Results on ACTION level: 12/12).

Okay. Thank you for your help :slight_smile: Problem “solved”, although I think this is kind of a bug. I don’t think it should work like this, and there should be a possibility to configure it somehow (in tests).
