Questions about the data validation script with slots/entities and followups from custom actions

Hi,

Ran into an interesting issue when trying to run story validation:

I have 2 stories using a boolean slot:

## COVID-19 | pregnancy | symptomatic delivery
* labor_and_delivery{"symptomatic": "true"}
  - utter_symptomatic_labor_and_delivery_1
  - utter_symptomatic_labor_and_delivery_2

## COVID-19 | pregnancy | asymptomatic delivery
* labor_and_delivery{"symptomatic": "false"}
  - utter_asymptomatic_labor_and_delivery_1
  - utter_asymptomatic_labor_and_delivery_2

When I run rasa data validate I get the following message:

2020-09-04 20:29:17 WARNING  rasa.validator  - Story structure conflict after intent 'labor_and_delivery':
  utter_asymptomatic_labor_and_delivery_1 predicted in 'COVID-19 | pregnancy | asymptomatic delivery'
  utter_symptomatic_labor_and_delivery_1 predicted in 'COVID-19 | pregnancy | symptomatic delivery'

These are the relevant parts of the domain:

intents:
  - labor_and_delivery

entities:
  - symptomatic

slots:
  symptomatic:
    type: bool

Does the validation script account for slots/entities within stories?

Similarly, queuing up followup actions from a custom action (and recording that with a - followup{"name": "my_custom_action"} event in the story) seems to cause the same issue. Those stories look like:

## Using a custom action to queue up action A
* some_intent
  - utter_some_initial_response
  - check_a_slot
  - followup{"name": "action_a"}
  - action_a

## Using a custom action to queue up action B
* some_intent
  - utter_some_initial_response
  - check_a_slot
  - followup{"name": "action_b"}
  - action_b

and the error message:

2020-09-04 20:39:36 WARNING  rasa.validator  - Story structure conflict after action 'check_a_slot':
  action_a predicted in 'Using a custom action to queue up action A'
  action_b predicted in 'Using a custom action to queue up action B'

To be clear, the stories work fine at runtime, but these warnings keep actual story conflicts from being readily apparent in the logs. Perhaps this is something that could be added?

Hello @niveK

The story validation should take into account anything that the Rasa Core policy would take into account, and this includes the features of the slots. Thus, the conflict that it’s showing you in your first example might be a bug. :thinking: The second example is to be expected, because two different actions follow from the same state, and Rasa Core cannot learn that - which is why we generally discourage people from using followup actions now.
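To make the reply above concrete, here is a toy model of a story-conflict check. It is not Rasa's validator code: it assumes a simplified state made of the last intent, the previous action, and the slot values, plus an optional max_history window (None meaning the full story). Two stories conflict when the same window of states leads to different next actions. With the slot value in the state, the two labor_and_delivery stories stay apart; the followup stories still collide after check_a_slot because, as the reply says, the followup event adds nothing the policy can see.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Toy state: (last intent, previous action, sorted slot values). Not Rasa's real featurization.
State = Tuple[str, str, Tuple[Tuple[str, object], ...]]
Step = Tuple[State, str]  # (state the policy sees, action the story says comes next)


def story_steps(intent: str, slots: Dict[str, object], actions: List[str]) -> List[Step]:
    """Turn a single-turn story into (state, next action) pairs.

    Followup events are deliberately left out: they set no slot, so they
    do not change what this toy policy can see.
    """
    slot_features = tuple(sorted(slots.items()))
    steps: List[Step] = []
    previous = "action_listen"
    for action in actions:
        steps.append(((intent, previous, slot_features), action))
        previous = action
    return steps


def find_conflicts(
    stories: Dict[str, List[Step]], max_history: Optional[int] = None
) -> Dict[Tuple[State, ...], Dict[str, Set[str]]]:
    """Return every state window that different stories continue with different actions."""
    seen: Dict[Tuple[State, ...], Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for name, steps in stories.items():
        states = [state for state, _ in steps]
        for i, (_, action) in enumerate(steps):
            # Keep only the last max_history states (or all of them when None).
            start = 0 if max_history is None else max(0, i + 1 - max_history)
            window = tuple(states[start : i + 1])
            seen[window][action].add(name)
    return {w: dict(acts) for w, acts in seen.items() if len(acts) > 1}


if __name__ == "__main__":
    delivery = {
        "symptomatic delivery": story_steps(
            "labor_and_delivery", {"symptomatic": True},
            ["utter_symptomatic_labor_and_delivery_1", "utter_symptomatic_labor_and_delivery_2"],
        ),
        "asymptomatic delivery": story_steps(
            "labor_and_delivery", {"symptomatic": False},
            ["utter_asymptomatic_labor_and_delivery_1", "utter_asymptomatic_labor_and_delivery_2"],
        ),
    }
    print(len(find_conflicts(delivery)))  # 0: the slot value separates the stories

    followups = {
        name: story_steps("some_intent", {}, ["utter_some_initial_response", "check_a_slot", action])
        for name, action in [("queue up action A", "action_a"), ("queue up action B", "action_b")]
    }
    print(len(find_conflicts(followups, max_history=5)))  # 1: same state after check_a_slot
```

If the slot value were dropped from the state (pass {} for both delivery stories), the same check would report one conflict at the first action, which matches the warning from the original post.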

What is your max_history? I guess you don’t use auto_fill: False for the slot? What Rasa version are you using (type rasa --version)?

This is my Rasa CLI version:

Rasa Version     : 2.0.0a4
Rasa SDK Version : 2.0.0a4
Rasa X Version   : None
Python Version   : 3.8.2 (default, Jul 16 2020, 14:00:26) 
Operating System : Linux-5.4.0-45-generic-x86_64-with-glibc2.29
Python Path      : /usr/bin/python3

However, the project data I showed comes from a project running on 1.10.x.

As I showed in the domain, I don’t have any additional configuration settings for that slot, and I know that rasa data validate with no additional options uses the longest history it finds in the data. I actually separated those stories out into a new test project by copying the files into a fresh rasa init project, just to make sure nothing else was causing it.

Is there a recommended way to implement a custom action with many different output paths that lead back to similar stories? A lot of my bot is powered by conditional messaging based on the user’s profile, so I’ve set up custom actions that query the user’s profile, provide personalized response utterances, and queue up one of the branching results of that query. However, many of these branches share a common trunk, so to speak, so I’ve mostly been using FollowupActions to account for that in the training data. What’s the best way to featurize this sort of behavior?


Ok, I can reproduce the problem now. It seems to be an issue with boolean slots. For now, here is a workaround: set the slot type to “categorical” with the two values “true” and “false”. I’ll look into why it doesn’t work with type “bool” and create an issue if necessary. Thanks for pointing this out!
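Why the workaround helps: a categorical slot is featurized, roughly, as a one-hot vector over the values listed for it in the domain, so “true” and “false” land in different positions and the two stories get different states. The function below is a simplified sketch of that idea, not Rasa's CategoricalSlot implementation; the case-insensitive string comparison is an assumption made for illustration.

```python
from typing import List, Sequence


def one_hot(value: object, values: Sequence[str]) -> List[float]:
    """Encode a slot value as a one-hot vector over the slot's allowed values.

    An unset slot (None) or a value not in the list yields an all-zero vector.
    """
    vector = [0.0] * len(values)
    if value is None:
        return vector
    for i, candidate in enumerate(values):
        # Compare as lowercase strings so "True", "true" and True all match "true".
        if str(value).lower() == candidate.lower():
            vector[i] = 1.0
    return vector


allowed = ["true", "false"]
print(one_hot("true", allowed))   # [1.0, 0.0]
print(one_hot("false", allowed))  # [0.0, 1.0]
print(one_hot(None, allowed))     # [0.0, 0.0]
```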

To encode a user profile, I would (as you do) call a custom action to fetch the relevant information. But instead of then triggering the different conversation paths with followup actions, I’d put that information into slots and let Rasa Core learn what to do, given those slot settings. If some of that information is complicated and needs a custom featurization, you can write custom slot types, as described in the Slots documentation.

While this requires you to collect more example stories (at least one for each path), it scales better because Rasa can now learn all the exceptions and conversation patterns from the stories.
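A minimal sketch of that pattern, assuming a hypothetical in-memory PROFILES lookup in place of a real profile service and a hypothetical "pregnant" slot alongside "symptomatic". In a real project this logic would live in the run() method of a rasa_sdk Action subclass (reading the user id from tracker.sender_id) and return rasa_sdk.events.SlotSet(...) events; to keep the sketch self-contained, it builds the same slot-event dicts by hand. The point is that it returns slot events rather than a followup event, so choosing the branch is left to the policy.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a user-profile service, keyed by sender id.
PROFILES: Dict[str, Dict[str, Any]] = {
    "user-1": {"pregnant": True, "symptomatic": False},
}

# Hypothetical slots this action fills from the profile.
PROFILE_SLOTS = ("pregnant", "symptomatic")


def slot_event(name: str, value: Any) -> Dict[str, Any]:
    """Build a slot-setting event in the dict shape that rasa_sdk's SlotSet returns."""
    return {"event": "slot", "timestamp": None, "name": name, "value": value}


def check_profile(sender_id: str) -> List[Dict[str, Any]]:
    """Hypothetical body of a check_a_slot-style custom action.

    It records profile facts as slots and deliberately returns no followup
    event; the policy learns from the stories which branch each combination
    of slot values should lead to. Unknown users get every slot set to None.
    """
    profile = PROFILES.get(sender_id, {})
    return [slot_event(name, profile.get(name)) for name in PROFILE_SLOTS]


print(check_profile("user-1"))
```

The matching stories then record the outcome as slot events, e.g. - slot{"symptomatic": true} right after - check_a_slot, with at least one story per branch.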

Ok, I found the problem and created a PR to fix it.


@niveK It looks like writing true (without quotes) also works.

Ah that totally makes sense, since it would be JSON!
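The difference is easy to see with Python's json module, assuming (as the reply suggests) that the {...} annotation after the intent is read as JSON: a quoted value comes back as a string, an unquoted one as a boolean.

```python
import json

quoted = json.loads('{"symptomatic": "true"}')
unquoted = json.loads('{"symptomatic": true}')

print(repr(quoted["symptomatic"]))    # 'true'  (str)
print(repr(unquoted["symptomatic"]))  # True    (bool)
```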

Actually, I just changed this and got the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/runner/work/venus/venus/venus/data/validator.py", line 27, in main
    all_good = loop.run_until_complete(validate_data())
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/runner/work/venus/venus/venus/data/validator.py", line 16, in validate_data
    validator = await Validator.from_importer(importer)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/validator.py", line 48, in from_importer
    intents = await importer.get_nlu_data()
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/importers/importer.py", line 273, in get_nlu_data
    nlu_data = await asyncio.gather(*nlu_data)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/importers/multi_project.py", line 192, in get_nlu_data
    return utils.training_data_from_paths(self._nlu_paths, language)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/importers/utils.py", line 11, in training_data_from_paths
    training_data_sets = [loading.load_data(nlu_file, language) for nlu_file in paths]
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/importers/utils.py", line 11, in <listcomp>
    training_data_sets = [loading.load_data(nlu_file, language) for nlu_file in paths]
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/loading.py", line 66, in load_data
    data_sets = [_load(f, language) for f in files]
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/loading.py", line 66, in <listcomp>
    data_sets = [_load(f, language) for f in files]
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/loading.py", line 137, in _load
    return reader.read(filename, language=language, fformat=fformat)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/formats/readerwriter.py", line 33, in read
    return self.reads(io_utils.read_file(filename), **kwargs)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/formats/markdown.py", line 63, in reads
    self._parse_item(line)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/formats/markdown.py", line 111, in _parse_item
    item, self.current_title
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/entities_parser.py", line 174, in parse_training_example
    entities = find_entities_in_training_example(example)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/entities_parser.py", line 52, in find_entities_in_training_example
    entity_attributes = extract_entity_attributes(match)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/entities_parser.py", line 84, in extract_entity_attributes
    return extract_entity_attributes_from_dict(entity_text, match)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/entities_parser.py", line 109, in extract_entity_attributes_from_dict
    entity_dict = get_validated_dict(entity_dict_str)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/nlu/training_data/entities_parser.py", line 151, in get_validated_dict
    validation_utils.validate_training_data(data, schema.entity_dict_schema())
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/utils/validation.py", line 105, in validate_training_data
    raise e
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/rasa/utils/validation.py", line 98, in validate_training_data
    validate(json_data, schema)
  File "/opt/hostedtoolcache/Python/3.6.12/x64/lib/python3.6/site-packages/jsonschema/validators.py", line 934, in validate
    raise error
jsonschema.exceptions.ValidationError: True is not of type 'string'. Failed to validate data, make sure your data is valid. For more information about the format visit https://rasa.com/docs/rasa/nlu/training-data-format/.

Failed validating 'type' in schema['properties']['value']:
    {'type': 'string'}

On instance['value']:
    True
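Judging by the traceback, the failure comes from the Markdown NLU reader (entities_parser.py), so it appears to be triggered by an entity annotation in the NLU training data rather than by the story files, and the entity schema requires "value" to be a string (schema['properties']['value'] is {'type': 'string'}). The stdlib-only sketch below mimics just that one rule to show why an unquoted true parses fine as JSON but still fails validation. The helper name is hypothetical; the real check runs jsonschema against Rasa's full entity schema.

```python
import json
from typing import Any, Dict


def parse_entity_annotation(annotation: str) -> Dict[str, Any]:
    """Parse a JSON entity annotation and require a string "value".

    Hypothetical helper that mimics only the schema rule shown in the
    traceback; it is not Rasa's entities_parser.
    """
    entity = json.loads(annotation)
    value = entity.get("value")
    if value is not None and not isinstance(value, str):
        raise ValueError(f"{value!r} is not of type 'string'")
    return entity


print(parse_entity_annotation('{"entity": "symptomatic", "value": "true"}'))
try:
    parse_entity_annotation('{"entity": "symptomatic", "value": true}')
except ValueError as err:
    print(err)  # True is not of type 'string'
```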

:man_facepalming: I’ll look into this. I should be able to merge my PR soon, so at least “true” (with quotes) should work on master.