How to prioritise annotation of training examples?

When annotating training examples in nlu.yml, how do you decide what to prioritise for the best bot accuracy: the entities (plus variants and plurals of entities with object_types), or the attributes associated with an intent?

So for example, if I had an intent example:

  • "uber organisation creates stolen alert for vulnerable account"

And suppose I had an NLU data .json file with the contents below, where:

  • the entity types include “organisation”, “alert”, and “account”;
  • the value of the “name” attribute of one of the "organisation"s is “uber organisation”;
  • the value of the “name” attribute of one of the "alert"s is “stolen alert”;
  • the value of the “name” attribute of one of the "account"s is “vulnerable account”;
  • “vulnerable_account” is also an attribute of each “account” entity;
  • “stolen_alert” is also an attribute of each “account” (allowing an account to toggle activation of stolen alerts).

Given the following NLU data:

{
	"organisation": [
		{
			"id": 0,
			"name": "uber organisation"
		}
	],
	"alert": [
		{
			"id": 0,
			"name": "stolen alert"
		}
	],
	"account": [
		{
			"id": 0,
			"name": "vulnerable account",
			"vulnerable_account": true,
			"stolen_alert": false
		}
	]
}

Which of the following approaches to annotating do you think would result in the most accurate bot performance?

  1. "uber [organisation](organisation) creates stolen [alert](alert) for vulnerable [account](account)"

  2. "[uber organisation]({"entity": "object_type", "value": "organisation"}) creates [stolen alert]({"entity": "object_type", "value": "alert"}) for [vulnerable account]({"entity": "object_type", "value": "account"})"

  3. "[uber]({"entity": "object_type", "value": "organisation"}) [organisation](organisation) creates [stolen alert]({"entity": "attribute", "value": "stolen_alert"}) for [vulnerable account]({"entity": "attribute", "value": "vulnerable_account"})"

Also, when would you use syntax [uber organisation]{"entity": "organisation"} rather than syntax [uber organisation]{"entity": "object_type", "value": "organisation"}?

I can’t find where the documentation explains how to prioritise this. I would expect better performance from annotating variations of entities, like [uber organisation]({"entity": "object_type", "value": "organisation"}), than from annotating just the entity [organisation](organisation), since the former carries more specific information. Annotating it as [uber]({"entity": "object_type", "value": "organisation"}) [organisation](organisation), on the other hand, may introduce too much ambiguity and confuse the bot: as you scale, you might also end up using [uber]({"entity": "object_type", "value": "account"}). For the same reason, you wouldn’t add just "uber" to a regex.

It’s also not clear from the intent example whether we are creating a stolen alert for the “vulnerable account” (the one whose “name” attribute has the value “vulnerable account”) or for “vulnerable_account” (all accounts that have the “vulnerable_account” attribute set to true). So perhaps more than one intent example, each with different wording, is necessary to reduce the confusion:

  • ... for the [vulnerable account]({"entity": "object_type", "value": "account"})
  • ... for a [vulnerable account]({"entity": "attribute", "value": "vulnerable_account"})
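
To make the two readings concrete, here is a minimal Python sketch of how each one would resolve against the "account" list above. It is only an illustration: the helper names find_by_name and find_by_attribute are hypothetical and not part of Rasa, and the second account (id 1) is invented purely so that the two readings return different results.

```python
# In-memory copy of the "account" list from the NLU data above.
# The account with id 1 is a hypothetical addition for illustration only.
ACCOUNTS = [
    {"id": 0, "name": "vulnerable account",
     "vulnerable_account": True, "stolen_alert": False},
    {"id": 1, "name": "savings account",
     "vulnerable_account": True, "stolen_alert": False},
]


def find_by_name(objects, name):
    """Reading 1: the mention is the name of one specific object."""
    return [obj for obj in objects if obj["name"] == name]


def find_by_attribute(objects, attribute):
    """Reading 2: the mention selects every object whose flag is true."""
    return [obj for obj in objects if obj.get(attribute) is True]


# "the vulnerable account" -> a single account, matched on its name
by_name = find_by_name(ACCOUNTS, "vulnerable account")

# "a vulnerable account" -> any account with vulnerable_account set to true
by_attribute = find_by_attribute(ACCOUNTS, "vulnerable_account")
```

With this data the first lookup returns only account 0, while the second returns both accounts, which is exactly the ambiguity the two differently worded examples are trying to separate.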

The Knowledge Base Actions documentation just mentions that “synonyms” are used extensively to map variations of entities and attributes, obviously in combination with “regex”.
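
As I understand it, a synonym mapping boils down to replacing an extracted surface form with a canonical value. The Python sketch below shows that idea only; it is not Rasa's implementation, the SYNONYMS table and the canonical_value helper are hypothetical names, and the plural forms are assumptions based on the entity types above.

```python
# Hypothetical synonym table: surface form (lower-cased) -> canonical value.
SYNONYMS = {
    "organisations": "organisation",
    "alerts": "alert",
    "accounts": "account",
    "vulnerable accounts": "vulnerable_account",
    "stolen alerts": "stolen_alert",
}


def canonical_value(text):
    """Return the canonical value for a surface form, or the text unchanged."""
    key = text.strip().lower()
    return SYNONYMS.get(key, text)


plural_object_type = canonical_value("Accounts")           # mapped
plural_attribute = canonical_value("vulnerable accounts")  # mapped
unmapped = canonical_value("uber")                         # no entry, unchanged
```

A table like this only normalises what has already been extracted; it does not help the model decide whether "vulnerable account" is an object name or an attribute, which is the part I am unsure how to annotate.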