How do you decide how to prioritise annotating, in nlu.yml, the entities (plus variants and plurals of entities with object_types) and attributes that are associated with an intent, for the best bot accuracy?
For example, suppose I had the following intent example:
"uber organisation creates stolen alert for vulnerable account"
And suppose I had an NLU data .json file with the contents below, where:
- the entity types include “organisation”, “alert”, and “account”;
- the value of the “name” attribute of one of the “alert”s is “stolen alert”;
- the value of the “name” attribute of one of the “organisation”s is “uber organisation”;
- the value of the “name” attribute of one of the “account”s is “vulnerable account”;
- “vulnerable_account” is also an attribute of each “account” entity;
- “stolen_alert” is also an attribute of each “account” (allowing an account to toggle activation of stolen alerts).
Given the following NLU data:
{
  "organisation": [
    {
      "id": 0,
      "name": "uber organisation"
    }
  ],
  "alert": [
    {
      "id": 0,
      "name": "stolen alert"
    }
  ],
  "account": [
    {
      "id": 0,
      "name": "vulnerable account",
      "vulnerable_account": true,
      "stolen_alert": false
    }
  ]
}
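To make the difference between object names and attribute keys concrete, here is a minimal Python sketch that loads the data above (with the “organisation” list closed by ] rather than }) and lists the object types, the attribute keys of an account, and the object matching a given name. The helpers object_types, attribute_names and find_by_name are hypothetical illustrations, not part of Rasa's knowledge base API.

```python
import json

# The NLU data from the question, with the "organisation" list properly closed.
KB_JSON = """
{
  "organisation": [{"id": 0, "name": "uber organisation"}],
  "alert": [{"id": 0, "name": "stolen alert"}],
  "account": [{"id": 0, "name": "vulnerable account",
               "vulnerable_account": true, "stolen_alert": false}]
}
"""

kb = json.loads(KB_JSON)


def object_types(kb):
    """The top-level keys are the object types."""
    return sorted(kb)


def attribute_names(kb, object_type):
    """Union of the keys used by objects of one type, excluding the id."""
    names = set()
    for obj in kb.get(object_type, []):
        names.update(key for key in obj if key != "id")
    return sorted(names)


def find_by_name(kb, object_type, name):
    """Objects of the given type whose "name" value matches exactly."""
    return [obj for obj in kb.get(object_type, []) if obj.get("name") == name]


print(object_types(kb))                # ['account', 'alert', 'organisation']
print(attribute_names(kb, "account"))  # ['name', 'stolen_alert', 'vulnerable_account']
print(find_by_name(kb, "organisation", "uber organisation"))
```

In this data “uber organisation” only ever appears as a value of name, whereas vulnerable_account and stolen_alert are keys.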
Which of the following approaches to annotating do you think would result in the most accurate bot performance?
- "uber [organisation](organisation) creates stolen [alert](alert) for vulnerable [account](account)"
- "[uber organisation]{"entity": "object_type", "value": "organisation"} creates [stolen alert]{"entity": "object_type", "value": "alert"} for [vulnerable account]{"entity": "object_type", "value": "account"}"
- "[uber]{"entity": "object_type", "value": "organisation"} [organisation](organisation) creates [stolen alert]{"entity": "attribute", "value": "stolen_alert"} for [vulnerable account]{"entity": "attribute", "value": "vulnerable_account"}"
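One way to compare these options is to list the (text, entity, value) triple each annotation produces. The sketch below uses a hypothetical, simplified regex reader (not Rasa's own training-data parser) that understands only the [text](entity) form and the [text]{"entity": ..., "value": ...} JSON form. The examples are written with curly braces, which is the form Rasa documents for JSON-style annotations.

```python
import json
import re

# Hypothetical, simplified reader for two annotation forms:
#   [text](entity)              -> the value is the annotated text itself
#   [text]{"entity": ..., ...}  -> JSON form; an optional "value" replaces the text
ANNOTATION = re.compile(
    r"\[(?P<text>[^\]]+)\](?:\((?P<short>[^)]+)\)|(?P<json>\{[^}]*\}))"
)


def parse(example):
    """Return a (text, entity, value) triple for every annotation in an example."""
    triples = []
    for match in ANNOTATION.finditer(example):
        text = match.group("text")
        if match.group("short") is not None:
            triples.append((text, match.group("short"), text))
        else:
            spec = json.loads(match.group("json"))
            triples.append((text, spec["entity"], spec.get("value", text)))
    return triples


OPTIONS = [
    "uber [organisation](organisation) creates stolen [alert](alert) "
    "for vulnerable [account](account)",
    '[uber organisation]{"entity": "object_type", "value": "organisation"} creates '
    '[stolen alert]{"entity": "object_type", "value": "alert"} for '
    '[vulnerable account]{"entity": "object_type", "value": "account"}',
    '[uber]{"entity": "object_type", "value": "organisation"} [organisation](organisation) '
    'creates [stolen alert]{"entity": "attribute", "value": "stolen_alert"} for '
    '[vulnerable account]{"entity": "attribute", "value": "vulnerable_account"}',
]

for number, option in enumerate(OPTIONS, start=1):
    print(number, parse(option))
```

Read this way, the first option only tags the generic nouns, the second maps each full name to its object type, and the third mixes object_type, a type-named entity and attribute values in one sentence.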
Also, when would you use the syntax [uber organisation]{"entity": "organisation"} rather than the syntax [uber organisation]{"entity": "object_type", "value": "organisation"}?
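The practical difference between the two syntaxes is which value reaches the action. With {"entity": "organisation"} and no explicit value, the value is the annotated text, i.e. the object's name; with {"entity": "object_type", "value": "organisation"} the value is only the type. The sketch below shows a hypothetical resolve helper handling both; the second organisation, “acme organisation”, is invented purely so that the two results differ.

```python
# Hypothetical knowledge base: "acme organisation" is invented so the two
# annotation styles lead to different results.
KB = {
    "organisation": [
        {"id": 0, "name": "uber organisation"},
        {"id": 1, "name": "acme organisation"},
    ]
}

# What each annotation of the span "uber organisation" would pass downstream.
AS_TYPE_NAMED_ENTITY = {"entity": "organisation", "value": "uber organisation"}
AS_OBJECT_TYPE = {"entity": "object_type", "value": "organisation"}


def resolve(extracted, kb):
    """Hypothetical resolver returning (object_type, candidate object ids)."""
    if extracted["entity"] == "object_type":
        # Only the type is known, so every object of that type is a candidate.
        object_type = extracted["value"]
        candidates = kb.get(object_type, [])
    else:
        # The entity name is the type and the value is an object name to match.
        object_type = extracted["entity"]
        candidates = [o for o in kb.get(object_type, []) if o["name"] == extracted["value"]]
    return object_type, [o["id"] for o in candidates]


print(resolve(AS_TYPE_NAMED_ENTITY, KB))  # ('organisation', [0])
print(resolve(AS_OBJECT_TYPE, KB))        # ('organisation', [0, 1])
```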
I can’t find anything in the documentation about how to prioritise this. I would expect the bot to perform better if you prioritised annotating variations of entities like [uber organisation]{"entity": "object_type", "value": "organisation"} over annotating just the entity [organisation](organisation), since the former carries more specific information. Annotating it as [uber]{"entity": "object_type", "value": "organisation"} [organisation](organisation), by contrast, may introduce too much ambiguity and confuse the bot: as you scale, you might also end up using [uber]{"entity": "object_type", "value": "account"}, in the same way that you wouldn’t add a bare uber to a regex.
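The regex analogy can be checked directly: a bare uber pattern matches every sentence that mentions uber, whatever kind of object it belongs to, while the full name only matches the intended one. The second sentence below is a hypothetical example of the uber-as-account case anticipated above.

```python
import re

# Hypothetical sentences: "uber" names an organisation in the first and an
# account in the second.
SENTENCES = [
    "uber organisation creates stolen alert for vulnerable account",
    "uber account receives stolen alert",
]

bare_pattern = re.compile(r"\buber\b")
full_pattern = re.compile(r"\buber organisation\b")

bare_hits = [s for s in SENTENCES if bare_pattern.search(s)]
full_hits = [s for s in SENTENCES if full_pattern.search(s)]

print(len(bare_hits), len(full_hits))  # 2 1
```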
It’s also not clear from the intent example whether we are creating a stolen alert for the “vulnerable account” (the one whose “name” attribute is “vulnerable account”) or for every “vulnerable_account” (all accounts whose “vulnerable_account” attribute is set to true), so perhaps several intent examples with different wording are needed to reduce the confusion:
... for the [vulnerable account]{"entity": "object_type", "value": "account"}
... for a [vulnerable account]{"entity": "attribute", "value": "vulnerable_account"}
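The two readings correspond to two different lookups against the account list: matching on the name value versus filtering on the boolean vulnerable_account attribute. In this sketch, accounts 1 and 2 are hypothetical additions so that the readings give different answers, and the function names are illustrative only.

```python
ACCOUNTS = [
    {"id": 0, "name": "vulnerable account", "vulnerable_account": True, "stolen_alert": False},
    # Hypothetical extra accounts, added only to make the two readings differ.
    {"id": 1, "name": "savings account", "vulnerable_account": True, "stolen_alert": False},
    {"id": 2, "name": "main account", "vulnerable_account": False, "stolen_alert": False},
]


def by_name(objects, name):
    """Reading 1: 'the vulnerable account' names one specific object."""
    return [o["id"] for o in objects if o["name"] == name]


def by_attribute(objects, attribute):
    """Reading 2: 'a vulnerable account' filters on a boolean attribute."""
    return [o["id"] for o in objects if o.get(attribute) is True]


print(by_name(ACCOUNTS, "vulnerable account"))       # [0]
print(by_attribute(ACCOUNTS, "vulnerable_account"))  # [0, 1]
```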
The Knowledge Base Actions documentation only mentions that “synonyms” are used extensively to map variations of entities and attributes, obviously in combination with “regex”.
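For the variants and plurals mentioned at the start, a synonym mapping amounts to an exact-match lookup from the extracted text to a canonical value. This is a plain-Python sketch of that idea with an invented synonym table, not Rasa's EntitySynonymMapper; in nlu.yml the same table would be written as synonym entries.

```python
# Hypothetical synonym table: surface variants -> canonical object_type values.
SYNONYMS = {
    "organisations": "organisation",
    "organization": "organisation",
    "organizations": "organisation",
    "alerts": "alert",
    "accounts": "account",
}


def normalise(value):
    """Map an extracted entity value to its canonical form, if one is listed."""
    # Case-insensitive exact match; unlisted values are returned unchanged.
    return SYNONYMS.get(value.strip().lower(), value)


print(normalise("Organizations"))  # organisation
print(normalise("stolen alert"))   # stolen alert
```

Because the match is exact, a variant has to be listed explicitly to be mapped; partial overlaps such as “uber” are never rewritten.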