Correct way of labelling entity roles in NLU data

Hi, I just started using entity roles and groups and wanted to know the correct way of labelling them. In the example below I have data for a user travelling from one location to another, with the roles currentlocation and destination. However, how should I label a sentence which only has the currentlocation or only has the destination? E.g.

## intent:entitychallenge

- from [UK]{"entity":"Country", "role":"currentlocation"} to [USA]{"entity":"Country", "role":"destination"}

- from [USA]{"entity":"Country", "role":"currentlocation"} to [UK]{"entity":"Country", "role":"destination"}

- to [USA]{"entity":"Country"}

- from [UK]{"entity":"Country"}

Is that correct or should it be -

## intent:entitychallenge

- from [UK]{"entity":"Country", "role":"currentlocation"} to [USA]{"entity":"Country", "role":"destination"}

- from [USA]{"entity":"Country", "role":"currentlocation"} to [UK]{"entity":"Country", "role":"destination"}

- to [USA]{"entity":"Country","role":"destination"}

- from [UK]{"entity":"Country","role":"currentlocation"}

Hi @Who,

after reading the documentation, I think both of those annotation versions are correct. However, I would go with version 1.

I think you need to consider what the model should learn. I would use the roles whenever there is a pair of annotated entities in the sentence, so that the model learns: whenever there is a currentlocation, there is most likely also a destination, regardless of which entities fill the two roles. If you say “I love the USA”, there is semantically no need to role-label the entity, because the sentence is not about specifying a destination or current location. It is still a country, though.

Maybe we should ask @Tanja for clarification here - is there any kind of best practice?

Kind regards
Julian

Hi @Who,

both versions are correct. There is no clear answer as to whether to use version 1 or 2.

I agree with @JulianGerhard that you need to think about what the model should actually learn. In what kind of situation would a user just say “to USA”? Does USA in this situation still map to the destination, or can the phrase also be used in a different context, in which USA is just a country? So, you need to think about the assistant you are building. Do currentlocation and destination always co-occur, or can they also be mentioned on their own? If the user can mention them on their own, I would go with version 2. Otherwise, if they always occur together, version 1 might make more sense. If a country is mentioned in a completely different context, you should not annotate it with any role label. For example, in Julian's sentence “I love the USA”, USA should just be annotated with the country label. I agree with that.

Hope that clarifies things :slight_smile:

Thank you for the help @JulianGerhard and @Tanja. I went with version 2 and updated my NLU data as such:

## intent:entitychallenge

- i want to go from [UK]{"entity":"Country", "role":"currentlocation"} to [USA]{"entity":"Country", "role":"destination"}

- i want to go from [USA]{"entity":"Country", "role":"currentlocation"} to [UK]{"entity":"Country", "role":"destination"}

- i want to go to [USA]{"entity":"Country", "role":"destination"}

- i want to go to [UK]{"entity":"Country", "role":"destination"}

- i want to go from [USA]{"entity":"Country", "role":"currentlocation"}

- i want to go from [UK]{"entity":"Country", "role":"currentlocation"} 
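A quick way to sanity-check data like the above is to count how often each entity value appears with each role. The following is only a minimal sketch: the regex-based helper `count_value_roles` is hypothetical and not part of Rasa, and it assumes every annotation uses the inline JSON syntax shown in this thread.

```python
import json
import re
from collections import Counter

# Matches the inline annotation syntax used in this thread:
# [value]{"entity": "...", "role": "..."}
ANNOTATION = re.compile(r'\[([^\]]+)\]\{([^}]*)\}')


def count_value_roles(lines):
    """Count (entity value, role) pairs; role is None when no role is annotated."""
    counts = Counter()
    for line in lines:
        for value, body in ANNOTATION.findall(line):
            attrs = json.loads("{" + body + "}")
            counts[(value, attrs.get("role"))] += 1
    return counts


examples = [
    'i want to go from [UK]{"entity":"Country", "role":"currentlocation"} to [USA]{"entity":"Country", "role":"destination"}',
    'i want to go from [USA]{"entity":"Country", "role":"currentlocation"} to [UK]{"entity":"Country", "role":"destination"}',
    'i want to go to [USA]{"entity":"Country", "role":"destination"}',
    'i want to go to [UK]{"entity":"Country", "role":"destination"}',
    'i want to go from [USA]{"entity":"Country", "role":"currentlocation"}',
    'i want to go from [UK]{"entity":"Country", "role":"currentlocation"}',
]

counts = count_value_roles(examples)
for (value, role), n in sorted(counts.items(), key=lambda item: (item[0][0], str(item[0][1]))):
    print(value, role, n)
```

For these six examples each country appears twice per role, so the value/role distribution itself is balanced; the small number of examples overall is a separate question.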

The recognition works perfectly for these four sentences:

? Your input -> i want to go from UK to USA                                                                                      
? Is the intent 'entitychallenge' correct for 'i want to go from [UK]{"entity": "Country", "role": "currentlocation"} to [USA]{"entity": "Country", "role": "destination"}' and are all entities labeled correctly?  (Y/n)

? Your input -> i want to go from USA to UK                                                                                      
? Is the intent 'entitychallenge' correct for 'i want to go from [USA]{"entity": "Country", "role": "currentlocation"} to [UK]{"entity": "Country", "role": "destination"}' and are all entities labeled correctly?  (Y/n)

? Your input -> i want to go to USA                                                                                              
? Is the intent 'entitychallenge' correct for 'i want to go to [USA]{"entity": "Country", "role": "destination"}' and are all entities labeled correctly?  (Y/n) 

? Your input -> i want to go to UK                                                                                               
? Is the intent 'entitychallenge' correct for 'i want to go to [UK]{"entity": "Country", "role": "destination"}' and are all entities labeled correctly?  (Y/n)  

However, it does not seem to work for the other two sentences. Both are labelled with the destination role instead of currentlocation:

? Your input -> i want to go from USA                                                                                            
? Is the intent 'entitychallenge' correct for 'i want to go from [USA]{"entity": "Country", "role": "destination"}' and are all entities labeled correctly?  (Y/n)

? Your input -> i want to go from UK                                                                                             
? Is the intent 'entitychallenge' correct for 'i want to go from [UK]{"entity": "Country", "role": "destination"}' and are all entities labeled correctly?  (Y/n)

Is this possibly due to the bot not having enough data/examples, or do you think the sentences might be too similar?

Looks good.

Yes, it could be that you do not have enough training examples. Try adding a few more examples and see how it goes. However, other users have already reported similar issues, and I want to take a closer look at that soon, as the model might just be overfitting to USA being the destination. Let me know how it goes. Thanks.
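One way to add more examples without typing each by hand is to generate them from a few templates. This is a rough sketch under assumptions: the helpers `annotate` and `generate_examples` are hypothetical, the extra country names (France, Japan) are illustrative and not from this thread, and templated sentences are no substitute for varied real user messages. It also may not resolve the issue if the model is overfitting for other reasons.

```python
from itertools import permutations

# Hypothetical country list; France and Japan are added only for illustration.
COUNTRIES = ["UK", "USA", "France", "Japan"]


def annotate(value, role):
    """Return one entity annotation in the inline syntax used in this thread."""
    return '[%s]{"entity":"Country", "role":"%s"}' % (value, role)


def generate_examples(countries):
    """Build example lines so every country occurs in both roles, paired and alone."""
    examples = []
    # Both roles in one sentence, for every ordered pair of distinct countries.
    for origin, target in permutations(countries, 2):
        examples.append(
            "- i want to go from %s to %s"
            % (annotate(origin, "currentlocation"), annotate(target, "destination"))
        )
    # Each role on its own.
    for country in countries:
        examples.append("- i want to go to %s" % annotate(country, "destination"))
        examples.append("- i want to go from %s" % annotate(country, "currentlocation"))
    return examples


generated = generate_examples(COUNTRIES)
print("## intent:entitychallenge")
for line in generated:
    print(line)
```

The design choice here is simply that every country appears equally often as currentlocation and as destination, so no single value is tied to one role in the training data.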