Same entity types occurring together in a text messes up the entity and group prediction

shwetar · June 6, 2021, 6:46pm

I am working on a chatbot that takes grocery orders. It has to handle queries like this one:

I would like [5]{"entity":"qty","group":"1"} [large]{"entity":"size","group":"1"} [carrots]{"entity":"item","group":"1"} and [10]{"entity":"qty","group":"2"} [lemons]{"entity":"item","group":"2"}

The query above works well, but with three items the entity and group predictions get much worse. Consider this query:

I would like [5]{"entity":"qty","group":"1"} [large]{"entity":"size","group":"1"} [carrots]{"entity":"item","group":"1"} [10]{"entity":"qty","group":"2"} [lemons]{"entity":"item","group":"2"} and [10]{"entity":"qty","group":"3"} [tomato]{"entity":"item","group":"3"}

For this query, the group is not predicted correctly.

Please share if anyone has handled this kind of use case, where entities of the same type (item, in this case) occur several times in the same query and the group is used to associate other attributes such as qty and size with each item.

harloc · June 7, 2021, 8:22am

In my experience (not only with Rasa), it is tricky to extract group relations with CRFs, which are implemented in Rasa. Group labels are static, in the sense that the AI learns a fixed label like "group: 3" rather than learning that two or more items need to share the same label. The group labels are determined by the features of the token or tokens and by the transition probabilities. Because the labels themselves are static, the label is sometimes predicted from the specific token rather than from the sentence structure. For example, if the token "10" has the label "group: 2" in all examples, the AI will predict this label regardless of the actual sentence. What helps is to include a number of permutations of all the lists you can imagine. You will then have examples like this:

10 carrots, 5 apples, 2 cucumbers
2 carrots, 10 apples, 5 cucumbers
5 carrots, 2 apples, 10 cucumbers
10 apples, 5 cucumbers, 2 carrots
etc.

In the best case, the AI will now learn to predict the label from the sentence structure. This of course has some downsides. First, the amount of data grows considerably, and so does the training time. Second, the dataset is now rather artificial, and the AI might perform worse on other predictions, because real users will behave differently from what your dataset suggests.
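
As a rough illustration of the permutation idea, here is a minimal Python sketch that generates annotated examples in the same inline format used in the question. The helper names (`annotate`, `build_example`, `permuted_examples`), the "I would like" template, and the `limit` parameter are assumptions made up for this sketch and are not part of Rasa. It only produces strings that you would still have to paste into your NLU training data.

```python
import itertools
import json
import random


def annotate(text, entity, group):
    """Wrap text in the inline entity annotation format used in the question."""
    meta = json.dumps({"entity": entity, "group": str(group)}, separators=(",", ":"))
    return f"[{text}]{meta}"


def build_example(order):
    """Build one sentence from (qty, item) pairs; the group is the position in the list."""
    parts = [
        f"{annotate(qty, 'qty', group)} {annotate(item, 'item', group)}"
        for group, (qty, item) in enumerate(order, start=1)
    ]
    return "I would like " + ", ".join(parts)


def permuted_examples(quantities, items, limit=None, seed=0):
    """Pair every ordering of the quantities with every ordering of the items.

    `limit` (hypothetical) draws a random subset to keep the dataset small.
    """
    examples = [
        build_example(list(zip(qty_perm, item_perm)))
        for qty_perm in itertools.permutations(quantities)
        for item_perm in itertools.permutations(items)
    ]
    if limit is not None and limit < len(examples):
        examples = random.Random(seed).sample(examples, limit)
    return examples


examples = permuted_examples(["10", "5", "2"], ["carrots", "apples", "cucumbers"])
for line in examples[:3]:
    print(line)
```

With three quantities and three items this yields 36 sentences, and each quantity token appears equally often in every group position, so the token alone no longer determines the group label. The number of examples grows factorially with the list length, which is why the sketch offers `limit` for sampling a subset.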
