How to extract predicate expressions

Is there a best practice for extracting things like:

under $20
more than 43 points
between 30 and 35

under $20 and over 43 points
between 30 and 35, but no more than $10

I need both the number(s) and the operator; and they may be compound.

From what you provided, I’m not sure I completely understand your intention, but:

If ‘$20’ and ‘43 points’ are two separate entities, you can try this and see how it goes:

under [$20](money)
more than [43](points) points
between [30](points) and [35](points)
under [$20](money) and over [43](points) points
between [30](points) and [35](points), but no more than [$10](money)

Add a regex feature for money in nlu.md:

## regex:money
- \$[0-9]+

If you want the bot to pick them up as a single entity, then I guess just use one annotation for both of them?

These are just my first thoughts; they might not be the best practice.

@fuih Thanks for the response. I think your proposal only extracts the numeric value, not the predicate. e.g.,

“under $20” and “over $20” would just extract “$20”:money, but they are very different --> one is “under” and the other is “over”

I guess another possibility is to mark the entire phrase as an entity and then apply some post-processing to get the predicate value.

e.g.,

under $20 more than 43 points

Then I’d just write some Python/regex code to post process the entities. It’s a bit of a hack, so I was wondering if there was something more elegant.
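Something like the following is what I have in mind for the post-processing step. It is only a rough sketch: the keyword list, the operator codes (`lt`, `gte`, ...) and the function name `parse_predicate` are placeholders I made up for illustration, and a real bot would need far more phrasings than this.

```python
import re

# Hypothetical keyword-to-operator table. Longer phrases come first so that
# "no more than" is matched before the shorter "more than".
OPERATORS = [
    ("between", "between"),
    ("no more than", "lte"),
    ("not more than", "lte"),
    ("no less than", "gte"),
    ("not less than", "gte"),
    ("more than", "gt"),
    ("less than", "lt"),
    ("under", "lt"),
    ("below", "lt"),
    ("over", "gt"),
    ("above", "gt"),
]

# An optional dollar sign followed by an integer or decimal number.
NUMBER = re.compile(r"\$?\d+(?:\.\d+)?")


def parse_predicate(phrase):
    """Split an extracted phrase entity into operator, numeric values and a money flag."""
    text = phrase.lower()
    operator = next(
        (code for keyword, code in OPERATORS
         if re.search(rf"\b{re.escape(keyword)}\b", text)),
        None,  # no known operator keyword found
    )
    values = [float(match.lstrip("$")) for match in NUMBER.findall(text)]
    return {"operator": operator, "values": values, "money": "$" in text}


print(parse_predicate("under $20"))
print(parse_predicate("between 30 and 35"))
```

The entity extractor would only have to find the phrase boundaries; the operator and the numbers are recovered afterwards, which is exactly the part that feels like a hack.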

Oh, I understand now. It’s certainly an interesting problem. The only thing I can think of right now is having a ‘predicate’ slot for ‘under’, ‘over’, ‘more than’, … and training the bot to recognize it as well as the other slots. In addition, you could use a lookup table for it too (since I assume the predicate has fewer than 10 different examples). Not sure if that will work as well as you want, though :smile:.

P/S: Or maybe train the bot to recognize [over $40], [more than 40], … and process them separately.

There are actually hundreds of variants (under, not more than, my budget is, I can only afford, below … above, not less than, … etc.), especially when compound expressions are used. In a previous project I used spaCy NER to extract phrases, then ran those through a classifier to determine the specific operator (e.g., “$lt”, “$between”, etc.), and finally wrote some nasty Python code to determine the numeric value and the attribute (price, rating, etc.). It worked well, but was a lot of work.
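To give an idea of the last stage, here is a simplified sketch (not the original project code) of how an attribute, an operator code and the numeric values can be folded into a filter dictionary. The helper names `to_filter` and `combine` are made up for this post, and expanding “$between” into a “$gte”/“$lte” pair is my assumption about how such an operator would be consumed downstream.

```python
def to_filter(attribute, operator, values):
    """Build a filter for one predicate, e.g. ("price", "$lt", [20])."""
    if operator == "$between":
        # Assumption: "$between" is inclusive and is expanded into two bounds.
        low, high = sorted(values)
        return {attribute: {"$gte": low, "$lte": high}}
    if operator in {"$lt", "$lte", "$gt", "$gte"}:
        (value,) = values  # these operators take exactly one number
        return {attribute: {operator: value}}
    raise ValueError(f"unsupported operator: {operator}")


def combine(filters):
    """Merge the filters of a compound expression into a single dictionary."""
    merged = {}
    for single in filters:
        for attribute, condition in single.items():
            merged.setdefault(attribute, {}).update(condition)
    return merged


# "under $20 and over 43 points"
query = combine([
    to_filter("price", "$lt", [20]),
    to_filter("points", "$gt", [43]),
])
print(query)
```

The hard part was never this step; it was getting the classifier and the value/attribute extraction reliable enough to feed it.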

Tried a regex-only approach as well and that didn’t go well.

I will read the docs a bit more, but right now I’ll try the approach of extracting phrases that include the operator, number and attribute, and then see if I can cobble together a bit of Python/regex code. Luckily I don’t need compound expressions here, so it’ll be easier.

Thanks again.

Is there an out-of-the-box solution to determine the predicate (between, greater than, less than, more than) along with the numeric values as part of entity extraction?

Hello! Does anyone have any tips on how to do this? I need to consider determiners such as “entire”, “total” (i.e. total number of…), “all”.

Thanks in advance!

Have you considered roles? I’ve never used them but they might work well for your case.
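As far as I understand the docs, you would annotate the number with a role that carries the operator, e.g. `under [$20]{"entity": "money", "role": "lt"}`, and the role then comes back on each extracted entity. Below is an untested sketch of reading predicates from such entities. The role names (`lt`, `gt`) and the function name are my own invention, and the sample list only imitates the `entity` / `value` / `role` keys of the parse output.

```python
def predicates_from_entities(entities):
    """Turn role-annotated entities into (attribute, operator, value) tuples."""
    predicates = []
    for entity in entities:
        role = entity.get("role")
        if role is None:
            continue  # entity was extracted without an operator role
        number = float(str(entity["value"]).lstrip("$"))
        predicates.append((entity["entity"], role, number))
    return predicates


# Hand-written sample imitating the entities for "under $20 and over 43 points".
sample_entities = [
    {"entity": "money", "role": "lt", "value": "$20"},
    {"entity": "points", "role": "gt", "value": "43"},
]
print(predicates_from_entities(sample_entities))
```

Whether the model learns the roles reliably with hundreds of operator phrasings is something you would have to try out.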

Hi @alexyuwen, no I haven’t! I think this is perfect for my use case. Thanks a lot!