How to extract predicate expressions

Is there a best practice for extracting things like:

under $20
more than 43 points
between 30 and 35

under $20 and over 43 points
between 30 and 35, but no more than $10

I need both the number(s) and the operator, and the expressions may be compound.

From what you provided, I’m not sure I completely understand your intention, but:

If ‘$20’ and ‘43 points’ are two separate entities, you can try this and see how it goes:

under [$20](money)
more than [43](points) points
between [30](points) and [35](points)
under [$20](money) and over [43](points) points
between [30](points) and [35](points), but no more than [$10](money)

Add a regex feature for money in nlu.md:

## regex:money
- \$[0-9]+
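
As a quick sanity check outside of Rasa, the same pattern can be tried with Python's `re` module. This is only a sketch: `find_money` is a made-up helper name, and it only shows what the pattern itself matches, not how the entity extractor will use the feature.

```python
import re

# Same pattern as the regex feature above: a dollar sign followed by digits.
MONEY = re.compile(r"\$[0-9]+")


def find_money(text):
    """Return every substring that the money pattern matches."""
    return MONEY.findall(text)


print(find_money("under $20 and over 43 points"))             # ['$20']
print(find_money("between 30 and 35, but no more than $10"))  # ['$10']
```

Note that the pattern only matches whole-dollar amounts, so something like `$19.99` would be cut off at `$19`.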

If you want the bot to pick them up as a single entity, then I guess you could just use one annotation for both of them.

These are just my first thoughts, they might not be the best practice.

@fuih Thanks for the response. I think your proposal only extracts the numeric value, not the predicate. e.g.,

“under $20” and “over $20” would both just extract “$20”:money, but they are very different: one is “under” and the other is “over”.

I guess another possibility is to mark the entire phrase as an entity and then apply some post-processing to get the predicate value.

e.g.,

under $20 more than 43 points

Then I’d just write some Python/regex code to post process the entities. It’s a bit of a hack, so I was wondering if there was something more elegant.
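
For what it's worth, the post-processing step could look something like the sketch below. It assumes the NLU model hands back each phrase (e.g. "under $20") as one entity string. The cue list, the operator names (`lt`, `gt`, ...) and the function name `parse_predicate` are all made up for illustration, and the cue list is nowhere near complete.

```python
import re

# Hypothetical cue vocabulary. Longer cues come first so that
# "no more than" is not mistaken for "more than".
OPERATORS = [
    ("no more than", "lte"),
    ("not more than", "lte"),
    ("no less than", "gte"),
    ("not less than", "gte"),
    ("more than", "gt"),
    ("less than", "lt"),
    ("between", "between"),
    ("under", "lt"),
    ("below", "lt"),
    ("over", "gt"),
    ("above", "gt"),
]

# An optional dollar sign followed by an integer or decimal number.
NUMBER = re.compile(r"(\$?)(\d+(?:\.\d+)?)")


def parse_predicate(phrase):
    """Split one extracted phrase into operator, numeric values and a money flag.

    Returns None when the phrase has no known cue or no usable numbers.
    """
    text = phrase.lower().strip()
    operator = next((op for cue, op in OPERATORS if text.startswith(cue)), None)
    matches = NUMBER.findall(text)
    values = [float(num) for _, num in matches]
    if operator is None or not values:
        return None
    if operator == "between" and len(values) != 2:
        return None
    is_money = any(sign for sign, _ in matches)
    return {"operator": operator, "values": values, "money": is_money}


for phrase in ["under $20", "more than 43 points", "between 30 and 35"]:
    print(phrase, "->", parse_predicate(phrase))
```

The obvious weakness is that anything outside the cue list (e.g. "my budget is") falls through and returns `None`.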

Oh, I understand now. It’s certainly an interesting problem. The only thing I can think of right now is having a ‘predicate’ slot for ‘under’, ‘over’, ‘more than’,… and training the bot to recognize it along with the other slots. In addition, you could use a lookup table for it too (since I assume the predicate only has fewer than 10 different variants). Not sure if that will work as well as you want though :smile:.

P/S: Or maybe train the bot to recognize [over $40], [more than 40],… and process them separately.

There are actually hundreds of variants (under, not more than, my budget is, I can only afford, below … above, not less than, … etc.), especially when compound expressions are used. In a previous project I used spaCy NER to extract phrases, then ran those through a classifier to determine the specific operator (e.g., “$lt”, “$between”, etc.), and finally wrote some nasty Python code to determine the numeric value and the attribute (price, rating, etc.). It worked well, but was a lot of work.

Tried a regex-only approach as well and that didn’t go well.
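
To make the shape of that pipeline concrete, here is a rough sketch of the last two stages (operator, then value and attribute) for compound utterances. It is not the code from that project: a plain keyword lookup stands in for the trained classifier, so it has the same limits as the regex-only attempt. The cue list, the attribute names and `extract_filters` are all assumptions made for illustration.

```python
import re

# Keyword cues stand in for a trained operator classifier (an assumption).
# Longer cues are listed first so "no more than" wins over "more than".
CUES = [
    ("not more than", "$lte"), ("no more than", "$lte"),
    ("not less than", "$gte"), ("no less than", "$gte"),
    ("more than", "$gt"), ("less than", "$lt"),
    ("between", "$between"),
    ("under", "$lt"), ("below", "$lt"),
    ("over", "$gt"), ("above", "$gt"),
]
CUE_TO_OP = dict(CUES)
CUE_RE = re.compile(r"\b(?:" + "|".join(re.escape(c) for c, _ in CUES) + r")\b")

# Optional dollar sign, an integer, and an optional trailing "points" unit.
NUM_RE = re.compile(r"(\$?)(\d+)(?:\s+(points))?")


def extract_filters(utterance):
    """Return one {attribute, operator, value} dict per operator cue found."""
    text = utterance.lower()
    cues = list(CUE_RE.finditer(text))
    filters = []
    for i, cue in enumerate(cues):
        # A cue owns the text up to the next cue (or the end of the utterance).
        end = cues[i + 1].start() if i + 1 < len(cues) else len(text)
        numbers = NUM_RE.findall(text[cue.end():end])
        if not numbers:
            continue
        operator = CUE_TO_OP[cue.group()]
        values = [int(num) for _, num, _ in numbers]
        if any(sign for sign, _, _ in numbers):
            attribute = "price"
        elif any(unit for _, _, unit in numbers):
            attribute = "points"
        else:
            attribute = None  # no hint in the text; left for the caller to resolve
        value = values[:2] if operator == "$between" else values[0]
        filters.append({"attribute": attribute, "operator": operator, "value": value})
    return filters


print(extract_filters("under $20 and over 43 points"))
print(extract_filters("between 30 and 35, but no more than $10"))
```

Splitting on cues rather than on "and" is deliberate: it keeps "between 30 and 35" in one piece while still separating the two halves of "under $20 and over 43 points".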

I will read the docs a bit more, but for now I’ll try the approach of extracting phrases that include the operator, number and attribute, and then see if I can cobble together a bit of Python/regex code. Luckily I don’t need compound expressions here, so that will make it easier.

Thanks again.