I am using RegexEntityExtractor because I have a lookup table of values, any of which could be the item mentioned in a sentence.
For example, the lookup table contains:
table
chair
window
The problem is that the extractor does not recognise plurals such as tables or chairs. Is there any way to use regex to solve this, or should I manually add the plural values to the table?
Hey @CharuchithRanjit, I don’t think there’s a straightforward solution – after all, plurals may be formed in different ways (besides the usual -s and -es suffixes). I guess your best bet is to:
include the plural values in the lookup table
if you also want to map both table and tables to the same entity value (e.g. table), add the variations (here tables) as synonyms in your data
Alternatively, you could generate a long regex for all the possible values in both singular and plural.
I think all of the above should be relatively easy to do programmatically, given that you’ve got a table of all the possible values. The only pitfall, as I mentioned, is that some of the words may have unusual plural forms, which you would need to add by hand.
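For what it's worth, here is a rough sketch of how the plural generation could look in plain Python. It is not Rasa code: `naive_plural`, `synonyms` and `extract` are names made up for illustration, and the pluralisation rules only cover regular English nouns, so irregular ones (e.g. child/children) would still need manual entries.

```python
import re

def naive_plural(word: str) -> str:
    """Hypothetical helper: regular English plurals only."""
    if re.search(r"(s|x|z|ch|sh)$", word):
        return word + "es"            # box -> boxes
    if re.search(r"[^aeiou]y$", word):
        return word[:-1] + "ies"      # city -> cities
    return word + "s"                 # table -> tables

lookup = ["table", "chair", "window"]

# Map every surface form (singular and plural) to its canonical value,
# which mirrors the "add plurals as synonyms" suggestion.
synonyms = {}
for word in lookup:
    synonyms[word] = word
    synonyms[naive_plural(word)] = word

# One long regex over all forms; word boundaries prevent partial matches.
alternatives = sorted(synonyms, key=len, reverse=True)
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
    re.IGNORECASE,
)

def extract(text: str):
    """Return (matched text, canonical value) pairs found in text."""
    return [(m.group(0), synonyms[m.group(0).lower()])
            for m in pattern.finditer(text)]

print(extract("Two tables and a chair near the windows"))
```

The keys of `synonyms` could be written back into the lookup table, and the key-to-value pairs used as synonym entries in the training data.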
You could try stemming or lemmatizing the tokens before passing them to the entity extractor. I'm not sure it works for everything, but it can usually recover the root word; pretrained pipelines such as spaCy provide lemmatization.
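To illustrate the idea of normalising tokens first, here is a deliberately crude sketch in plain Python. It is only a stand-in for a real stemmer or lemmatizer: `naive_singular` and `normalize` are hypothetical helpers, the suffix rules are simplistic, and splitting on whitespace ignores punctuation.

```python
def naive_singular(token: str, vocabulary: set) -> str:
    """Strip common plural suffixes; accept a candidate only if it is in vocabulary."""
    lower = token.lower()
    candidates = [lower]
    if lower.endswith("ies"):
        candidates.append(lower[:-3] + "y")   # cities -> city
    if lower.endswith("es"):
        candidates.append(lower[:-2])         # boxes -> box
    if lower.endswith("s"):
        candidates.append(lower[:-1])         # tables -> table
    for candidate in candidates:
        if candidate in vocabulary:
            return candidate
    return token  # leave unknown tokens untouched

vocabulary = {"table", "chair", "window"}

def normalize(text: str) -> str:
    """Rewrite known plurals to their lookup-table form before extraction."""
    return " ".join(naive_singular(t, vocabulary) for t in text.split())

print(normalize("Two tables and chairs"))
```

Because a candidate is only accepted when it appears in the lookup table, unrelated words ending in -s are left alone. The normalised text could then be matched against the original, singular-only lookup table.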