Extracting entities when they are plural

I am using RegexEntityExtractor because I have a table of values that could appear as the item in a sentence. For example, the lookup table has: table, chair, window.

The problem is that the extractor does not recognise plurals such as tables or chairs. Is there any way to use regex to solve this, or should I manually put the plural values in the table?

Hey @CharuchithRanjit, I don’t think there’s a straightforward solution: after all, plurals can be formed in different ways (besides the usual -s and -es suffixes). I guess your best bet is to:

  • include the plural values in the lookup table
  • if you also want to map both table and tables to the same entity value (e.g. table), add the variations (here tables) as synonyms in your data
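A minimal Python sketch of the two bullets above. `pluralize`, `expand_lookup`, `to_rasa_yaml` and `IRREGULAR_PLURALS` are hypothetical helpers, not Rasa APIs, and the pluralisation rules are deliberately naive. The YAML output assumes the Rasa 2.x/3.x training-data format and a lookup table named after the entity (here a hypothetical `furniture` entity); the synonyms only take effect if EntitySynonymMapper is in your pipeline.

```python
# Hypothetical helpers (not part of Rasa): expand a lookup table with plurals.
IRREGULAR_PLURALS = {"shelf": "shelves", "person": "people"}  # extend by hand


def pluralize(word: str) -> str:
    """Best-guess English plural: irregular table first, then -es / -ies / -s rules."""
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"


def expand_lookup(values):
    """Return (lookup entries incl. plurals, synonyms mapping plural -> singular)."""
    lookup, synonyms = [], {}
    for value in values:
        plural = pluralize(value)
        lookup += [value, plural]
        synonyms[plural] = value
    return lookup, synonyms


def to_rasa_yaml(entity, lookup, synonyms):
    """Render a lookup table plus synonym entries in Rasa's YAML NLU format (assumed 2.x/3.x)."""
    lines = ["nlu:", f"- lookup: {entity}", "  examples: |"]
    lines += [f"    - {v}" for v in lookup]
    by_value = {}
    for variant, value in synonyms.items():
        by_value.setdefault(value, []).append(variant)
    for value, variants in by_value.items():
        lines += [f"- synonym: {value}", "  examples: |"]
        lines += [f"    - {v}" for v in variants]
    return "\n".join(lines)


lookup, synonyms = expand_lookup(["table", "chair", "window"])
print(lookup)    # ['table', 'tables', 'chair', 'chairs', 'window', 'windows']
print(synonyms)  # {'tables': 'table', 'chairs': 'chair', 'windows': 'window'}
print(to_rasa_yaml("furniture", lookup, synonyms))
```

Any value with an unusual plural just needs an entry in `IRREGULAR_PLURALS`; everything else falls through to the suffix rules.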

Alternatively, you could generate a long regex for all the possible values in both singular and plural.
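A minimal Python sketch of the single-regex approach, using the standard `re` module. `build_pattern` and its `irregular` argument are hypothetical; plurals default to value + "s", so irregular forms have to be listed by hand. How the resulting pattern string (`pattern.pattern`) gets into your Rasa training data as a regex entry is assumed, not shown.

```python
import re


def build_pattern(values, irregular=None):
    """Compile one case-insensitive regex matching each value or its plural as a whole word.

    Plurals default to value + "s"; `irregular` (hypothetical) maps
    singular -> plural for words that don't follow that rule.
    """
    irregular = irregular or {}
    forms = set()
    for value in values:
        forms.add(value)
        forms.add(irregular.get(value, value + "s"))
    # Longest forms first: not required with \b on both sides, but keeps
    # "tables" ahead of "table" if the boundaries are ever removed.
    ordered = sorted(forms, key=len, reverse=True)
    alternation = "|".join(re.escape(form) for form in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


pattern = build_pattern(["table", "chair", "window", "shelf"], irregular={"shelf": "shelves"})
print(pattern.pattern)
print(pattern.findall("Move the tables and a chair next to the shelves"))
# ['tables', 'chair', 'shelves']
```

The word boundaries stop partial matches such as "table" inside "tablecloth".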

I think all of the above should be relatively easy to do programmatically, given that you’ve got a table of all the possible values. The only pitfall, as I mentioned, is words with unusual plural forms 🙂

Hi @CharuchithRanjit,

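One more solution you can try is to use a regex like \w*s. The * quantifier matches the preceding token zero or more times, so \w*s matches any word ending in s. To keep it tied to your values, you can instead make the final s optional on each entry: the pattern chairs? (where ? means zero or one) matches both chair and chairs.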
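A quick comparison of the two patterns using Python's standard `re` module; the sample sentence and the hard-coded values are made up for illustration. Word boundaries (\b) are added here, as an assumption, so that each match is a whole word.

```python
import re

text = "The tables and the chair are in his class"

# \w* = zero or more word characters, then a literal "s":
# catches any word ending in s, whether or not it is in the lookup table.
loose = re.findall(r"\b\w*s\b", text)

# s? = zero or one "s", attached to each known value.
strict = re.findall(r"\b(?:table|chair|window)s?\b", text)

print(loose)   # ['tables', 'his', 'class']
print(strict)  # ['tables', 'chair']
```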

I hope this will help you.

You can try stemming (or lemmatizing) the tokens before passing them to the entity extractor. I’m not sure it works for every word, but it usually reduces a word to its root form; pretrained models such as spaCy’s provide lemmatization for this.
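A minimal sketch of the normalisation idea using only the Python standard library. `crude_stem` and `normalize` are hypothetical helpers, and the suffix rules are a crude stand-in for a real stemmer (such as NLTK's PorterStemmer) or lemmatizer (such as spaCy's `token.lemma_`). Because every word is lowercased and reduced to its stem, the lookup table only needs the singular values; where this preprocessing would hook into your Rasa setup is left open.

```python
import re

# Crude plural-suffix rules (checked in order): a stand-in for a real stemmer/lemmatizer.
SUFFIX_RULES = [("ies", "y"), ("ches", "ch"), ("shes", "sh"), ("xes", "x"), ("s", "")]


def crude_stem(token: str) -> str:
    """Lowercase `token` and strip the first matching plural suffix.

    Words of three letters or fewer and words ending in "ss" are only lowercased.
    """
    word = token.lower()
    if len(word) <= 3 or word.endswith("ss"):
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word


def normalize(text: str) -> str:
    """Replace every run of word characters in `text` with its crude stem."""
    return re.sub(r"\w+", lambda m: crude_stem(m.group()), text)


LOOKUP = {"table", "chair", "window"}  # singular values only

normalized = normalize("I need two Tables and three chairs")
print(normalized)  # i need two table and three chair
print([word for word in normalized.split() if word in LOOKUP])  # ['table', 'chair']
```

This shows the usual stemming trade-off: a word like "this" becomes "thi", which is harmless here only because it never matches the lookup table.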
