Entity recognition with lookup tables and fuzzy matching

What would be the best architecture of a component that enhances entity lookup tables with fuzzy matching?

I see how the regex_featurizer.py is using the lookup tables and I know I have to use FuzzyWuzzy somewhere. Some general thoughts are:

  • Would I add something to regex_featurizer.py?

  • Would I make a new component that uses FuzzyWuzzy to search the lookup table files for the best matches and passes them into the RegexFeaturizer component?

The bullets above might be completely off, but I am just curious what the general plan would be.

It’s best to create a separate component for this.

Ok, so conceptually, the separate component would use FuzzyWuzzy to look through the lookup table and pass forward the entities that match up to a certain threshold? And those entities that were matched via FuzzyWuzzy would be separate from any entity extraction that was used in other components further down in the pipeline?
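A minimal sketch of that idea, in case it helps make the plan concrete. It only covers the matching step: slide a window over the message, score each window against every lookup entry, and keep the matches at or above a threshold. The names `similarity`, `fuzzy_lookup_entities`, the `"FuzzyLookupExtractor"` label, and the default threshold of 85 are all made up for illustration, and the output dicts only imitate the usual entity fields (`start`, `end`, `value`, `entity`). It uses `fuzz.ratio` from FuzzyWuzzy when the package is installed and falls back to the standard library's `difflib` otherwise, so the scores can differ slightly between the two. Wrapping this in an actual Rasa component is left out because the component API depends on the Rasa version.

```python
import re
from difflib import SequenceMatcher

try:
    # FuzzyWuzzy's simple ratio returns an int between 0 and 100.
    from fuzzywuzzy import fuzz

    def similarity(a, b):
        return fuzz.ratio(a, b)
except ImportError:
    # Fallback (an assumption for this sketch): a difflib-based 0-100 score.
    def similarity(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def fuzzy_lookup_entities(text, lookup_tables, threshold=85):
    """Return entities whose lookup entry fuzzily matches a span of `text`.

    `lookup_tables` maps an entity name to a list of entries, for example
    {"city": ["Amsterdam", "Berlin"]} (hypothetical data).
    """
    # Whitespace tokens with their character offsets in the message.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    entities = []
    for entity_name, entries in lookup_tables.items():
        for entry in entries:
            n = len(entry.split())  # compare against windows of the same length
            best = None
            for i in range(len(tokens) - n + 1):
                start, end = tokens[i][0], tokens[i + n - 1][1]
                score = similarity(text[start:end].lower(), entry.lower())
                if score >= threshold and (best is None or score > best["confidence"]):
                    best = {
                        "start": start,
                        "end": end,
                        "value": entry,  # normalise to the lookup entry
                        "entity": entity_name,
                        "confidence": score,
                        "extractor": "FuzzyLookupExtractor",  # hypothetical name
                    }
            if best is not None:
                entities.append(best)
    return entities


message = "I want to fly to Amsterdm tomorrow"
found = fuzzy_lookup_entities(message, {"city": ["Amsterdam", "Berlin"]})
print(found)
```

Because each match carries its own `extractor` label, downstream code can tell these entities apart from the ones produced by other extractors in the pipeline. Tokens are split on whitespace only, so trailing punctuation lowers the score a little; stripping punctuation before scoring would be a reasonable refinement.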

Hi @ap_rasa, I am using this FuzzyWuzzy option. Could you please share some example code?