What would be the best architecture of a component that enhances entity lookup tables with fuzzy matching?
I see how the regex_featurizer.py is using the lookup tables and I know I have to use FuzzyWuzzy somewhere. Some general thoughts are:
Would I add something into regex_featurizer.py?
Would I make a new component that uses FuzzyWuzzy to search the lookup table files for the best matches and passes them into the RegexFeaturizer component?
The bullets above might be completely off, but I am just curious what the general plan would be.
Ok, so conceptually, the separate component would use FuzzyWuzzy to look through the lookup table and pass forward the entities that match up to a certain threshold? And those entities that were matched via FuzzyWuzzy would be separate from any entity extraction that was used in other components further down in the pipeline?
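To make that idea concrete, here is a rough sketch of the matching step only, not a full pipeline component. It is written under a few assumptions: it uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer so it runs without extra installs (FuzzyWuzzy's `fuzz.ratio` could be swapped in for `similarity`), the function names and the `"FuzzyLookupSketch"` extractor label are made up for illustration, and the output dict only loosely mimics the entity format other extractors use.

```python
import re
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stand-in for a FuzzyWuzzy scorer)."""
    return int(round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()))


def fuzzy_lookup_entities(text, lookup_table, entity_type, threshold=85):
    """Find spans of `text` that fuzzily match an entry of `lookup_table`.

    Hypothetical helper: compares every window of whitespace-separated
    tokens against each lookup entry with the same number of words and
    keeps the windows that score at or above `threshold`.
    """
    # Character spans of each whitespace-separated token.
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    matches = []
    for entry in lookup_table:
        n_words = len(entry.split())
        for i in range(len(spans) - n_words + 1):
            start, end = spans[i][0], spans[i + n_words - 1][1]
            score = similarity(text[start:end], entry)
            if score >= threshold:
                matches.append(
                    {
                        "entity": entity_type,
                        "value": entry,  # canonical spelling from the table
                        "start": start,
                        "end": end,
                        "confidence": score / 100,
                        "extractor": "FuzzyLookupSketch",  # assumed label
                    }
                )
    return matches


if __name__ == "__main__":
    toppings = ["mozzarella", "pepperoni", "mushrooms"]
    message = "I want a pizza with mozarella and peperoni"
    for match in fuzzy_lookup_entities(message, toppings, "topping"):
        print(match)
```

Because every match carries its own extractor label, the fuzzy matches stay distinguishable from whatever other extractors further down the pipeline produce. The threshold of 85 is an arbitrary starting point and would need tuning against real data.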