Purpose and usage of positive and negative influencer n-grams for improving lookup tables

After reading the blog post about improving entity recognition via lookup tables and fuzzy matching / character n-grams, I am a bit confused about how to actually use these positive and negative influencer n-grams.

Based on my understanding, we can use the documented example scripts to generate these n-grams for our lookup tables, and the post then says to add them as separate lookup tables. My concerns are:

How would training data change based on these n-grams? Would the training data have to have annotations marking the n-grams?

Even if an n-gram is matched, how is it linked to the actual entity we wanted to match in the first place? For example, if the n-gram “inc” is found, how does that lead to a match on “apple inc”?


Hi Anthony,

You don’t have to change anything about the training data. Just make sure your training data contains a couple of examples in which the highlighted entities also appear in the lookup table. That way the model learns that the information ‘this word appears in the lookup table’ is important.
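One way to sanity-check this is to count how many of your annotated entity values also appear in the lookup table. Below is a minimal sketch in plain Python; the function name `lookup_overlap` and the sample data are made up for illustration and are not part of any library.

```python
def lookup_overlap(annotated_entities, lookup_entries):
    """Return the annotated entity values that also appear in the lookup table.

    Comparison is case-insensitive and ignores surrounding whitespace.
    """
    lookup = {entry.strip().lower() for entry in lookup_entries}
    annotated = {value.strip().lower() for value in annotated_entities}
    return sorted(annotated & lookup)


# Hypothetical data: entity values highlighted in the training examples,
# and the entries of a company lookup table.
annotated = ["Apple Inc", "google", "Acme Corp"]
company_lookup = ["apple inc", "google", "microsoft"]

overlap = lookup_overlap(annotated, company_lookup)
print(overlap)  # ['apple inc', 'google']
```

If the overlap is empty, the model never sees an annotated entity together with a lookup table match, so it has no way to learn that the match matters.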

But what if we add these positive and negative n-grams to the lookup table to get non-exact matching, as the post states:

Finally, the positive and negative influencer ngrams may be put into separate lookup tables and inserted into the training data and used on our NLU problem.

What actually links these n-grams to the actual entities that we want?

There is no explicit link between the n-grams and the entity, and you also don’t specify which n-grams are ‘positive’ or ‘negative’. The n-grams just provide some extra information, and the model figures out from the annotated examples what to do with it.
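To make that more concrete, here is a rough sketch of the idea in plain Python. It is a simplification under my own assumptions, not the actual implementation: the function `lookup_features`, the table names, and the substring matching are all hypothetical. Each lookup table just contributes one yes/no flag per token, and nothing in the flags says which entity they belong to.

```python
def lookup_features(tokens, lookup_tables):
    """For each token, emit one boolean flag per lookup table.

    A flag is True when any entry of that table occurs as a substring
    of the lowercased token. (Assumed matching rule, for illustration.)
    """
    features = []
    for token in tokens:
        lowered = token.lower()
        features.append({
            name: any(entry in lowered for entry in entries)
            for name, entries in lookup_tables.items()
        })
    return features


# Hypothetical n-gram tables. They carry no 'positive'/'negative' label.
tables = {
    "ngrams_a": ["inc", "corp"],
    "ngrams_b": ["ing", "the"],
}

tokens = "I work at Apple Inc".split()
feats = lookup_features(tokens, tables)

for token, flags in zip(tokens, feats):
    print(token, flags)
```

Here only the token "Inc" gets `ngrams_a` set to True. If the training data has "Apple Inc" annotated as a company, the model can learn that this flag tends to show up at or near company names, and that is the only 'link' there is.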

I see, thanks!