Will using lookups make it harder to generalize?

I ported a music chat bot from Dialogflow, and it has a large number of lookups for albums and artists. As I understand it, the featurizer flags any token that matches a lookup entry. So every artist in the initial training examples will carry the “artists” flag going into DIET. A new artist that is not in the lookup won’t have this flag, so I would expect the model to generalize better without the lookups.

It seems to me that lookups work best for a short, fixed list that does not need to generalize, e.g. cities in Scotland. When used for items that change with fashion, such as artists, they will actually make the DIET model perform worse on unseen data, so they should be removed. Is that correct?

Yes, that is correct 🙂

From the docs:

You can use lookup tables to help extract entities which have a known set of possible values. Keep your lookup tables as specific as possible.

but also

When using lookup tables with RegexFeaturizer, provide enough examples for the intent or entity you want to match so that the model can learn to use the generated regular expression as a feature.
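
To make the concern in the question concrete, here is a minimal sketch of the general idea: a lookup table is turned into a regular expression, and each token gets a binary flag depending on whether it falls inside a match. This is a simplified illustration in plain Python, not Rasa's actual RegexFeaturizer code. The artist names and the helper functions `build_lookup_pattern` and `lookup_flags` are made up for the example.

```python
import re

# Hypothetical lookup table; these entries are illustrative only.
artist_lookup = ["Radiohead", "Nina Simone", "Daft Punk"]


def build_lookup_pattern(entries):
    """Combine lookup entries into one case-insensitive, word-bounded regex."""
    # Longest entries first so multi-word names win over shorter overlaps.
    ordered = sorted(entries, key=len, reverse=True)
    alternatives = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def lookup_flags(text, pattern):
    """Return (token, flag) pairs; flag is 1 if the token overlaps a lookup match."""
    spans = [match.span() for match in pattern.finditer(text)]
    flags = []
    for token in re.finditer(r"\S+", text):
        inside = any(start < token.end() and token.start() < end
                     for start, end in spans)
        flags.append((token.group(), int(inside)))
    return flags


pattern = build_lookup_pattern(artist_lookup)

# An artist from the lookup: both of its tokens carry the flag.
seen = lookup_flags("play something by Daft Punk", pattern)

# An artist missing from the lookup: no token carries the flag.
unseen = lookup_flags("play something by Khruangbin", pattern)

print(seen)
print(unseen)
```

If every artist in the training data carries the flag, a model can come to rely on it rather than on the surrounding words ("play something by ..."), which is the generalization problem described above. With a closed list such as cities in Scotland, the flag is present at prediction time too, so the same reliance does no harm.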