Some questions about lookup tables in NLU

Hi, I have some questions about lookup tables in NLU.

Will a huge lookup table hurt the performance of an assistant?

  • Is the lookup table implemented with some kind of hashing?

How many lookup table entries need to appear in the training data (i.e. nlu.md) to make it work?

  • I am implementing an assistant that can provide songs based on song titles given by the user, so I use a lookup table to store some song titles.
  • At the moment, I store only about 10 titles (just for a quick test) in the txt file that the lookup table points to.
  • However, my bot cannot learn anything from that file. Song titles that appear in nlu.md are the only ones it can recognize.
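For context on this symptom: in Rasa, a lookup table on its own does not teach the model anything — it only generates a regex feature (via the RegexFeaturizer), and the entity still needs a handful of annotated examples in the training data for the extractor to learn to use that feature. A sketch of what nlu.md could look like (the intent name, entity name, song titles, and file path below are invented for illustration):

```md
## intent:play_song
- play [Shape of You](song_title)
- I want to hear [Bohemian Rhapsody](song_title)
- put on [Yellow](song_title) please

## lookup:song_title
  path/to/song_titles.txt
```

With a few annotated examples like these, the extractor can learn that a match against the lookup table is a strong signal for the `song_title` entity.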

Here is my pipeline config; I hope this helps:

```yaml
pipeline:
  - name: SpacyNLP
  - name: JiebaTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
```

Could someone help me with these questions? I'd appreciate it a lot.

Hey there,

maybe these two links can help clarify some of your questions:

NLU Training Data Format

10 Best Practices for Designing NLU Training Data

Question 1: According to my first link, huge and noisy lookup tables can hurt performance.

Question 2: I don’t think there is a fixed number of lookup table entries that need to appear in your training data examples. I think it’s always good practice to have a balanced amount of examples.
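To make that concrete: the lookup file itself is just one entry per line, and the entries in it do not all need to appear in nlu.md — only a few annotated examples of the entity are needed so the model learns to use the lookup match as a feature. A minimal sketch of such a file (titles invented for illustration):

```
Shape of You
Bohemian Rhapsody
Yellow
Hotel California
```

Once a handful of titles are annotated in training examples, other titles from this file can be recognized via the regex feature even though they never appear in nlu.md.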

I hope this helps you a little bit.

Regards, Tristan
