Lexical Syntactic Featurizer

Hi,

I was reading about the LexicalSyntacticFeaturizer and saw the default configuration below:

pipeline:
- name: LexicalSyntacticFeaturizer
  "features": [
    ["low", "title", "upper"],
    ["BOS", "EOS", "low", "upper", "title", "digit"],
    ["low", "title", "upper"],
  ]

The features are given as a list of lists, and I see that the 1st and 3rd lists are exactly the same. The contents of the 1st list are also present in the 2nd list. So why don’t we just define the features like below?

pipeline:
- name: LexicalSyntacticFeaturizer
  "features": [
    ["BOS", "EOS", "low", "upper", "title", "digit"],
  ]

What is the significance of the 3 lists in the default setting? Kindly advise.

Hi, according to the documentation of LexicalSyntacticFeaturizer, the three lists in this case define the features that are extracted in a sliding window for the previous token, the current token, and the next token. This allows subsequent NLU components to take features from the surrounding tokens into account when looking at a particular token position.
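
To make the sliding window concrete, here is a minimal Python sketch of the idea. It is not Rasa's actual implementation: the function name `window_features`, the `FEATURE_FUNCTIONS` table, and the `"offset:feature"` key format are hypothetical and chosen only for illustration. It assumes the middle list applies to the current token and the lists around it apply to the neighbouring tokens.

```python
from typing import Dict, List

# Hypothetical stand-ins for the per-token checks behind each feature name.
FEATURE_FUNCTIONS = {
    "low": lambda token: token.islower(),
    "title": lambda token: token.istitle(),
    "upper": lambda token: token.isupper(),
    "digit": lambda token: token.isdigit(),
}

# The default configuration from the question above.
DEFAULT_FEATURES = [
    ["low", "title", "upper"],                         # previous token
    ["BOS", "EOS", "low", "upper", "title", "digit"],  # current token
    ["low", "title", "upper"],                         # next token
]


def window_features(
    tokens: List[str],
    position: int,
    features: List[List[str]] = DEFAULT_FEATURES,
) -> Dict[str, bool]:
    """Collect features for the token at `position` and its neighbours."""
    half = len(features) // 2
    result: Dict[str, bool] = {}
    for list_index, names in enumerate(features):
        offset = list_index - half          # -1, 0, +1 for three lists
        index = position + offset
        if index < 0 or index >= len(tokens):
            continue                        # no neighbour at this offset
        for name in names:
            if name == "BOS":
                value = index == 0
            elif name == "EOS":
                value = index == len(tokens) - 1
            else:
                value = FEATURE_FUNCTIONS[name](tokens[index])
            result[f"{offset}:{name}"] = value
    return result


if __name__ == "__main__":
    # Features for "a", including what is known about "Book" and "flight".
    print(window_features(["Book", "a", "flight"], 1))
```

With a single list, only the current token would be described, so a downstream component would not see, for example, that the previous word is title-cased. That is why the 1st and 3rd lists are not redundant even though their feature names repeat.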

@MatthiasLeimeister Thank you for the information!

So in this case the window is 1 token on either side of the current token. If we want information about 2 words on either side of the token, should we define it like below?

pipeline:
- name: LexicalSyntacticFeaturizer
  "features": [
    ["low", "title", "upper"],
    ["low", "title", "upper"],
    ["BOS", "EOS", "low", "upper", "title", "digit"],
    ["low", "title", "upper"],
    ["low", "title", "upper"]
  ]

Please advise.

Hi, yes, exactly. Adding more feature lists increases the window size around the current token.

More detailed documentation of the behaviour can also be found in the class documentation here.
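
As a rough illustration of how the number of feature lists maps to the window, the sketch below computes the token offsets covered by a configuration. The helper name `window_offsets` is hypothetical, and the sketch assumes the window is centred on the current token. How an even number of lists is split between the two sides is an assumption here, so the class documentation should be checked for that case.

```python
from typing import List


def window_offsets(num_feature_lists: int) -> List[int]:
    """Offsets relative to the current token (0) for a given number of lists.

    Assumption: the window is centred on the current token; with an even
    number of lists, the extra position is placed before the current token.
    """
    half = num_feature_lists // 2
    return list(range(-half, half + num_feature_lists % 2))


if __name__ == "__main__":
    for count in (1, 3, 5):
        print(count, "lists ->", window_offsets(count))
```

Under these assumptions, one list covers only the current token, three lists cover one token on each side, and the five-list configuration above covers two tokens on each side.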

@MatthiasLeimeister Thank you!!

You’re welcome, glad I could help 🙂
