Features in CRFEntityExtractor

Hi everybody,

We have some problems classifying the entity type of recognized values, so we'd like to try different configurations of the extractor. I cannot find anything about the “features” configuration beyond what is here:

While this gives some information, I still don't know what the different keys mean specifically. Can someone point me in the right direction?

Many thanks!

We will update the documentation soon to add the missing explanations. For now, please see the table below.

===============  =============================================================================
Feature Name     Description
===============  =============================================================================
low              Checks if the token is lower case.
upper            Checks if the token is upper case.
title            Checks if the token starts with an uppercase character and all remaining
                 characters are lowercased.
digit            Checks if the token contains just digits.
prefix5          Take the first five characters of the token.
prefix2          Take the first two characters of the token.
suffix5          Take the last five characters of the token.
suffix3          Take the last three characters of the token.
suffix2          Take the last two characters of the token.
suffix1          Take the last character of the token.
pos              Take the Part-of-Speech tag of the token (spaCy required).
pos2             Take the first two characters of the Part-of-Speech tag of the token
                 (spaCy required).
pattern          Take the patterns defined by ``RegexFeaturizer``.
bias             Add an additional "bias" feature to the list of features.
===============  =============================================================================
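To make the string-based entries concrete, here is a minimal Python sketch that computes them for a single token. It follows the descriptions in the table above rather than Rasa's actual source, so treat it as an illustration only. ``token_features`` is a hypothetical helper, and ``pos``, ``pos2`` and ``pattern`` are left out because they depend on spaCy and the ``RegexFeaturizer``.

```python
def token_features(token: str) -> dict:
    """Hypothetical helper: compute the string-based CRF features for one token.

    Illustrative only. It mirrors the table's descriptions, not Rasa's code.
    pos, pos2 and pattern are omitted (they need spaCy / RegexFeaturizer).
    """
    return {
        # Boolean checks on the token's casing / content
        "low": token.islower(),
        "upper": token.isupper(),
        # First character uppercase, all remaining characters lowercase
        "title": token[:1].isupper() and token[1:] == token[1:].lower(),
        "digit": token.isdigit(),
        # Character n-grams from the start and end of the token
        "prefix5": token[:5],
        "prefix2": token[:2],
        "suffix5": token[-5:],
        "suffix3": token[-3:],
        "suffix2": token[-2:],
        "suffix1": token[-1:],
        # Constant feature, acting as a bias term for the CRF
        "bias": "bias",
    }


for tok in "Flights to Berlin in 2019".split():
    print(tok, token_features(tok))
```

Since Python slicing never raises on short strings, tokens shorter than five characters simply yield the whole token for ``prefix5`` and ``suffix5``.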