I want to create a custom component that preprocesses the input message before the tokenizer runs. This component replaces several words.
Which component type should I pick? I really want to keep this component separate from the tokenizer component.
class ComponentType(Enum):
    """Enum to categorize and place custom components correctly in the graph."""

    MESSAGE_TOKENIZER = 0
    MESSAGE_FEATURIZER = 1
    INTENT_CLASSIFIER = 2
    ENTITY_EXTRACTOR = 3
    POLICY_WITHOUT_END_TO_END_SUPPORT = 4
    POLICY_WITH_END_TO_END_SUPPORT = 5
    MODEL_LOADER = 6
stephens (Greg Stephens)
I think MESSAGE_TOKENIZER would be the best choice, since you are modifying the user utterance like a tokenizer would.
Thank you for your help!
I tried MESSAGE_FEATURIZER and it works too! I just always return None for the features.
def _get_features(self, message, attribute) -> Tuple:
    if not message.get(attribute):
        return None, None
    text = message.get(attribute)
    text = self.keyword_processor.replace_keywords(text)
    message.set(
        attribute,
        text,
        add_to_output=True
    )
    return None, None
When I use MESSAGE_TOKENIZER, should I return an empty list of tokens?
stephens (Greg Stephens)
I think so, but I would review the source code for the other featurizers.
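For anyone who wants to try the replacement step on its own before wiring it into a pipeline, here is a minimal sketch that does not depend on Rasa. Everything in it is an assumption made for illustration: `KeywordReplacer` is a hypothetical regex-based stand-in for `self.keyword_processor` from the snippet above, `SimpleMessage` is a hypothetical dict-backed object that only mimics the `get()` and `set()` calls used there, and `preprocess` is just the body of `_get_features` without the feature return values. It is not the Rasa component API.

```python
import re
from typing import Any, Dict, Optional


class KeywordReplacer:
    """Hypothetical stand-in for self.keyword_processor in the snippet above."""

    def __init__(self, replacements: Dict[str, str]) -> None:
        # Keywords are matched case-insensitively, so store them lowercased.
        self.replacements = {k.lower(): v for k, v in replacements.items()}
        # Longest keywords first, so a longer phrase wins over its own prefix.
        keys = sorted(self.replacements, key=len, reverse=True)
        # \b limits matches to whole words (assumes keywords start and end
        # with word characters).
        self.pattern: Optional[re.Pattern] = (
            re.compile(
                r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
                re.IGNORECASE,
            )
            if keys
            else None
        )

    def replace_keywords(self, text: str) -> str:
        if self.pattern is None:
            return text
        return self.pattern.sub(
            lambda m: self.replacements[m.group(0).lower()], text
        )


class SimpleMessage:
    """Hypothetical dict-backed message exposing only get() and set()."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)

    def get(self, key: str, default: Any = None) -> Any:
        return self.data.get(key, default)

    def set(self, key: str, value: Any, add_to_output: bool = False) -> None:
        # add_to_output is accepted only to mirror the call in the snippet.
        self.data[key] = value


def preprocess(message: SimpleMessage, attribute: str,
               replacer: KeywordReplacer) -> None:
    """Rewrite the attribute in place, skipping empty or missing values."""
    text = message.get(attribute)
    if not text:
        return
    message.set(attribute, replacer.replace_keywords(text), add_to_output=True)


replacer = KeywordReplacer({"nyc": "New York", "pls": "please"})
msg = SimpleMessage({"text": "Flights to NYC pls"})
preprocess(msg, "text", replacer)
print(msg.get("text"))  # prints "Flights to New York please"
```

Keeping the replacement logic in a small class like this makes it easy to unit test separately from whichever component type ends up hosting it.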