Custom preprocessor

I want to create a custom component that preprocesses the input message before the tokenizer runs. The component replaces several words in the text.

Which component type should I pick? I really want to keep this component separate from the tokenizer component.

    class ComponentType(Enum):
        """Enum to categorize and place custom components correctly in the graph."""

        MESSAGE_TOKENIZER = 0
        MESSAGE_FEATURIZER = 1
        INTENT_CLASSIFIER = 2
        ENTITY_EXTRACTOR = 3
        POLICY_WITHOUT_END_TO_END_SUPPORT = 4
        POLICY_WITH_END_TO_END_SUPPORT = 5
        MODEL_LOADER = 6

I think the MESSAGE_TOKENIZER would be the best choice since you are modifying the user utterance like a tokenizer would do.

Thank you for your help!

I tried MESSAGE_FEATURIZER and it works too! I just always return None for the features.

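    from typing import Tuple

    def _get_features(self, message, attribute) -> Tuple[None, None]:
        # This component only rewrites the text, so it never produces features.
        text = message.get(attribute)
        if not text:
            return None, None

        # Replace the configured keywords and write the result back to the message.
        text = self.keyword_processor.replace_keywords(text)
        message.set(attribute, text, add_to_output=True)

        return None, None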
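For anyone who wants to try the replacement step outside of a pipeline, here is a minimal, self-contained sketch of the same idea. It is not Rasa's API: `SimpleMessage`, `build_replacer` and `preprocess` are hypothetical names made up for illustration, and a plain regex stands in for the `keyword_processor` used above.

```python
import re
from typing import Any, Callable, Dict, List, Optional


class SimpleMessage:
    """Hypothetical stand-in for a pipeline message: a thin wrapper around a dict."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)

    def get(self, attribute: str, default: Optional[Any] = None) -> Any:
        return self.data.get(attribute, default)

    def set(self, attribute: str, value: Any) -> None:
        self.data[attribute] = value


def build_replacer(replacements: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that replaces whole-word keywords, ignoring case.

    Assumes every keyword starts and ends with a word character, because the
    pattern relies on \\b word boundaries.
    """
    if not replacements:
        return lambda text: text

    lowered = {key.lower(): value for key, value in replacements.items()}
    # Try longer keywords first so multi-word phrases win over their prefixes.
    keys = sorted(lowered, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: pattern.sub(lambda m: lowered[m.group(0).lower()], text)


def preprocess(
    messages: List[SimpleMessage],
    replace: Callable[[str], str],
    attribute: str = "text",
) -> List[SimpleMessage]:
    """Rewrite the given attribute in place and skip messages that lack it."""
    for message in messages:
        text = message.get(attribute)
        if text:
            message.set(attribute, replace(text))
    return messages
```

Usage would look roughly like `preprocess(messages, build_replacer({"nyc": "New York"}))`. Keeping the replacement logic in a plain function like this makes it easy to unit test separately from whichever component type ends up wrapping it.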

When I use MESSAGE_TOKENIZER, should I return an empty list of tokens?

I think so, but I would review the source code of the other featurizers to be sure.