Custom preprocessor

artemsnegirev · April 12, 2022, 10:17am

I want to create a custom component that preprocesses the input message before the tokenizer runs. This component replaces several words.

Which component type should I pick? I really want to keep this component separate from the tokenizer component.

    class ComponentType(Enum):
        """Enum to categorize and place custom components correctly in the graph."""

        MESSAGE_TOKENIZER = 0
        MESSAGE_FEATURIZER = 1
        INTENT_CLASSIFIER = 2
        ENTITY_EXTRACTOR = 3
        POLICY_WITHOUT_END_TO_END_SUPPORT = 4
        POLICY_WITH_END_TO_END_SUPPORT = 5
        MODEL_LOADER = 6

stephens · April 12, 2022, 5:10pm

I think the MESSAGE_TOKENIZER would be the best choice since you are modifying the user utterance like a tokenizer would do.
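To illustrate why the position in the pipeline matters, here is a rough plain-Python sketch. It does not use Rasa's graph or API: the dict-based message, the step functions, and the replacement table are all hypothetical stand-ins made up for this example. It only shows a preprocessing step that rewrites the text, followed by a separate tokenizer step that sees the rewritten text.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: a message is a plain dict, a step is a function on it.
Message = Dict[str, Any]
Step = Callable[[Message], Message]

# Made-up replacement table, for illustration only.
REPLACEMENTS = {"u": "you", "thx": "thanks"}


def replace_words(message: Message) -> Message:
    """Preprocessing step: rewrites the text and produces no tokens."""
    words = message["text"].split()
    message["text"] = " ".join(REPLACEMENTS.get(w.lower(), w) for w in words)
    return message


def whitespace_tokenize(message: Message) -> Message:
    """Separate tokenizer step: sees whatever text earlier steps left behind."""
    message["tokens"] = message["text"].split()
    return message


def run_pipeline(steps: List[Step], message: Message) -> Message:
    """Apply the steps one after another, in list order."""
    for step in steps:
        message = step(message)
    return message


result = run_pipeline(
    [replace_words, whitespace_tokenize],
    {"text": "thx for helping u"},
)
print(result["tokens"])  # ['thanks', 'for', 'helping', 'you']
```

If the two steps were swapped, the tokens would be built from the original text, so the replacement has to sit before the tokenizer.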

artemsnegirev · April 13, 2022, 6:49am

Thank you for your help!

I tried MESSAGE_FEATURIZER, and it works too! I just always return None for the features.

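    from typing import Tuple

    def _get_features(self, message, attribute) -> Tuple:
        # Nothing to preprocess if the attribute is missing or empty.
        text = message.get(attribute)
        if not text:
            return None, None

        # Replace the configured keywords and write the text back to the message.
        text = self.keyword_processor.replace_keywords(text)
        message.set(attribute, text, add_to_output=True)

        # This component only rewrites the text, so it produces no features.
        return None, None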
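The replacement step above can also be tried outside Rasa. The sketch below is only an illustration under stated assumptions: `SimpleKeywordReplacer` and `FakeMessage` are hypothetical stand-ins for `self.keyword_processor` and the Rasa message object, not the real classes, and the word list is made up.

```python
import re
from typing import Any, Dict, Optional


class SimpleKeywordReplacer:
    """Hypothetical stand-in for self.keyword_processor (not a real library class)."""

    def __init__(self, replacements: Dict[str, str]) -> None:
        self._replacements = {k.lower(): v for k, v in replacements.items()}
        # Longest keywords first, so longer phrases win over their prefixes.
        keys = sorted(self._replacements, key=len, reverse=True)
        self._pattern = re.compile(
            r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
            flags=re.IGNORECASE,
        )

    def replace_keywords(self, text: str) -> str:
        if not self._replacements:
            return text
        return self._pattern.sub(
            lambda m: self._replacements[m.group(0).lower()], text
        )


class FakeMessage:
    """Hypothetical stand-in for a message: a thin wrapper around a dict."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = data

    def get(self, prop: str, default: Optional[Any] = None) -> Any:
        return self.data.get(prop, default)

    def set(self, prop: str, info: Any, add_to_output: bool = False) -> None:
        # add_to_output only mirrors the call in the post; it is ignored here.
        self.data[prop] = info


def preprocess(
    message: FakeMessage, attribute: str, replacer: SimpleKeywordReplacer
) -> None:
    """Rewrite the attribute in place; skip messages where it is missing or empty."""
    text = message.get(attribute)
    if not text:
        return
    message.set(attribute, replacer.replace_keywords(text), add_to_output=True)


replacer = SimpleKeywordReplacer({"pls": "please", "acct": "account"})
message = FakeMessage({"text": "Pls close my acct"})
preprocess(message, "text", replacer)
print(message.get("text"))  # please close my account
```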

When I use MESSAGE_TOKENIZER, should I return an empty list of tokens?

stephens · April 13, 2022, 3:47pm

I think so, but I would review the source code for the other featurizers.
