Adding a custom preprocessor to MessageProcessor

pixpack · September 12, 2019, 6:43pm

Hello, I have been trying to implement a custom preprocessor that replaces certain words in the received message by fuzzy matching. I have tried to import a function that I wrote but when I set the message_preprocessor to the function it does not change and stays None.

Here is the code for the fuzzy matching function:

from fuzzywuzzy import process

FUZZY_SEARCH_DATA = ['krzysztof', 'laura', 'agnieszka', 'milena']

def fuzzy_match_preprocessor(message):
    words = message.split()
    new_string = []
    for word in words:
        # extractOne returns (match, score), or None if nothing reaches the cutoff
        fuzzy_matched_value = process.extractOne(word, FUZZY_SEARCH_DATA, score_cutoff=65)
        if fuzzy_matched_value is not None:
            new_string.append(fuzzy_matched_value[0].lower())
        else:
            new_string.append(word)
    message_text = ' '.join(new_string)
    return message_text
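
For anyone who wants to try the same word-by-word replacement without installing fuzzywuzzy, here is a minimal sketch that swaps in difflib.get_close_matches from the Python standard library. It is not the function above: replace_close_words is a hypothetical name, and the cutoff of 0.65 is on difflib's 0 to 1 scale, which is scored differently from fuzzywuzzy's score_cutoff=65, so the two will not always agree.

```python
from difflib import get_close_matches

# Same example names as in the post above.
NAMES = ['krzysztof', 'laura', 'agnieszka', 'milena']

def replace_close_words(message, choices=NAMES, cutoff=0.65):
    """Replace each word with its closest known name, if one is similar enough.

    Hypothetical helper: uses difflib as a stand-in for fuzzywuzzy.
    """
    corrected = []
    for word in message.split():
        # Returns a list with at most one match whose similarity ratio >= cutoff.
        matches = get_close_matches(word.lower(), choices, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return ' '.join(corrected)

if __name__ == "__main__":
    print(replace_close_words("say hello to krzystof and lara"))
```

Words that are not close to any name are passed through unchanged, which matches the else branch of the original function.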

I import it from a file called fuzzy_matcher.py with this line:

from rasa.core.fuzzy_matcher import fuzzy_match_preprocessor

And this is how it is assigned to the message_preprocessor:

message_preprocessor: Optional[LambdaType] = fuzzy_match_preprocessor

Any help would be appreciated.

tyd · September 13, 2019, 8:10am

Hi @pixpack! Have you thought about adding this as a custom component in the NLU pipeline? You can also find more info about it in the docs here

pixpack · September 14, 2019, 4:51pm

Hello @tyd. Thank you very much. I’ve managed to fix my issue with the preprocessor but I’m now considering reimplementing it as a custom component.

pixpack · September 15, 2019, 7:23pm

In case anyone is interested in the component, I’m posting the code here. This solution is optimised for fuzzy matching full names or pairs of words. You can change this behavior by changing the ngram_range.

    from rasa.nlu.components import Component
    import typing
    from typing import Any, Optional, Text, Dict

    if typing.TYPE_CHECKING:
        from rasa.nlu.model import Metadata


    class SpellChecker(Component):
        """A new component"""

        provides = ['text']

        requires = []

        defaults = {}

        language_list = None

        def __init__(self, component_config=None):
            super(SpellChecker, self).__init__(component_config)

        def train(self, training_data, cfg, **kwargs):
            pass

        def process(self, message, **kwargs):
            from fuzzywuzzy import process

            # Known full names to correct the message towards
            FULLNAMES = ['john kowalski', 'andy michaels']

            text = message.text

            # Build word n-grams (pairs of words when ngram_range is 2)
            ngram_range = 2
            words = text.split()
            ngrams = zip(*[words[i:] for i in range(ngram_range)])
            tokens = [' '.join(ngram) for ngram in ngrams]

            fuzzy_matched_values = []
            scores = []
            if tokens:
                # Find the best-scoring name for each n-gram
                for token in tokens:
                    fuzzy_match = process.extract(token, FULLNAMES, limit=1)[0]
                    fuzzy_matched_values.append(fuzzy_match)
                    scores.append(fuzzy_match[1])

                cutoff_threshold = 75

                # Replace only the best-matching n-gram, and only if it reaches the cutoff
                if max(scores) >= cutoff_threshold:
                    max_value_index = scores.index(max(scores))
                    text = text.replace(tokens[max_value_index], fuzzy_matched_values[max_value_index][0])

            message.text = text

        def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
            """Persist this component to disk for future loading."""

            pass

        @classmethod
        def load(
            cls,
            meta: Dict[Text, Any],
            model_dir: Optional[Text] = None,
            model_metadata: Optional["Metadata"] = None,
            cached_component: Optional["Component"] = None,
            **kwargs: Any
        ) -> "Component":
            """Load this component from file."""

            if cached_component:
                return cached_component
            else:
                return cls(meta)
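
To see what changing ngram_range does, the n-gram step from process() can be pulled out into a small standalone function. This is only a sketch for illustration: word_ngrams is a hypothetical name and the sample sentence is made up, but the zip expression is the same one used in the component.

```python
def word_ngrams(text, ngram_range=2):
    """Return the list of word n-grams in text, joined by single spaces."""
    words = text.split()
    # Zip the word list against copies of itself shifted by 1, 2, ... positions.
    ngrams = zip(*[words[i:] for i in range(ngram_range)])
    return [' '.join(ngram) for ngram in ngrams]

if __name__ == "__main__":
    sentence = "call jon kowalsky now"
    print(word_ngrams(sentence, 2))  # pairs of words
    print(word_ngrams(sentence, 1))  # single words
    print(word_ngrams("hello", 2))   # too short for a pair: empty list
```

With ngram_range = 2, a one-word message produces no tokens, so the component leaves such messages unchanged. Setting ngram_range = 1 would match word by word instead.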

saimanoj2826 · May 11, 2021, 1:35pm

@pixpack where should this be added in the pipeline? Does it work for every chatbot, or do we need to change anything?
