Example of custom components

huberrom · November 15, 2018, 3:14pm

Hi everyone, I want to add a spell-checking component to my NLU pipeline. I saw the template for custom components, but I didn’t find a full example of how to implement one. If anyone has a full example from a blog or personal project, I’d be grateful. You can strip out all the processing; I just want to see the default configuration and where I can process the message (I know it’s in the process method, but how do I return the message after processing it?)

A quick explanation would be very helpful too!

favarete · July 9, 2019, 11:40pm

Hey, @huberrom. I’m also interested in this. Did you find a solution? I’m stuck on this for my project and, based on other similar questions here in the forum, I’ll also be ignored if I open another question about this.

gcgloven · July 10, 2019, 2:18am

You can find a reference implementation, a sentiment analyzer, here: Enhancing Rasa NLU models with Custom Components - Rasa Blog - Medium

favarete · July 10, 2019, 1:30pm

I saw this post, but I can’t figure out how to apply the example to this specific case of spell checking. I’ll open another question, since explaining my problem here might be a little off-topic for this thread.

huberrom · July 10, 2019, 2:42pm

Hey, sorry, I stopped working on Rasa a while ago, but here is a quick example (I can’t give you the full code):

import logging
import os
import re

# Import path for Rasa NLU 0.x (matches the pipeline names used below);
# in Rasa 1.x the equivalent is: from rasa.nlu.components import Component
from rasa_nlu.components import Component
from symspellpy.symspellpy import SymSpell

logger = logging.getLogger(__name__)

ROOT_DIR = os.path.dirname(os.path.abspath(__file__))
DICTIONARIES_PATH = os.path.join(ROOT_DIR, 'dictionaries/')

# Matches decimals such as "3.5" or "3,5" as well as plain integers
NUMBER_PATTERN = r"\d*[.,]\d+|\d+"


class SpellCheck(Component):
    """Component put at the start of the pipeline to spell check the user message."""

    name = "SpellChecking"

    # Defines what language(s) this component can handle.
    language_list = ["fr"]

    # Defines the default configuration parameters of a component
    # these values can be overwritten in the pipeline configuration
    # of the model.
    defaults = {
        "initial_capacity": 83000,
        # maximum edit distance per dictionary precalculation (max_edit_distance_lookup <= max_edit_distance_dictionary)
        "max_edit_distance_dictionary": 2,
        "prefix_length": 7,
        # max edit distance per lookup (per single word, not per whole input string)
        "max_edit_distance_lookup": 2,
        "dictionary": "fr_dictionary.txt"
    }

    def __init__(self, component_config=None):
        super(SpellCheck, self).__init__(component_config)
        logging.basicConfig(level='DEBUG')
        # Positional arguments as accepted by the symspellpy release used at the time;
        # newer releases may expect a different constructor signature.
        self.sym_spell = SymSpell(self.component_config["initial_capacity"],
                                  self.component_config["max_edit_distance_dictionary"],
                                  self.component_config["prefix_length"])
        self.load_sym_spell(self.component_config["dictionary"])

    def process(self, message, **kwargs):
        # The message is modified in place: process() returns nothing, and the
        # next component in the pipeline receives the same message object.
        # For the moment, if the dictionary is not loaded, we skip the spell checking
        if self.sym_spell is None:
            logger.info("Skip spell check because dictionary failed to load")
            return
        # Get the numbers and split the text around them
        numbers = re.findall(NUMBER_PATTERN, message.text)
        split = re.split(NUMBER_PATTERN, message.text)

        correction = ""
        for i, segment in enumerate(split):
            # Only send non-blank segments to the corrector
            if segment.strip():
                suggestions = self.sym_spell.lookup_compound(segment,
                                                             self.component_config["max_edit_distance_lookup"])
                correction += suggestions[0].term
            # Put back the number that followed this segment
            if i < len(numbers):
                correction += " " + numbers[i] + " "

        # Split is used here to remove useless spaces
        correction = " ".join(correction.split())
        logger.info("Correction from %s to %s", message.text, correction)
        message.text = correction

    def load_sym_spell(self, dictionary):
        # load dictionary
        dictionary_path = os.path.join(DICTIONARIES_PATH, dictionary)
        # column of the term in the dictionary text file
        term_index = 0
        # column of the term frequency in the dictionary text file
        count_index = 1
        if not self.sym_spell.load_dictionary(dictionary_path, term_index, count_index):
            logger.error("Unable to load spell dictionary")
            self.sym_spell = None

So as you can see, I used a library named symspellpy (GitHub - mammothb/symspellpy: Python port of SymSpell). You just have to load a dictionary, which is a list of words with their frequencies (for example “a 155105”). These are easy to find for English, harder for other languages.
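If no ready-made frequency dictionary exists for your language, one option is to count words in a text corpus you already have. Here is a minimal sketch of that idea; the helper name `build_frequency_dictionary` is hypothetical, the tokenisation is deliberately naive, and the output simply follows the “word count” line format described above:

```python
import re
from collections import Counter


def build_frequency_dictionary(corpus_text, min_count=1):
    """Return 'word count' lines, most frequent first (hypothetical helper)."""
    # Naive tokenisation: lowercase runs of letters, accented letters included
    words = re.findall(r"[^\W\d_]+", corpus_text.lower())
    counts = Counter(words)
    return [
        f"{word} {count}"
        for word, count in counts.most_common()
        if count >= min_count
    ]


lines = build_frequency_dictionary("le chat et le chien et le lapin")
print(lines[:2])  # ['le 3', 'et 2']
```

Writing these lines to a UTF-8 text file, one per line, gives a file in the same two-column layout that the component reads with term_index = 0 and count_index = 1. A real dictionary would need a much larger corpus to give useful frequencies.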

Sorry, I don’t understand all of the parameters, and the corrector is not perfect, but it does its job. There is a small part of the code where I split the message around numbers, because the corrector deletes them.
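The number handling can be tried on its own, without Rasa or a dictionary. The sketch below is only an illustration: `correct_keeping_numbers` and `fake_correct` are hypothetical names, and `fake_correct` is a stand-in that fixes one known typo instead of calling SymSpell.

```python
import re

# Decimals such as "3.5" or "3,5", or plain integers
NUMBER_PATTERN = r"\d*[.,]\d+|\d+"


def correct_keeping_numbers(text, correct):
    """Apply `correct` to the text between numbers, then put the numbers back."""
    numbers = re.findall(NUMBER_PATTERN, text)
    segments = re.split(NUMBER_PATTERN, text)  # always len(numbers) + 1 items
    parts = []
    for i, segment in enumerate(segments):
        if segment.strip():
            parts.append(correct(segment))
        if i < len(numbers):
            parts.append(numbers[i])
    # Joining, splitting and joining again collapses duplicate spaces
    return " ".join(" ".join(parts).split())


def fake_correct(segment):
    # Stand-in for the real spell checker (assumption for this example only)
    return segment.replace("chmbre", "chambre")


result = correct_keeping_numbers(
    "une chmbre pour 2 personnes a 79,50 euros", fake_correct
)
print(result)  # une chambre pour 2 personnes a 79,50 euros
```

Because the numbers never reach the corrector, it cannot delete or alter them. One side effect of this approach is that a number glued to a word, such as “5kg”, comes back as “5 kg”.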

Once you have created your component, you just have to add it to your pipeline like this:

language: "fr"

pipeline:
- name: "spell_check_component.SpellCheck"
- name: "nlp_spacy"
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  "epochs": 3000

(The component SpellCheck is in the spell_check_component folder)
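The pipeline entry "spell_check_component.SpellCheck" is a dotted path: everything before the last dot is a Python module that must be importable, and the last part is the class name. The sketch below shows the general idea with the standard library only; it is not Rasa’s actual loading code, and `load_class` is a hypothetical helper.

```python
import importlib


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the example runs anywhere
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

If the import fails with a ModuleNotFoundError, the usual cause is that the directory containing the module is not on the Python path of the process that trains or runs the model.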

Hope it helps!

favarete · July 10, 2019, 2:50pm

Great!! Thank you very much! I think I know the problem in my code now
