Adding a text preprocessing component to Rasa

Hi, I would like to add a custom text preprocessing component at the beginning of the Rasa pipeline. My config file would look like this:


  • name: "preprocessing"
  • name: "nlp_spacy"
    model: "fr"
  • name: "tokenizer_spacy"
  • name: "intent_entity_featurizer_regex"
  • name: "ner_crf"

Can I add it without giving a value to “provides”? I don’t want to change the code of the nlp_spacy component. Any suggestions are highly appreciated!


Hi @OmarGr,

What will the preprocessor do? This should work fine with an empty list, i.e. provides = list()
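As a rough illustration of the empty-list idea, here is a minimal sketch of a component that declares no new attributes and only rewrites the message text in place. `Message` and `Component` below are simplified, hypothetical stand-ins rather than Rasa's own classes, and `clean` is a placeholder helper. Whether later components such as nlp_spacy read the rewritten `message.text` is an assumption to check against the installed Rasa version.

```python
from typing import Any, List


class Message:
    """Hypothetical stand-in for Rasa's Message: it only holds the raw text."""

    def __init__(self, text: str) -> None:
        self.text = text


class Component:
    """Hypothetical stand-in for Rasa's Component base class."""

    provides: List[str] = []


class Preprocessor(Component):
    # The component only rewrites existing text, so it provides nothing new.
    provides: List[str] = []

    def clean(self, text: str) -> str:
        # Placeholder transformation: collapse repeated whitespace.
        return " ".join(text.split())

    def process(self, message: Message, **kwargs: Any) -> None:
        # Overwrite the text in place so later components see the cleaned text.
        message.text = self.clean(message.text)


msg = Message("  bonjour   tout le monde ")
Preprocessor().process(msg)
print(msg.text)  # bonjour tout le monde
```

Because nothing new is attached to the message, no downstream component has to declare a matching requirement.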


Hi @MetcalfeTom, Thank you for your reply, I really appreciate it! However, it doesn’t solve my problem!

Preprocessing: detect emojis and add a space between them and the adjacent word. My pipeline is the following:

language: "fr"


  • name: "preprocessing_component.preprocessor"
  • name: "nlp_spacy"
    model: "fr"
  • name: "tokenizer_spacy"
  • name: "intent_entity_featurizer_regex"
  • name: "ner_crf"

My preprocessing component will feed nlp_spacy with its output, “sentence”. The script is below:

class preprocessor(Component):
    """preprocessor"""

    name = "preprocessing_component"
    provides = {"sentence"}
    defaults = {}

    . . . .

    def train(
        self, training_data: TrainingData, config: RasaNLUModelConfig, **kwargs: Any
    ) -> None:
        # Attach the spaced-out text to every training example.
        for example in training_data.training_examples:
            example.set("sentence", self.add_space(example.text))

    def process(self, message: Message, **kwargs: Any) -> None:
        # Attach the spaced-out text to the incoming message at prediction time.
        message.set("sentence", self.add_space(message.text))

How can I add this component without changing nlp_spacy? Your suggestions are highly appreciated!

Thanks, Omar
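The `add_space` helper is not shown in the post above. A minimal sketch of what it might look like as a standalone function, assuming that "emoji" means characters in a few common Unicode blocks and that collapsing repeated whitespace is acceptable. The character ranges are an approximation chosen for illustration, not a complete emoji definition.

```python
import re

# Approximate emoji ranges (an assumption, not an exhaustive list), plus the
# variation selector and zero-width joiner so multi-part emojis stay together.
EMOJI_RUN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")


def add_space(text: str) -> str:
    """Put a space between each run of emojis and the adjacent words."""
    # Surround every emoji run with spaces.
    spaced = EMOJI_RUN.sub(lambda m: f" {m.group(0)} ", text)
    # Collapse the extra whitespace this may have introduced.
    return re.sub(r"\s+", " ", spaced).strip()


print(add_space("bonjour😀ça va"))  # bonjour 😀 ça va
```

Consecutive emojis are kept together as one run here; splitting them individually would need a different pattern.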

Hi @OmarGr,

I’m facing a similar problem right now. Did you manage to solve this issue? Any help would be greatly appreciated.