Specify component input in RASA NLU

PierreFG · June 18, 2019, 8:56am

Hi everyone,

Let’s suppose I use the CRFEntityExtractor and the SpacyEntityExtractor at the same time in my pipeline. From what I’ve read, it’s better to use the first one with the WhitespaceTokenizer and the second with the SpacyTokenizer.

If I put all these components in my pipeline, can I specify which NER uses which tokenizer?

Thank you,

Pierre

lahsuk · June 18, 2019, 9:52am

Sure, you can, although you need to override quite a few defaults.
If you look at the source code, both tokenizers (WhitespaceTokenizer and SpacyTokenizer) provide the same thing: tokens.

You need to change this so that each one provides something different, like whitespace_tokens and spacy_tokens.
Then you need to change each NER component to include logic that selects which tokens to use. You may also need to change other components further down the pipeline if any of them require tokens.

The code would look something like:

# whitespace_tokenizer
class WhitespaceTokenizer(Tokenizer, Component):
    provides = ["whitespace_tokens"]
...

# spacy_tokenizer
class SpacyTokenizer(Tokenizer, Component):
    provides = ["spacy_tokens"]
    requires = ["spacy_doc"]
...

# crf_entity_extractor
class CRFEntityExtractor(EntityExtractor):
    provides = ["entities"]
    requires = ["whitespace_tokens"]
...

# spacy_entity_extractor
class SpacyEntityExtractor(EntityExtractor):
    provides = ["entities"]
    requires = ["spacy_tokens", "spacy_nlp"]
...

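# If you want finer control, you may need to do it the following way:
# custom_entity_extractor
# (import paths as in Rasa 1.x; adjust them to your version)
from typing import Any

from rasa.nlu.config import RasaNLUModelConfig
from rasa.nlu.extractors import EntityExtractor
from rasa.nlu.extractors.crf_entity_extractor import CRFEntityExtractor
from rasa.nlu.extractors.spacy_entity_extractor import SpacyEntityExtractor
from rasa.nlu.training_data import Message, TrainingData


class CustomEntityExtractor(EntityExtractor):
    provides = ["entities"]
    requires = ["whitespace_tokens", "spacy_tokens"]

    def __init__(self, component_config=None):
        super().__init__(component_config)
        # Wrap both extractors so this component can choose between them
        self.crf = CRFEntityExtractor(component_config)
        self.spacy_ner = SpacyEntityExtractor(component_config)

    # You need to override the train and process methods with your own logic
    # about which extractor to use
    def train(
        self, training_data: TrainingData, config: RasaNLUModelConfig, **kwargs: Any
    ) -> None:
        # custom logic to select which entity extractor to use goes here;
        # the spaCy extractor cannot be trained, so only the CRF is trained
        self.crf.train(training_data, config, **kwargs)

    def process(self, message: Message, **kwargs: Any) -> None:
        # some logic to select which extractor to use;
        # collect the entities it finds in `extracted`
        extracted = []  # placeholder until the selection logic is written
        message.set(
            "entities", message.get("entities", []) + extracted, add_to_output=True
        )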
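
To make the "some logic to select which to use" part a bit more concrete, here is a rough, self-contained sketch that does not import Rasa at all. The message is a plain dict, the token lists are hand-written (not real tokenizer output), and toy_crf, toy_spacy, overlaps and process are hypothetical names made up for illustration. It only shows the idea: each extractor reads its own token key, and entities found earlier win when spans overlap.

```python
from typing import Any, Callable, Dict, List, Tuple

Token = Tuple[str, int]      # (text, start offset); hypothetical token shape
Entity = Dict[str, Any]
Message = Dict[str, Any]     # stand-in for Rasa's Message object
Extractor = Callable[[List[Token]], List[Entity]]


def toy_crf(tokens: List[Token]) -> List[Entity]:
    """Hypothetical stand-in for the CRF extractor: tags capitalised tokens."""
    return [
        {"start": s, "end": s + len(t), "value": t, "entity": "name"}
        for t, s in tokens
        if t[:1].isupper()
    ]


def toy_spacy(tokens: List[Token]) -> List[Entity]:
    """Hypothetical stand-in for the spaCy extractor: tags numeric tokens."""
    return [
        {"start": s, "end": s + len(t), "value": t, "entity": "number"}
        for t, s in tokens
        if t.isdigit()
    ]


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the character spans of two entities intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def process(message: Message, routes: List[Tuple[str, Extractor]]) -> None:
    """Run each extractor on its own token list; earlier routes win on overlap."""
    extracted: List[Entity] = []
    for token_key, extract in routes:
        for ent in extract(message[token_key]):
            if not any(overlaps(ent, kept) for kept in extracted):
                extracted.append(ent)
    message["entities"] = message.get("entities", []) + extracted


message: Message = {
    "text": "Pierre paid 20 euros.",
    # Hand-written token lists that differ only in how the final "." is split
    "whitespace_tokens": [("Pierre", 0), ("paid", 7), ("20", 12), ("euros.", 15)],
    "spacy_tokens": [("Pierre", 0), ("paid", 7), ("20", 12), ("euros", 15), (".", 20)],
}
process(message, [("whitespace_tokens", toy_crf), ("spacy_tokens", toy_spacy)])
print(message["entities"])
```

In a real component the two routes would call the wrapped extractors instead of the toy functions, and the overlap rule is just one possible policy; you might prefer the longer span or a fixed priority per entity type.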

You’ll need to figure out the exact details of how to implement this (I haven’t done it yet).
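
One detail worth checking after the rename is that every component's requires is still satisfied by something earlier in the pipeline. The sketch below is not Rasa code, just a small standalone illustration of that kind of check. Spec and missing_requirements are hypothetical names, the SpacyNLP row is my assumption about what that component provides, and SomeFeaturizer is an invented downstream component that was left asking for the old tokens key.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Spec:
    """Hypothetical description of a component: its name and its declared keys."""
    name: str
    provides: List[str] = field(default_factory=list)
    requires: List[str] = field(default_factory=list)


def missing_requirements(pipeline: List[Spec]) -> List[str]:
    """Walk the pipeline in order and report every unmet `requires` entry."""
    available = set()
    problems = []
    for spec in pipeline:
        for key in spec.requires:
            if key not in available:
                problems.append(f"{spec.name} requires '{key}'")
        # Keys become available only to components placed later in the pipeline
        available.update(spec.provides)
    return problems


pipeline = [
    Spec("SpacyNLP", provides=["spacy_doc", "spacy_nlp"]),  # assumed keys
    Spec("WhitespaceTokenizer", provides=["whitespace_tokens"]),
    Spec("SpacyTokenizer", provides=["spacy_tokens"], requires=["spacy_doc"]),
    Spec("CRFEntityExtractor", provides=["entities"], requires=["whitespace_tokens"]),
    Spec("SpacyEntityExtractor", provides=["entities"],
         requires=["spacy_tokens", "spacy_nlp"]),
    # Invented component that was not updated and still asks for "tokens"
    Spec("SomeFeaturizer", provides=["text_features"], requires=["tokens"]),
]
print(missing_requirements(pipeline))
```

Any component reported here either needs its requires updated to one of the new keys, or one of the tokenizers has to keep providing tokens as well.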

Hope that helps.
