Hello everyone,
I tried to train my model with the CRFEntityExtractor to extract some custom entities, using a custom whitespace tokenizer, but it gives the error below:
'CRFEntityExtractor' requires ['Tokenizer']
Even when I use the standard WhitespaceTokenizer it still throws the same error. Should I use a specific tokenizer, or is the pipeline wrong? Can you please help?
Here is my pipeline:
Hello, the custom whitespace tokenizer itself works fine. It looks like this:
from itertools import chain
from typing import Any, Dict, Text

# Import paths as in Rasa 1.x; adjust to your installed version.
from rasa.nlu.components import Component
from rasa.nlu.constants import MESSAGE_ATTRIBUTES, TOKENS_NAMES
from rasa.nlu.tokenizers import Tokenizer


class WhitespaceTokenizer_ar_cdg(Tokenizer, Component):

    provides = [TOKENS_NAMES[attribute] for attribute in MESSAGE_ATTRIBUTES]

    defaults = {
        # text will be tokenized case-sensitively by default
        "case_sensitive": True
    }

    dict = {}

    @staticmethod
    def unique_words(lines):
        # Collect the set of whitespace-separated words across all non-empty lines.
        return set(chain(*(line.split() for line in lines if line)))

    def __init__(self, component_config: Dict[Text, Any] = None) -> None:
        """Construct a new tokenizer using the WhitespaceTokenizer framework."""
        super(WhitespaceTokenizer_ar_cdg, self).__init__(component_config)
        self.case_sensitive = self.component_config["case_sensitive"]
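For context, the unique_words helper I added behaves like this on its own (a standalone sketch with no Rasa dependencies; the sample lines are made up):

```python
from itertools import chain

def unique_words(lines):
    # Collect the set of whitespace-separated words across all non-empty lines.
    return set(chain(*(line.split() for line in lines if line)))

# Empty lines are skipped; duplicate words collapse into one set entry.
words = unique_words(["hello world", "", "world again"])
print(words)  # a set containing 'hello', 'world', 'again'
```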
It is the same as the original WhitespaceTokenizer; I just added some code. That's not the issue, because it works fine if I remove the CRFEntityExtractor. As soon as I add that specific extractor, it throws the error, and even when I use just the plain WhitespaceTokenizer it still fails with the same error:
'CRFEntityExtractor' requires ['Tokenizer'], as I showed above.