How can I specify a user dictionary for spaCy's tokenizer in a non-English language? As you know, tokenizer quality affects the whole NLP pipeline, including entity extraction.
For example, how would we specify one for Chinese?
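To illustrate, outside of Rasa I can update the dictionary in plain spaCy like this (a sketch assuming spaCy v3 and a Chinese pipeline whose word segmenter is pkuseg, e.g. zh_core_web_md; the dictionary words are placeholders):

import spacy

# Load a Chinese pipeline; its word segmenter is pkuseg.
nlp = spacy.load("zh_core_web_md")

# Add domain terms so they are kept as single tokens.
nlp.tokenizer.pkuseg_update_user_dict(["自然语言处理", "知识图谱"])

print([t.text for t in nlp("我在学习自然语言处理")])

But I don't see where to hook this into a Rasa pipeline.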
Hi, do you have any solutions?
I want to do this also.
I am trying to rewrite the SpacyTokenizer in rasa.nlu.tokenizers. However, I cannot find where Rasa uses SpacyNLP, and I am confused about what the incoming message contains. Here is the code:
from typing import List, Optional, Text

from rasa.nlu.constants import SPACY_DOCS
from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.shared.nlu.training_data.message import Message

POS_TAG_KEY = "pos"


class SpacyTokenizer(Tokenizer):
    # "Doc" is spacy.tokens.doc.Doc, imported under TYPE_CHECKING in Rasa's source.
    def get_doc(self, message: Message, attribute: Text) -> Optional["Doc"]:
        return message.get(SPACY_DOCS[attribute])

    def tokenize(self, message: Message, attribute: Text) -> List[Token]:
        doc = self.get_doc(message, attribute)
        # Debug output I added to inspect the spaCy doc:
        print('doc: ')
        print(doc)
        if not doc:
            return []
        tokens = [
            Token(
                t.text, t.idx, lemma=t.lemma_, data={POS_TAG_KEY: self._tag_of_token(t)}
            )
            for t in doc
            if t.text and t.text.strip()
        ]
        return self._apply_token_pattern(tokens)
Because the spaCy language model is not loaded inside the tokenizer, you should not modify SpacyTokenizer to customize the user dictionary.
The model is actually loaded by the SpacyNLP component, so consider inheriting from rasa.nlu.utils.spacy_utils.SpacyNLP and overriding its load_model static method, which returns a SpacyModel.
The reference code is as follows:
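This is a minimal sketch, assuming Rasa 3.x (where SpacyNLP.load_model is a static method returning a SpacyModel) and a Chinese spaCy pipeline whose segmenter is pkuseg, such as zh_core_web_md. CustomSpacyNLP and the words in USER_DICT are placeholder names:

from typing import Text

from rasa.engine.recipes.default_recipe import DefaultV1Recipe
from rasa.nlu.utils.spacy_utils import SpacyModel, SpacyNLP

# Placeholder domain terms that should be kept as single tokens.
USER_DICT = ["自然语言处理", "知识图谱"]


# Registration mirrors the stock SpacyNLP component.
@DefaultV1Recipe.register(
    DefaultV1Recipe.ComponentType.MODEL_LOADER, is_trainable=False
)
class CustomSpacyNLP(SpacyNLP):
    """A SpacyNLP variant that injects a user dictionary after loading."""

    @staticmethod
    def load_model(spacy_model_name: Text) -> SpacyModel:
        # Reuse the stock loader, then extend the segmenter's dictionary.
        spacy_model = SpacyNLP.load_model(spacy_model_name)
        # pkuseg_update_user_dict is only available when the Chinese
        # tokenizer uses the pkuseg segmenter.
        spacy_model.model.tokenizer.pkuseg_update_user_dict(USER_DICT)
        return spacy_model

Then reference the subclass in config.yml by its module path instead of SpacyNLP, e.g. "custom_spacy_nlp.CustomSpacyNLP" (the module name depends on where you save the file), keeping the usual "model: zh_core_web_md" option. Downstream, SpacyTokenizer will receive docs segmented with your dictionary, which should help entity extraction on those terms.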