Hi, I want to use SpacyNLP. My language, Korean, is not supported by spaCy out of the box, but the official spaCy documentation says that if certain extra libraries are installed, they can be used. So I should be able to use the mecab library via spacy.blank.
SpacyNLP uses the spacy.load method, not the blank method, so I changed def load_model in spacy_utils.py. The line below is the part of the code I modified (focus on the ->):
return spacy.load(spacy_model_name, disable=["parser"]) -> return spacy.blank(spacy_model_name)
Then the following error occurs:
Exception: Failed to load spacy language model for lang 'ko'. Make sure you have downloaded the correct model (https://spacy.io/docs/usage/).
I think there is another part of the code I need to modify to make SpacyNLP work, but I don't know where it is.
I think the easiest way to do this is to link your model with the language code ko (in spaCy v2 this is done with python -m spacy link <model> ko). That way, you don't need to modify a lot of code inside Rasa or spaCy. You can follow this page: https://spacy.io/usage/models#download-manual
Thank you for the good advice. Unfortunately, I don't have a spaCy model for Korean, and there is no Korean model on the Releases · explosion/spacy-models · GitHub page either, so the manual-download approach from the link you sent won't work for me. That's why I'm trying to use spacy.blank('ko'), which creates a blank model. What am I missing?
If it is a blank model, why do you want to include spaCy in your pipeline at all? You can follow the non-English pipeline template on this page: https://rasa.com/docs/rasa/nlu/choosing-a-pipeline/
Then you probably want to replace the WhitespaceTokenizer with a custom tokenizer (see this link on how to design a custom NLU pipeline) and put whatever Korean tokenizer you have inside your custom component. Wouldn't that be a better solution?
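For reference, a custom tokenizer component has roughly this shape. This is a minimal standalone sketch that only mirrors the interface of Rasa's tokenizer (a real component must subclass rasa.nlu.tokenizers.tokenizer.Tokenizer and be registered in the pipeline config); the class name is illustrative, and the whitespace split is a placeholder so the sketch runs without Rasa or a Korean analyzer installed:

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Token:
    # Mirrors the essential fields of Rasa's Token: surface text and character offset.
    text: str
    start: int

class MyKoreanTokenizer:
    """Illustrative stand-in for a custom Rasa tokenizer component.

    A real component would subclass rasa.nlu.tokenizers.tokenizer.Tokenizer
    and override its tokenize method; here we split on whitespace so the
    sketch is self-contained.
    """

    def tokenize(self, text: str) -> List[Token]:
        # In the real component, replace this whitespace split with a call
        # to a Korean tokenizer (e.g. a KoNLPy tagger).
        return [Token(m.group(), m.start()) for m in re.finditer(r"\S+", text)]

tokens = MyKoreanTokenizer().tokenize("안녕하세요 라사")
print([(t.text, t.start) for t in tokens])  # [('안녕하세요', 0), ('라사', 6)]
```

The key point is that each token carries a character offset into the original message, which Rasa's downstream featurizers rely on.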
Using SpacyNLP seemed simpler, so I kept trying that. I understood what you suggested but hadn't tried it; I wasn't confident because I had never customized a tokenizer before. Is there any documentation I can refer to? After reviewing the official docs, I think the approach you described is right. I will try again after reading the link you sent.
I'm not that familiar with Korean tokenizers, but a little bit of Googling led me to this page: KoNLPy: Korean NLP in Python — KoNLPy 0.5.2 documentation
You can try adding that tokenizer into your custom pipeline.
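One wrinkle worth noting when wrapping KoNLPy: Rasa tokens need character offsets, while KoNLPy taggers return plain morpheme strings. A sketch of how the morphemes could be aligned back to offsets in the original text (the morphs list below is hard-coded sample output standing in for a real call such as Okt().morphs(text), since KoNLPy may not be installed):

```python
from typing import List, Tuple

def align_morphs(text: str, morphs: List[str]) -> List[Tuple[str, int]]:
    """Map each morpheme back to its character offset in the original text.

    Scans left to right with a cursor so repeated morphemes land on
    successive occurrences rather than the first match.
    """
    tokens = []
    cursor = 0
    for morph in morphs:
        start = text.find(morph, cursor)
        if start == -1:  # the analyzer normalized the surface form; skip it
            continue
        tokens.append((morph, start))
        cursor = start + len(morph)
    return tokens

text = "아버지가 방에 들어가신다"
# Sample morphemes standing in for Okt().morphs(text) from KoNLPy.
morphs = ["아버지", "가", "방", "에", "들어가", "신다"]
print(align_morphs(text, morphs))
# [('아버지', 0), ('가', 3), ('방', 5), ('에', 6), ('들어가', 8), ('신다', 11)]
```

Note that some KoNLPy taggers normalize morphemes (so they no longer appear verbatim in the input); the skip branch above papers over that, but a production component would need a more careful alignment.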
Thank you. That’s enough. The rest is up to me.