How to use SpacyNLP: language (Korean) not officially supported by spaCy

Hi, I want to use SpacyNLP. My language, Korean, is not supported by spaCy with a pretrained model. However, the official spaCy documentation says that such languages can still be used if certain extra libraries are installed, so I should be able to use the mecab library with spacy.blank.

SpacyNLP uses the spacy.load method, not the blank method, so I changed def load_model in spacy_utils.py.

This is the part of the code I modified:

Before: return spacy.load(spacy_model_name, disable=["parser"])
After: return spacy.blank(spacy_model_name)
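
For anyone else hitting this, here is roughly what the two calls do, as a sketch for the spaCy 2.x era (the note about mecab is my reading of the spaCy docs mentioned above, not something spaCy guarantees in every version):

```python
# Sketch of the difference between the two calls (spaCy 2.x era).
import spacy

# spacy.load("ko") looks for an installed or linked model package called "ko"
# and raises OSError if no such package exists -- which is the case for Korean.
# nlp = spacy.load("ko", disable=["parser"])   # would fail with OSError

# spacy.blank("ko") builds an empty Language object for Korean instead:
# a tokenizer only, with no pretrained vectors, tagger, parser or NER.
# For Korean, spaCy's tokenizer itself needs the MeCab bindings
# (the "mecab library" mentioned above) to be installed.
nlp = spacy.blank("ko")
print(nlp.lang)        # "ko"
print(nlp.pipe_names)  # [] -- a blank model has no pipeline components
```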

Then the following error occurs:

Exception: Failed to load spacy language model for lang 'ko'. Make sure you have downloaded the correct model (https://spacy.io/docs/usage/).

I think there is more that I need to modify to use SpacyNLP this way, but I don't know where it is.
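
The error text looks like it comes from a separate sanity check in the same spacy_utils.py that rejects models not loaded from disk. Roughly (paraphrased from memory of Rasa 1.x, so please check the source of the version you have installed):

```python
# Paraphrased from memory of Rasa 1.x's spacy_utils.py (SpacyNLP component);
# verify against the version you have installed.
def ensure_proper_language_model(nlp) -> None:
    """Reject spaCy models that were not loaded from disk."""
    if nlp is None:
        raise Exception(
            "Failed to load spacy language model. "
            "Loading the model returned 'None'."
        )
    if nlp.path is None:
        # A model created with spacy.blank() is never loaded from disk,
        # so nlp.path is None and this check raises -- even though the
        # blank model itself works. This is why changing load_model alone
        # is not enough.
        raise Exception(
            "Failed to load spacy language model for lang '{}'. "
            "Make sure you have downloaded the correct model "
            "(https://spacy.io/docs/usage/).".format(nlp.lang)
        )
```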

Hi, I think the easiest way to do this is to link your model with the language code ko. That way, you don't need to modify a lot of code inside Rasa or spaCy. I think you can follow this page: https://spacy.io/usage/models#download-manual

Thank you for the good advice. Unfortunately, I don't have a spaCy model for Korean, and there is no Korean model on the Releases · explosion/spacy-models · GitHub page either, so I don't think I can follow the approach in that link. That's why I'm trying to use spacy.blank('ko'), which creates a blank model. What am I missing?

Hello.

If it is a blank model, why do you want to include spaCy in your pipeline at all? You can follow the non-English pipeline template on this page: https://rasa.com/docs/rasa/nlu/choosing-a-pipeline/
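
For reference, the non-spaCy template from that page looks roughly like this in config.yml (a Rasa 1.x supervised_embeddings-style sketch; double-check the component names against the docs for your version):

```yaml
# config.yml -- a non-English, non-spaCy pipeline (Rasa 1.x style sketch)
language: "ko"

pipeline:
  - name: "WhitespaceTokenizer"      # to be replaced with a Korean tokenizer
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"
```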

Then you would probably want to replace the WhitespaceTokenizer with a custom tokenizer (see this link on how to design a custom NLU pipeline) and put whatever Korean tokenizer you have inside your custom component. Wouldn't that be a better solution?
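
A very rough sketch of such a custom component, assuming a Rasa 1.x-style Tokenizer base class (import paths and the exact tokenize signature vary between Rasa versions, and korean_tokenize below is just a placeholder for whatever Korean tokenizer you choose):

```python
# A rough sketch only -- assuming a Rasa 1.x-style Tokenizer base class.
# Method names, signatures and import paths differ between Rasa versions,
# so check the custom-component documentation for the version you run.
from typing import List, Text

from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.nlu.training_data import Message


def korean_tokenize(text: Text) -> List[Text]:
    """Placeholder: plug in any Korean morpheme tokenizer here."""
    return text.split()


class KoreanTokenizer(Tokenizer):
    """Splits Korean text into morphemes for the rest of the NLU pipeline."""

    def tokenize(self, message: Message, attribute: Text) -> List[Token]:
        text = message.get(attribute)
        words = korean_tokenize(text)

        # Map each morpheme back to a character offset in the original text,
        # so entity extraction can still work with spans.
        tokens, offset = [], 0
        for word in words:
            start = text.find(word, offset)
            if start == -1:
                start = offset
            tokens.append(Token(word, start))
            offset = start + len(word)
        return tokens
```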

Hi.

Using SpacyNLP seemed simpler, so I kept trying that. I understood what you suggested but hadn't tried it, because I have never customized a tokenizer and wasn't confident about it. Is there any document I can refer to? After reviewing the official documentation, I think the approach you suggested is right. I will try again after going through the link you gave me.

I'm not that familiar with Korean tokenizers, but a little bit of Googling led me to this page: KoNLPy: Korean NLP in Python — KoNLPy 0.5.2 documentation

You can try adding that tokenizer into your custom pipeline.
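
For example, a quick check of KoNLPy's Okt tagger (assuming konlpy and a Java runtime are installed); its morphs() output is what you would wrap in the custom tokenizer:

```python
# Quick check of KoNLPy's Okt morphological analyzer.
# Requires: pip install konlpy (plus a Java runtime, per the KoNLPy docs).
from konlpy.tag import Okt

okt = Okt()

# morphs() splits a Korean sentence into a list of morpheme strings;
# the exact segmentation depends on the tagger and KoNLPy version.
print(okt.morphs("한국어 토크나이저를 테스트하고 있습니다"))
```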

Thank you. That’s enough. The rest is up to me.