How to use SpacyNLP: language (Korean) not officially supported by spaCy

Hi, I want to use SpacyNLP. My language, Korean, is not supported by spaCy with a pretrained model. However, the official spaCy documentation says that such languages can still be used if certain extra libraries are installed, so I should be able to use the mecab library with spacy.blank.

SpacyNLP uses the spacy.load method, not the blank method, so I changed def load_model in spacy_utils.py.

This is the part of the code I modified:

Before: return spacy.load(spacy_model_name, disable=["parser"])
After: return spacy.blank(spacy_model_name)
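
For anyone else hitting this, here is roughly what the two calls do, as a sketch for the spaCy 2.x era (the note about mecab is my reading of the spaCy docs mentioned above, not something spaCy guarantees in every version):

```python
# Sketch of the difference between the two calls (spaCy 2.x era).
import spacy

# spacy.load("ko") looks for an installed or linked model package called "ko"
# and raises OSError if no such package exists -- which is the case for Korean.
# nlp = spacy.load("ko", disable=["parser"])   # would fail with OSError

# spacy.blank("ko") builds an empty Language object for Korean instead:
# a tokenizer only, with no pretrained vectors, tagger, parser or NER.
# For Korean, spaCy's tokenizer itself needs the MeCab bindings
# (the "mecab library" mentioned above) to be installed.
nlp = spacy.blank("ko")
print(nlp.lang)        # "ko"
print(nlp.pipe_names)  # [] -- a blank model has no pipeline components
```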

Then the following error occurs:

Exception: Failed to load spacy language model for lang 'ko'. Make sure you have downloaded the correct model (https://spacy.io/docs/usage/).

I think there is more that I need to modify to use SpacyNLP this way, but I don't know where it is.
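
The error text looks like it comes from a separate sanity check in the same spacy_utils.py that rejects models not loaded from disk. Roughly (paraphrased from memory of Rasa 1.x, so please check the source of the version you have installed):

```python
# Paraphrased from memory of Rasa 1.x's spacy_utils.py (SpacyNLP component);
# verify against the version you have installed.
def ensure_proper_language_model(nlp) -> None:
    """Reject spaCy models that were not loaded from disk."""
    if nlp is None:
        raise Exception(
            "Failed to load spacy language model. "
            "Loading the model returned 'None'."
        )
    if nlp.path is None:
        # A model created with spacy.blank() is never loaded from disk,
        # so nlp.path is None and this check raises -- even though the
        # blank model itself works. This is why changing load_model alone
        # is not enough.
        raise Exception(
            "Failed to load spacy language model for lang '{}'. "
            "Make sure you have downloaded the correct model "
            "(https://spacy.io/docs/usage/).".format(nlp.lang)
        )
```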

Hi, I think the easiest way to do this is to link your model with the language code ko. That way, you don't need to modify a lot of code inside Rasa or spaCy. I think you can follow this page: https://spacy.io/usage/models#download-manual

Thank you for the good advice. Unfortunately, I don't have a spaCy model for Korean, and there is no Korean model on the Releases · explosion/spacy-models · GitHub page either, so I don't think I can follow the approach in that link. That's why I'm trying to use spacy.blank('ko'), which creates a blank model. What am I missing?

Hello.

If it is a blank model, why do you want to include spaCy in your pipeline at all? You can follow the non-English pipeline template on this page: https://rasa.com/docs/rasa/nlu/choosing-a-pipeline/
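
For reference, the non-spaCy template from that page looks roughly like this in config.yml (a Rasa 1.x supervised_embeddings-style sketch; double-check the component names against the docs for your version):

```yaml
# config.yml -- a non-English, non-spaCy pipeline (Rasa 1.x style sketch)
language: "ko"

pipeline:
  - name: "WhitespaceTokenizer"      # to be replaced with a Korean tokenizer
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"
```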

Then you would probably want to replace the WhitespaceTokenizer with a custom tokenizer (see this link on how to design a custom NLU pipeline) and put whatever Korean tokenizer you have inside your custom component. Wouldn't that be a better solution?
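
A very rough sketch of such a custom component, assuming a Rasa 1.x-style Tokenizer base class (import paths and the exact tokenize signature vary between Rasa versions, and korean_tokenize below is just a placeholder for whatever Korean tokenizer you choose):

```python
# A rough sketch only -- assuming a Rasa 1.x-style Tokenizer base class.
# Method names, signatures and import paths differ between Rasa versions,
# so check the custom-component documentation for the version you run.
from typing import List, Text

from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.nlu.training_data import Message


def korean_tokenize(text: Text) -> List[Text]:
    """Placeholder: plug in any Korean morpheme tokenizer here."""
    return text.split()


class KoreanTokenizer(Tokenizer):
    """Splits Korean text into morphemes for the rest of the NLU pipeline."""

    def tokenize(self, message: Message, attribute: Text) -> List[Token]:
        text = message.get(attribute)
        words = korean_tokenize(text)

        # Map each morpheme back to a character offset in the original text,
        # so entity extraction can still work with spans.
        tokens, offset = [], 0
        for word in words:
            start = text.find(word, offset)
            if start == -1:
                start = offset
            tokens.append(Token(word, start))
            offset = start + len(word)
        return tokens
```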

Hi.

Using SpacyNLP seemed simpler, so I kept trying that. I understood what you suggested but hadn't tried it, because I have never customized a tokenizer and wasn't confident about it. Is there any document I can refer to? After reviewing the official documentation, I think the approach you suggested is right. I will try again after going through the link you gave me.

I'm not that familiar with Korean tokenizers, but a little bit of Googling led me to this page: KoNLPy: Korean NLP in Python — KoNLPy 0.5.2 documentation

You can try adding that tokenizer into your custom pipeline.
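
For example, a quick check of KoNLPy's Okt tagger (assuming konlpy and a Java runtime are installed); its morphs() output is what you would wrap in the custom tokenizer:

```python
# Quick check of KoNLPy's Okt morphological analyzer.
# Requires: pip install konlpy (plus a Java runtime, per the KoNLPy docs).
from konlpy.tag import Okt

okt = Okt()

# morphs() splits a Korean sentence into a list of morpheme strings;
# the exact segmentation depends on the tagger and KoNLPy version.
print(okt.morphs("한국어 토크나이저를 테스트하고 있습니다"))
```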

Thank you. That’s enough. The rest is up to me.