Custom spaCy language model, which parts do I need to train?

(Erik) #1

Hi all,

I’m using Rasa NLU (version 0.13.8) with spaCy (version 2.0.11), and am wondering which parts of spaCy are used when using the “spacy_sklearn” pipeline. My goal is to train a custom language model with my own data in spaCy, but I cannot find which components from spaCy are used (tagger, parser, ner) in Rasa NLU. Does somebody know?

Thanks in advance!

(Lahsuk) #2

Hi Erik,

From what I can see, Rasa NLU takes entities, vectors and tokens from spaCy. Of these, only the vectors and tokens are required for classification.

So, if you want to use spaCy, your language needs at least alpha support in spaCy, which covers the tokenization rules and the other language data.

Hope that helps.

(Erik) #3

Hi lahsuk,

Thanks for your reply. It turns out that, for a language spaCy already supports, you can use the spacy init-model command to initialize a model with (pre)trained word embeddings and then package it, without ever training on NER or PoS data.
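
For anyone finding this later, here is a rough sketch of those two steps as a small Python helper that only builds the command lines. The language code "nl", the paths and the helper names are made up for illustration, and the exact init-model arguments differ between spaCy versions, so check python -m spacy init-model --help against your install before running anything.

```python
# Sketch only: builds the two spaCy CLI invocations described above.
# The language code, paths and function names are hypothetical; the
# argument layout of init-model varies by spaCy version.


def init_model_cmd(lang, model_dir, vectors_loc):
    """Command that creates a model directory from pretrained word vectors."""
    return [
        "python", "-m", "spacy", "init-model",
        lang, model_dir,
        "--vectors-loc", vectors_loc,
    ]


def package_cmd(model_dir, package_dir):
    """Command that turns the model directory into an installable package."""
    return ["python", "-m", "spacy", "package", model_dir, package_dir]


if __name__ == "__main__":
    commands = [
        init_model_cmd("nl", "./nl_vectors_model", "./embeddings.vec"),
        package_cmd("./nl_vectors_model", "./packages"),
    ]
    for cmd in commands:
        # Print rather than execute; pass cmd to subprocess.run(cmd, check=True)
        # once the paths point at real files.
        print(" ".join(cmd))
```

No tagger, parser or NER training shows up anywhere in this, which matches the point above that only tokens and vectors are needed for classification.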