Custom spaCy language model, which parts do I need to train?

eriks · July 11, 2019, 8:16am

Hi all,

I’m using Rasa NLU (version 0.13.8) with spaCy (version 2.0.11), and am wondering which parts of spaCy are used when using the “spacy_sklearn” pipeline. My goal is to train a custom language model with my own data in spaCy, but I cannot find which components from spaCy are used (tagger, parser, ner) in Rasa NLU. Does somebody know?

Thanks in advance!

lahsuk · July 12, 2019, 5:21am

Hi Erik,

From what I can see, Rasa NLU uses spaCy's entities, vectors and tokens. Of these, only the vectors and tokens are required for classification.

So, if you want to use spaCy, your language needs at least alpha support in spaCy, which includes tokenization rules, various other rules and language data.

Hope that helps.

eriks · July 15, 2019, 2:44pm

Hi lahsuk,

Thanks for your reply. It turns out that, for a language spaCy already supports, you can just use the spacy init-model command to initialize a model with (pre)trained word embeddings and then package it, without ever training on NER or PoS data.
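
For anyone finding this later, here is a rough sketch of those two steps, assuming the spaCy 2.x command line (init-model with --vectors-loc, followed by package). The language code and all paths are placeholders I made up for illustration, not the values I actually used, and the actual run is left commented out so you can check the commands first.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(lang, vectors_loc, model_dir, package_dir):
    """Return the two spaCy CLI invocations as argument lists."""
    # Step 1: create a model directory from pretrained word vectors only.
    init_cmd = [
        sys.executable, "-m", "spacy", "init-model",
        lang, str(model_dir),
        "--vectors-loc", str(vectors_loc),
    ]
    # Step 2: turn that model directory into an installable Python package.
    package_cmd = [
        sys.executable, "-m", "spacy", "package",
        str(model_dir), str(package_dir),
    ]
    return init_cmd, package_cmd


def run(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: swap in your own language code and paths.
    cmds = build_commands(
        "en", Path("vectors.txt.gz"), Path("model"), Path("packages")
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # run(cmds)  # uncomment once spaCy is installed and the paths exist
```

The resulting model has a tokenizer and vectors but no tagger, parser or NER, which matches what lahsuk said is required for classification. The package output directory may need to exist before the second command is run.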
