Hi there,
I’d like to have two spaCy models in my pipeline, but the current implementation doesn’t seem well-suited to that. The SpacyNLP object will set "spacy_doc", and were I to have a second SpacyNLP object with a different model, I believe I’d overwrite it.
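To make the worry concrete, here's a toy sketch in plain Python. It is not Rasa's code: ToyMessage and ToySpacyNLP are hypothetical stand-ins I made up, and the only thing it assumes is that every SpacyNLP component writes its result under the same "spacy_doc" key.

```python
class ToyMessage:
    """Hypothetical stand-in for a pipeline message: a bag of named attributes."""

    def __init__(self, text):
        self.text = text
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)


class ToySpacyNLP:
    """Hypothetical stand-in for SpacyNLP; assumes a fixed "spacy_doc" key."""

    def __init__(self, model):
        self.model = model

    def process(self, message):
        # A real component would run the spaCy model here; this only records its name.
        message.set("spacy_doc", {"model": self.model, "text": message.text})


message = ToyMessage("hello world")
ToySpacyNLP("en_core_web_md").process(message)
ToySpacyNLP("my_other_model").process(message)

# Only the second model's doc is left under the shared key.
print(message.get("spacy_doc")["model"])
```

If the real component behaves like this, anything that reads "spacy_doc" after the second SpacyNLP has run can no longer see the first model's output.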
It does seem possible to accomplish everything I’d want the spacy_doc for before including the second SpacyNLP component. For instance, if I used:
- name: 'SpacyNLP'
  model: 'en_core_web_md'
- name: 'SpacyTokenizer'  # is this necessary anymore?
- name: 'SpacyFeaturizer'
- name: 'SpacyNLP'
  model: 'my_other_model'
- name: 'SpacyTokenizer'
- name: 'SpacyFeaturizer'
My best guess is that the above pipeline would work: anything relying on "tokens" would get the tokens from the second model, my_other_model, but the document vectors from both models would still be featurized successfully.
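Here's the behaviour I'm guessing at, written as a small simulation. The three functions are hypothetical stand-ins rather than the real components, and the "text_features" key is a name I picked for illustration. The important assumption, and the one I'd like confirmed, is that the featurizer appends to existing features while the tokenizer replaces "tokens".

```python
def spacy_nlp(message, model):
    # Assumed: each SpacyNLP overwrites the shared "spacy_doc" entry.
    message["spacy_doc"] = {"model": model}


def spacy_tokenizer(message):
    # Assumed: "tokens" is replaced on every run, using whichever doc is current.
    message["tokens"] = "tokens from " + message["spacy_doc"]["model"]


def spacy_featurizer(message):
    # Assumed: new features are appended to any that already exist.
    message.setdefault("text_features", []).append(
        "vector from " + message["spacy_doc"]["model"]
    )


message = {"text": "hello world"}

# Same component order as the pipeline config: NLP, tokenizer, featurizer, twice.
for model in ("en_core_web_md", "my_other_model"):
    spacy_nlp(message, model)
    spacy_tokenizer(message)
    spacy_featurizer(message)

print(message["tokens"])
print(message["text_features"])
```

Under those assumptions the first model's vector survives because it was featurized before "spacy_doc" was overwritten, while "tokens" only reflects the second model. If the real featurizer replaces features instead of appending, the first model's contribution would be lost.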
Other than being a memory-glutton, is there anything else wrong with that? Is there any appetite for supporting a cleaner interface for that?