I’d like to have two spaCy models in my pipeline, but the current implementation doesn’t seem well-suited to that. The SpacyNLP object sets "spacy_doc", and if I added a second SpacyNLP object with a different model, I believe it would overwrite that value.
It does seem possible to do everything I’d want the spacy_doc for before including the second SpacyNLP component. For instance, suppose I used this pipeline:
- name: 'SpacyNLP'
  model: 'en_core_web_md'
- name: 'SpacyTokenizer'  # is this necessary anymore?
- name: 'SpacyFeaturizer'
- name: 'SpacyNLP'
  model: 'my_other_model'
- name: 'SpacyTokenizer'
- name: 'SpacyFeaturizer'
My best guess is that the above pipeline would work, and that anything relying on "tokens" would get the tokens from the second model, my_other_model, while the document vectors from both models would still be featurized successfully.
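To make that guess concrete, here is a toy simulation in plain Python. It is not Rasa's actual code: the Toy* classes, the "text_features" key, and the placeholder strings are all hypothetical. It assumes that each component reads and writes keys on a shared message, that the NLP component and tokenizer overwrite their keys, and that the featurizer appends to whatever features already exist.

```python
# Toy simulation only. These classes are hypothetical stand-ins and do
# not use or reproduce Rasa's real component API.

class ToySpacyNLP:
    def __init__(self, model):
        self.model = model

    def process(self, message):
        # Assumption: overwrites any doc stored by an earlier component.
        message["spacy_doc"] = f"doc<{self.model}>"


class ToyTokenizer:
    def process(self, message):
        # Assumption: overwrites "tokens" using the current spacy_doc.
        message["tokens"] = f"tokens from {message['spacy_doc']}"


class ToyFeaturizer:
    def process(self, message):
        # Assumption: appends to existing features instead of replacing them.
        features = message.setdefault("text_features", [])
        features.append(f"vector from {message['spacy_doc']}")


def run_pipeline(components, text):
    """Pass one shared message dict through every component in order."""
    message = {"text": text}
    for component in components:
        component.process(message)
    return message


pipeline = [
    ToySpacyNLP("en_core_web_md"),
    ToyTokenizer(),
    ToyFeaturizer(),
    ToySpacyNLP("my_other_model"),
    ToyTokenizer(),
    ToyFeaturizer(),
]

result = run_pipeline(pipeline, "hello")
print(result["spacy_doc"])      # doc<my_other_model>
print(result["tokens"])         # tokens from doc<my_other_model>
print(result["text_features"])  # one entry per model, in pipeline order
```

If the real components behave like these stand-ins, only the last model's doc and tokens survive, but each featurizer runs while its own model's doc is still in place, so both sets of vectors end up in the features.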
Other than being a memory-glutton, is there anything else wrong with that? Is there any appetite for supporting a cleaner interface for that?