Pre-load vectors for Tensorflow ranking classifier pipeline

souvikg10 · August 3, 2018, 7:41am

@amn41 Hi, my question is about how to use the TensorFlow ranking classifier with pre-loaded word vectors from spaCy.

pipeline: 
- name: "tokenizer_whitespace"
- name: "intent_entity_featurizer_regex"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_tensorflow_embedding"
  intent_tokenization_flag: true
  intent_split_symbol: "+"
- name: "ner_duckling_http"
  url: "http://0.0.0.0:8000"
  locale: "NL_Nothing"

Will this pipeline work? I understand that the TensorFlow classifier requires text features, so instead of using the count vector featurizer I replaced it with spaCy’s featurizer, which pre-loads the vectors.

Does this make sense?

therold · August 3, 2018, 9:12am

I think what you are trying to do should work. I have tried this in the past and it sort of works. You can combine the spaCy word vectors with your own (see the pipeline below).

There are a few things that you should keep in mind:

The TF embedding classifier’s default parameters are chosen to work best with the CountVectorFeaturizer. Make sure to tweak the hyperparameters if you only use the spaCy vectors.
When combining spaCy vectors with the CountVectorFeaturizer, make sure to put the CountVectorFeaturizer first in the pipeline. Otherwise it will featurize the lemmata generated by spaCy rather than the original, raw words. That somewhat defeats the purpose of training your own vectors on your custom, domain-specific vocabulary.
spaCy’s word vectors have a length of ca. 350. The CountVectorFeaturizer typically produces much longer vectors, usually a few thousand entries in length.
In our testing, we did not see any improvement when combining spaCy + CountVectorFeaturizer + TF embedding classifier. On our dataset, F1 dropped by 3-4 percentage points. Your mileage may vary.

pipeline:
- name: "intent_featurizer_count_vectors"
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_tensorflow_embedding"
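
To make the size difference between the two feature sets concrete, here is a minimal sketch in plain Python. It is not Rasa code: `build_vocabulary`, `count_vector`, `fake_dense_vector` and `combine` are hypothetical helpers, the dense vector is random numbers standing in for a pre-trained vector, and `dim=300` is only a placeholder for whatever length your spaCy model actually produces. It assumes the two feature sets are simply concatenated into one row per message.

```python
import random

# Toy training utterances; purely illustrative.
utterances = [
    "book a table for two",
    "cancel my table booking",
    "what is the weather today",
]


def build_vocabulary(texts):
    """Map each distinct whitespace token to a column index."""
    vocab = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}


def count_vector(text, vocab):
    """Bag-of-words counts: one slot per vocabulary entry."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:  # out-of-vocabulary tokens are ignored
            vec[vocab[tok]] += 1.0
    return vec


def fake_dense_vector(text, dim=300):
    """Stand-in for a pre-trained vector (an assumption, NOT a spaCy call)."""
    rng = random.Random(text)  # seeded by the text, so repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def combine(text, vocab, dim=300):
    """Concatenate the count features and the dense vector into one row."""
    return count_vector(text, vocab) + fake_dense_vector(text, dim)


vocab = build_vocabulary(utterances)
features = combine("book a table", vocab)

# The count part grows with the vocabulary; the dense part stays fixed.
print("count features:", len(vocab))
print("combined features:", len(features))
```

With a real training set the count part grows to thousands of columns while the dense part stays fixed, which may be one reason the default hyperparameters need retuning when the feature mix changes.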

souvikg10 · August 3, 2018, 9:28am

Interesting, thanks for the useful insight. I will test it out. We actually have our own custom spaCy vectors, which perform quite well with the SVM. However, given some of the advantages of the TensorFlow ranker with OOV words and multi-intents, it seemed worth trying to mix both and see the results.

therold · August 3, 2018, 9:40am

Let me know the results and the parameters you used. I’d be interested in your findings.

parthsharma1996 · September 1, 2018, 9:28am

I was looking for this exact thing. Has anyone figured out a standard way of doing this that works? It would be extremely helpful to us since our dataset is quite small.
