Custom Featurizer for finetuned BERT features based on SpaCy

dakshvar22 · September 12, 2019, 9:23am

This was actually a tricky case in the implementation. A non-existent attribute was given an empty string because spaCy doesn’t accept None as the input string. Since we process the spaCy docs for training examples in batch mode, self.nlp.pipe(texts, batch_size=50), which is much faster than processing them one at a time, I replaced all None attributes with empty strings. It would be tedious and messy to filter out the examples with a None value for an attribute and then merge an empty doc back in for them. Your current implementation has a small bug:

docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]
attribute_docs[attribute] = docs

Here docs holds the spaCy docs for a filtered set of texts, so the docs no longer line up index-for-index with the training examples in training_data.training_examples. This causes a problem here:

for idx, example in enumerate(training_data.training_examples):
    example_attribute_doc = attribute_docs[attribute][idx]
    if len(example_attribute_doc):
        # If length is 0, that means the initial text feature was None and was replaced by ''
        # in preprocess method
        example.set(
            MESSAGE_SPACY_FEATURES_NAMES[attribute], example_attribute_doc
        )

attribute_docs[attribute][idx] is then not the spaCy doc for the training example at index idx in training_data.training_examples.
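One way to keep the filtering and still index by idx is to remember the original position of every text that was kept and write each doc back into a list with one slot per training example. This is only a minimal sketch, not the Rasa implementation: pipe_preserving_order and fake_pipe are hypothetical names, and it assumes the pipe callable yields results in the same order as its input (which is how self.nlp.pipe behaves).

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Doc = TypeVar("Doc")


def pipe_preserving_order(
    texts: List[Optional[str]],
    pipe: Callable[[List[str]], Iterable[Doc]],
) -> List[Optional[Doc]]:
    """Run `pipe` only on non-None texts, but return one slot per input text.

    Hypothetical helper: `pipe` is assumed to yield results in input order.
    """
    # Remember the original position of every text that actually exists.
    kept_indices = [i for i, text in enumerate(texts) if text is not None]
    kept_texts = [texts[i] for i in kept_indices]

    # One slot per training example; None marks "attribute not present".
    aligned: List[Optional[Doc]] = [None] * len(texts)
    for i, doc in zip(kept_indices, pipe(kept_texts)):
        aligned[i] = doc
    return aligned


# Stand-in for `lambda batch: self.nlp.pipe(batch, batch_size=50)`,
# used here so the sketch runs without a spaCy model.
def fake_pipe(batch: List[str]) -> Iterable[List[str]]:
    return (text.split() for text in batch)


aligned_docs = pipe_preserving_order(
    ["hello world", None, "book a table"], fake_pipe
)
```

With a list built this way, the loop over training_data.training_examples could test `example_attribute_doc is not None` instead of relying on a zero-length doc to signal a missing attribute.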

I haven’t looked at the spacy-pytorch-transformers library myself, but do you have any other ideas for avoiding this?
