Custom Featurizer for finetuned BERT features based on SpaCy

Hi @dakshvar22,

okay, got it - so you mean that this currently only works because the list of response examples is empty, and as soon as there is actual content, my featurizer would no longer respect the expected order?

Since this is kind of a showstopper for one of my bots, which relies on very high accuracy and was therefore trained with BERT embeddings, I thought about several scenarios to avoid this behaviour.

The training process actually fails here:

 File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\utils\spacy_utils.py", line 145, in <listcomp>
    docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]
  File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\spacy\language.py", line 752, in pipe
    for doc in docs:
  File "pipes.pyx", line 941, in pipe
  File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\spacy\util.py", line 463, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))

because the `pipe` method that is actually used resides in the transformers library and is defined as:

```python
def pipe(self, stream, batch_size=128):
    """Process Doc objects as a stream and assign the extracted features.

    stream (iterable): A stream of Doc objects.
    batch_size (int): The number of texts to buffer.
    YIELDS (spacy.tokens.Doc): Processed Docs in order.
    """
    for docs in minibatch(stream, size=batch_size):
        docs = list(docs)
        outputs = self.predict(docs)
        self.set_annotations(docs, outputs)
        for doc in docs:
            yield doc
```
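
If I read that loop correctly, it does yield the docs back in input order, just in chunks of `batch_size`. A quick sanity check I would use to convince myself (purely hypothetical, assuming a spacy-transformers model like `en_trf_bertbaseuncased_lg` is installed - not my actual setup):

```python
# Hypothetical check, not taken from my pipeline: pipe() should yield one Doc
# per input text and keep the input order, even with a small batch size.
import spacy

nlp = spacy.load("en_trf_bertbaseuncased_lg")  # assumed model name

texts = ["first utterance", "second utterance", "third utterance"]
docs = list(nlp.pipe(texts, batch_size=2))

# Each Doc should line up with the text at the same position.
assert [doc.text for doc in docs] == texts
```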

So one way would maybe be to handle things in that pipe method, or just before it. Any ideas?
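
To make it a bit more concrete, this is roughly what I had in mind - a minimal sketch, not Rasa's actual implementation, and `safe_pipe` is just a placeholder name from my side: skip the heavy pipe for texts without content and put empty Docs back at their original positions so the order stays intact.

```python
# Minimal sketch (hypothetical helper, not Rasa's actual code): only texts
# with real content go through nlp.pipe, empty ones get a blank Doc, and the
# result keeps the original input order.
from typing import List

from spacy.language import Language
from spacy.tokens import Doc


def safe_pipe(nlp: Language, texts: List[str], batch_size: int = 50) -> List[Doc]:
    # Remember where the content-bearing texts sit in the original list.
    content_indices = [i for i, t in enumerate(texts) if t and t.strip()]
    content_texts = [texts[i] for i in content_indices]

    # nlp.pipe yields the docs in the same order as the input stream.
    content_docs = list(nlp.pipe(content_texts, batch_size=batch_size))

    # Pre-fill every slot with an empty Doc, then put the processed docs
    # back at their original positions.
    docs = [nlp.make_doc("") for _ in texts]
    for idx, doc in zip(content_indices, content_docs):
        docs[idx] = doc
    return docs
```

That would at least keep the mapping between texts and docs stable, even when some of the response examples are empty.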

Regards and thanks for your help