Custom Featurizer for finetuned BERT features based on SpaCy

Hi @dakshvar22,

okay, got it - so you mean that this currently only works because the list of response examples is empty, and as soon as there is actual content, my featurizer would no longer respect the expected order?

Since this is kind of a showstopper for one of my bots, which relies on very high accuracy and was therefore trained with BERT embeddings, I thought about several scenarios to avoid this behaviour.

The training process actually fails here:

 File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\utils\spacy_utils.py", line 145, in <listcomp>
    docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]
  File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\spacy\language.py", line 752, in pipe
    for doc in docs:
  File "pipes.pyx", line 941, in pipe
  File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\spacy\util.py", line 463, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))

because the `pipe` method that is actually used resides in the transformers library and is defined as:

```python
def pipe(self, stream, batch_size=128):
    """Process Doc objects as a stream and assign the extracted features.

    stream (iterable): A stream of Doc objects.
    batch_size (int): The number of texts to buffer.
    YIELDS (spacy.tokens.Doc): Processed Docs in order.
    """
    for docs in minibatch(stream, size=batch_size):
        docs = list(docs)
        outputs = self.predict(docs)
        self.set_annotations(docs, outputs)
        for doc in docs:
            yield doc
```
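
If I read that loop correctly, it does yield the docs back in input order, just in chunks of `batch_size`. A quick sanity check I would use to convince myself (purely hypothetical, assuming a spacy-transformers model like `en_trf_bertbaseuncased_lg` is installed - not my actual setup):

```python
# Hypothetical check, not taken from my pipeline: pipe() should yield one Doc
# per input text and keep the input order, even with a small batch size.
import spacy

nlp = spacy.load("en_trf_bertbaseuncased_lg")  # assumed model name

texts = ["first utterance", "second utterance", "third utterance"]
docs = list(nlp.pipe(texts, batch_size=2))

# Each Doc should line up with the text at the same position.
assert [doc.text for doc in docs] == texts
```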

So one way would maybe be to handle things in that pipe method, or just before it. Any ideas?
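
To make it a bit more concrete, this is roughly what I had in mind - a minimal sketch, not Rasa's actual implementation, and `safe_pipe` is just a placeholder name from my side: skip the heavy pipe for texts without content and put empty Docs back at their original positions so the order stays intact.

```python
# Minimal sketch (hypothetical helper, not Rasa's actual code): only texts
# with real content go through nlp.pipe, empty ones get a blank Doc, and the
# result keeps the original input order.
from typing import List

from spacy.language import Language
from spacy.tokens import Doc


def safe_pipe(nlp: Language, texts: List[str], batch_size: int = 50) -> List[Doc]:
    # Remember where the content-bearing texts sit in the original list.
    content_indices = [i for i, t in enumerate(texts) if t and t.strip()]
    content_texts = [texts[i] for i in content_indices]

    # nlp.pipe yields the docs in the same order as the input stream.
    content_docs = list(nlp.pipe(content_texts, batch_size=batch_size))

    # Pre-fill every slot with an empty Doc, then put the processed docs
    # back at their original positions.
    docs = [nlp.make_doc("") for _ in texts]
    for idx, doc in zip(content_indices, content_docs):
        docs[idx] = doc
    return docs
```

That would at least keep the mapping between texts and docs stable, even when some of the response examples are empty.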

Regards and thanks for your help