Hi @dakshvar22,
Okay, got it - so you mean that this currently only works because the list of response examples is empty, and as soon as it contains content, I would break the ordering?
Since this is kind of a showstopper for one of my bots, which relies on very high accuracy and was therefore trained with BERT embeddings, I have thought about several ways to avoid this behaviour.
The training process actually fails here:
File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\rasa\nlu\utils\spacy_utils.py", line 145, in <listcomp>
docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]
File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\spacy\language.py", line 752, in pipe
for doc in docs:
File "pipes.pyx", line 941, in pipe
File "c:\users\\appdata\local\programs\python\python36\lib\site-packages\spacy\util.py", line 463, in minibatch
batch = list(itertools.islice(items, int(batch_size)))
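For what it's worth, the same call path can be reproduced outside of Rasa with a few lines like the following. This is only an untested sketch: the model name is an assumption (whichever spacy-transformers model the config loads), and the texts are placeholders rather than my real training data.

import spacy

# Hypothetical stand-alone reproduction; model name and texts are assumptions.
nlp = spacy.load("en_trf_bertbaseuncased_lg")
texts = ["hello there", "I want to book a table"]

# Same call shape as in rasa/nlu/utils/spacy_utils.py (batch_size=50).
docs = [doc for doc in nlp.pipe(texts, batch_size=50)]
for text, doc in zip(texts, docs):
    print(text, "->", doc.text)

If that already blows up inside minibatch, the problem can be narrowed down independently of the Rasa training data.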
The reason is that the pipe method that actually gets used resides in the transformers library and is defined as:
def pipe(self, stream, batch_size=128):
    """Process Doc objects as a stream and assign the extracted features.
    stream (iterable): A stream of Doc objects.
    batch_size (int): The number of texts to buffer.
    YIELDS (spacy.tokens.Doc): Processed Docs in order.
    """
    for docs in minibatch(stream, size=batch_size):
        docs = list(docs)
        outputs = self.predict(docs)
        self.set_annotations(docs, outputs)
        for doc in docs:
            yield doc
So one way might be to handle the problem right in this pipe method, or at the call site in spacy_utils.py. Do you have any ideas?
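To make that a bit more concrete, here is a rough, untested sketch of what I mean - safe_pipe is just a hypothetical helper name, not an existing Rasa or spaCy function:

def safe_pipe(nlp, texts, batch_size=50):
    """Yield exactly one Doc per input text, preserving the input order."""
    texts = list(texts)
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        try:
            # Fast path: batched processing, as spacy_utils.py does today.
            docs = list(nlp.pipe(batch, batch_size=batch_size))
        except Exception:
            # Slow path: process the texts one at a time so that a single bad
            # example cannot break (or silently reorder) the whole batch.
            docs = [nlp(text) for text in batch]
        yield from docs

The same idea could probably also live directly in spacy_utils.py, replacing the list comprehension from the traceback above, so that the one-to-one mapping between training examples and docs stays guaranteed.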
Regards, and thanks for your help!