Preparing data for training: single-threaded vs. multi-threaded

Hi! Please give some hints about what happens in the process that starts after you run the train command and prints console output like this:

Processed story block […] 5 it/s

It turns out that each step like the one above takes minutes for 1000+ stories, and the process uses only one CPU on an 80-CPU server.

Yes, I’m aware of the "Multi-thread training" solution, and I understand that data preparation runs in a single thread while TensorFlow tasks run in multiple threads by default.

I just want to understand what is done while preparing the data, and whether it is possible in any way to speed it up by forcing it to use multiple CPUs/threads.
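To make the question concrete, here is a rough sketch of the kind of parallelism I have in mind. It is not the library's actual code: `featurize_story`, `prepare_serial` and `prepare_parallel` are made-up names, the per-story work is a toy token count, and it assumes each story can be processed independently of the others, which may not be true for the real preparation step.

```python
from concurrent.futures import ProcessPoolExecutor


def featurize_story(story):
    """Hypothetical stand-in for the per-story work; NOT the real implementation."""
    # Toy CPU-bound work: count the tokens in every step of the story.
    return sum(len(step.split()) for step in story)


def prepare_serial(stories):
    # What I observe today: one story at a time on a single core.
    return [featurize_story(story) for story in stories]


def prepare_parallel(stories, workers=4, executor_cls=ProcessPoolExecutor):
    # Assumption: stories are independent, so they can be split across workers.
    # Executor.map() returns results in input order, so the output matches
    # the serial version.
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(featurize_story, stories, chunksize=16))
```

With process-based workers, `prepare_parallel` should be called from under an `if __name__ == "__main__":` guard, since some platforms re-import the main module in each worker. Is something like this feasible for the preparation step, or does it depend on state shared between stories?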

Hi! A week has passed. Any tips would still be appreciated!

Any updates?