The amount of generated training data

I have a bunch of intents defined in my nlu.md. The file is 241 lines long. I also have a domain.yml file and a stories.md file.

I then run `rasa train` in the terminal and training starts. These are the first few lines:

Epoch 1/100
763/763 [==============================] - 1s 736us/sample - loss: 2.6756 - acc: 0.3486
Epoch 2/100
763/763 [==============================] - 0s 168us/sample - loss: 2.2360 - acc: 0.5007
Epoch 3/100
763/763 [==============================] - 0s 166us/sample - loss: 1.8602 - acc: 0.5007
Epoch 4/100
763/763 [==============================] - 0s 171us/sample - loss: 1.7426 - acc: 0.5007
Epoch 5/100
763/763 [==============================] - 0s 166us/sample - loss: 1.6763 - acc: 0.5007

I understand that these lines are output from Keras, but I wonder … where does the number 763 come from?

@koaning That should be the number of mini-batch iterations for each epoch of Rasa Core training.

Thanks for the response!

That sounds a bit strange, though. Are there really more mini-batches than data points going into the model?

For anybody else interested in this: the “extra” data points are generated by Rasa from the stories. It is not just the intents we’re learning from; the dialogue model also learns to predict what comes next given the past history of each story, so a single story contributes multiple training examples.
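To make that concrete, here is a toy sketch (not Rasa’s actual featurizer, and the story names and turns are made up): if every turn in a story becomes one training example of (history so far → next event), a handful of short stories already yields more examples than there are stories or intents.

```python
# Toy illustration: each dialogue turn becomes one training example
# mapping (history so far) -> (next event). This is why the sample
# count Keras reports can exceed the number of stories or intents.

stories = {
    "greet path": ["greet", "utter_greet", "ask_weather", "utter_weather"],
    "goodbye path": ["goodbye", "utter_goodbye"],
}

def training_examples(stories):
    examples = []
    for _name, turns in stories.items():
        for i in range(1, len(turns)):
            # features: everything seen so far; label: the next event
            examples.append((turns[:i], turns[i]))
    return examples

examples = training_examples(stories)
print(len(examples))  # 4 examples from only 2 stories
```

Rasa Core additionally augments stories (e.g. by gluing them together), which inflates the count further; that combination is where a number like 763 can come from.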