We are trying to switch from the LSTM policy to the Embedding policy. We're hitting extremely long, compute-intensive story processing / data generation times, i.e. before actual training even starts.
We have definitely set the augmentation factor to 0, and in the debug logs we can see the generator setting the number of augmentation rounds to 0. But several generation rounds run anyway; this appears to be related to the everything_reachable switch in the generator, and by the second round we see an exponential explosion in the number of trackers. For example:
[RASA-LOGS] 2018-12-12 16:24:29,499 [MainThread ] [DEBUG] Number of augmentation rounds is 0
[RASA-LOGS] 2018-12-12 16:24:29,499 [MainThread ] [DEBUG] Starting data generation round 0 … (with 1 trackers)
[RASA-LOGS] 2018-12-12 16:25:45,318 [MainThread ] [DEBUG] Finished phase (35450 training samples found).
[RASA-LOGS] 2018-12-12 16:25:45,460 [MainThread ] [DEBUG] Found 37 unused checkpoints in current phase.
[RASA-LOGS] 2018-12-12 16:25:45,461 [MainThread ] [DEBUG] Found 103212 active trackers for these checkpoints.
[RASA-LOGS] 2018-12-12 16:25:45,462 [MainThread ] [DEBUG] Starting data generation round 1 … (with 103212 trackers)
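If it helps, here is a toy sketch (plain Python, not Rasa code) of the growth pattern we believe we're seeing: trackers parked at unused checkpoints get re-expanded in every subsequent round, so the tracker count multiplies round over round. The branching factor and round counts below are made-up illustration values, not measurements from our setup:

```python
def simulate_rounds(initial_trackers, stories_per_checkpoint, rounds):
    """Toy model: each round, every active tracker is continued through
    every story starting at its checkpoint, multiplying the tracker count."""
    counts = [initial_trackers]
    for _ in range(rounds):
        counts.append(counts[-1] * stories_per_checkpoint)
    return counts

# Even a modest branching factor blows up after a couple of rounds:
print(simulate_rounds(1, 50, 3))  # [1, 50, 2500, 125000]
```

That matches the shape of our logs: round 0 starts with 1 tracker and round 1 already starts with 103212.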
What is quite strange is that we do not see this issue with the LSTM policy, and as far as we can tell it happens before training itself starts, so it is not clear to us why switching the policy should have this effect.
Any clues? We’ve been hunting through stacks quite a bit and are pretty stumped.