Embedding policy training time

Hello,

So I’ve just started playing around with the embedding policy. My previous agent was trained with the memoization and Keras policies. Now I’ve swapped the Keras policy for the embedding policy, but training is taking a lot longer to finish. Is this policy expected to be much slower than the Keras policy? I haven’t really gone in depth into the architecture of each policy, but for now I’d just like to know whether this behaviour is expected. I’m currently running TensorFlow on an old, rusty CPU, but because I have a relatively small dataset, the Keras policy would take less than five minutes to train. With the embedding policy, more than an hour and a half has passed and I’m still at 21%. Is this normal?

This is my code to train the agent:

from rasa_core.agent import Agent
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.policies.embedding_policy import EmbeddingPolicy

agent = Agent('domain.yml',
              policies=[MemoizationPolicy(max_history=4), EmbeddingPolicy()])
data = agent.load_data(training_data_file)
agent.train(data, augmentation_factor=10, epochs=200, batch_size=50,
            validation_split=0.2)

Yes, the embedding policy takes a lot longer to train; this is expected, as it’s a much more complex architecture. How much training data do you have?

Barely anything, I just tested it out on 10 stories. Never enough for such a complex model, I suppose. It took 9 hours to train.

It took 9 hours to train on 10 stories? That can’t be right. What config did you use and what kind of machine are you training on?

@akelad: Any recommendations on parameters? Also, I only made it to 1% after 2 hours, with 171 hrs remaining.

Also having a memory issue: Allocation of 80600000 exceeds 10% of system memory.

Training locally on a MacBook Pro '16 with 16 GB RAM. Any chance I can find settings to test this efficiently?

This is how I defined the policy atm, according to what I read here and here:

policies:
  - name: "EmbeddingPolicy"
    epochs: 2000
    attn_shift_range: 2
    featurizer:
      - name: FullDialogueTrackerFeaturizer
        state_featurizer:
          - name: LabelTokenizerSingleStateFeaturizer

@Sam: Found the problem. I still had augmentation_factor set to a rather large value. If I set it to 0, as shown in the GitHub repo, training completes almost instantly.

@akelad: Should augmentation always be 0 for Embedding Policy?


Yes. The EmbeddingPolicy memorises full stories, so augmentation should always be set to 0.
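In the API from the first post, that just means passing augmentation_factor=0 to train(). A minimal sketch, assuming the old rasa_core API; 'domain.yml' and 'data/stories.md' are placeholder paths:

```python
from rasa_core.agent import Agent
from rasa_core.policies.memoization import MemoizationPolicy
from rasa_core.policies.embedding_policy import EmbeddingPolicy

agent = Agent('domain.yml',
              policies=[MemoizationPolicy(max_history=4), EmbeddingPolicy()])
data = agent.load_data('data/stories.md')

# augmentation_factor=0 disables story gluing entirely, which is what
# the EmbeddingPolicy expects.
agent.train(data, augmentation_factor=0, epochs=200, batch_size=50)
```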

@smn-snkl @akelad Hey, how do you set augmentation_factor to 0? I use a config file which looks like this:

policies:
  - name: EmbeddingPolicy
    epochs: 1000
    attn_shift_range: 3
    featurizer:
      - name: FullDialogueTrackerFeaturizer
        state_featurizer:
          - name: LabelTokenizerSingleStateFeaturizer

I’m using it to compare policies, and it also takes a lot of time (0% after 30 minutes, with fewer than 50 stories).

Edit: I added --augmentation 0 at the end of my compare command and it’s way quicker, but it sets augmentation to 0 for both Keras and embedding. Is there a way to set the augmentation to 0 only for the embedding policy?

@huberrom when invoking the training command, pass a --augmentation flag, e.g. in this case --augmentation 0
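For reference, this is roughly what the flag looks like on the old rasa_core training command. The file paths here are placeholders, and if the flag names differ in your version, check `python -m rasa_core.train -h`:

```shell
# Train the dialogue model with story augmentation disabled.
python -m rasa_core.train \
    -d domain.yml \
    -s data/stories.md \
    -o models/dialogue \
    --augmentation 0
```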

Yup, see my edit: I found the --augmentation flag, but it’s applied to both the Keras and the embedding policy.

I’m not sure if it’s a good idea to use Keras and Embedding. I’d probably cross-validate them with different hyperparameters and go with your best model.

I’m not using both of them; I was comparing them using rasa_core.train compare. But yeah, I could train-compare Keras with different parameters, then embedding, choose the best model from each, and compare those using rasa_core.evaluate compare.

@huberrom I think you made a good point though. I just realized that your problem also affects using the EmbeddingPolicy with other policies in general. E.g. the memoization policy performs really badly alongside the EmbeddingPolicy, since augmentation needs to be 0.

@akelad: Any solution for setting augmentation to 0 for the EmbeddingPolicy, but to something else for the MemoizationPolicy?


Currently there isn’t a way to set the augmentation factor for a specific policy. But we’ll probably make it so that augmentation doesn’t get passed on to memoization policies in the future, because augmentation for those policies can actually be harmful. It’s only really useful for the KerasPolicy.

This needs a complete rework of the MemoizationPolicy then, right? Because right now the MemoizationPolicy heavily relies on augmentation to work, requiring every story to have at least max_history steps. Is the goal to fix this “issue”? (Which I would love to see :slight_smile: )
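To make the max_history point concrete, here’s a simplified, stdlib-only sketch of how a memoization-style policy turns a story into fixed-length lookup keys of the last max_history turns. This is an illustration of the idea, not Rasa’s actual featurizer code:

```python
def story_windows(story, max_history):
    """Slide a window of the last `max_history` turns over a story.

    Each window becomes one lookup key: "if the last few turns looked
    like this, predict the next action". Short stories are padded with
    None so the first turns still produce full-length keys.
    """
    padded = [None] * (max_history - 1) + list(story)
    return [tuple(padded[i:i + max_history])
            for i in range(len(padded) - max_history + 1)]

# A toy story: alternating user intents and bot actions.
story = ["greet", "utter_greet", "ask_weather", "utter_weather"]
keys = story_windows(story, max_history=3)
# One key per turn in the story, e.g. the last key is
# ("utter_greet", "ask_weather", "utter_weather").
```

Glued-together (augmented) stories would add extra windows that span the artificial join points, which is exactly the behaviour a pure lookup policy shouldn’t memorise.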

If I understand your answer correctly @akelad, the memoization policy doesn’t need augmentation? So it’s currently fine if I use the embedding policy + memoization with the augmentation flag set to 0?

Because it looks like it’s not working well for @smn-snkl; are you sure it’s linked to the augmentation?

@smn-snkl no, it doesn’t rely on augmentation. The idea behind the memo policy is that it memorises the stories you’ve written, not some artificially augmented stories. Keras is the one that relies on augmentation :slight_smile: We’re currently working on a PR where the augmented stories will only get passed to the KerasPolicy, so we can avoid memo policies learning some undesired behaviour.

@huberrom yes, that’s totally fine; in fact the embedding policy doesn’t work with augmentation at all.
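Until augmentation can be set per policy, one possible workaround is to train the two policies as separate models with different augmentation settings and compare the resulting models afterwards. A sketch, assuming the old rasa_core API; all file paths and hyperparameter values here are placeholders:

```python
from rasa_core.agent import Agent
from rasa_core.policies.keras_policy import KerasPolicy
from rasa_core.policies.embedding_policy import EmbeddingPolicy

# EmbeddingPolicy: no augmentation, since it memorises full stories.
embedding_agent = Agent('domain.yml', policies=[EmbeddingPolicy()])
embedding_agent.train(embedding_agent.load_data('data/stories.md'),
                      augmentation_factor=0, epochs=2000)
embedding_agent.persist('models/embedding')

# KerasPolicy: augmentation is actually useful here, so keep it on.
keras_agent = Agent('domain.yml', policies=[KerasPolicy()])
keras_agent.train(keras_agent.load_data('data/stories.md'),
                  augmentation_factor=20, epochs=200)
keras_agent.persist('models/keras')
```

The two persisted models can then be evaluated against the same test stories to pick the better one.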

Hi guys,

So we’ve been putting off running the embedding policy for a while, as we weren’t able to train it in a reasonable space of time. I now want to address this.

With the Keras policy, core training with 500 epochs takes 2 hours. With the embedding policy it’s at 33 hours.

I have set augmentation to 0.

Our stories file is approx 20k lines of stories.

Am I missing something, or is that expected?