Hello. I read the paper on REDP (https://arxiv.org/pdf/1811.11707.pdf). According to the RNN part of Section 3, the LSTM output is fed to an embedding layer, and then the sum of that embedded output and the system attention vector is used as the dialogue state embedding.
(The output of this cell is fed to another embedding layer to create an embedding of the cell output for the current time step. The sum of this embedded cell output and system attention vector is used as the dialogue state embedding.)
However, in Figure 2, the LSTM output is first added to the system attention vector, and only then fed to the embedding layer (the purple box on the far right).
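To make the two readings concrete, here is a minimal numpy sketch. The dimensions and the linear `W_embed` layer are made up for illustration; they are not taken from the paper or from any implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, purely for illustration.
lstm_dim, embed_dim = 32, 20

cell_output = rng.normal(size=lstm_dim)            # LSTM cell output at time t
W_embed = rng.normal(size=(embed_dim, lstm_dim))   # stand-in embedding layer

# Reading 1 (Section 3 text): embed the cell output, then add the
# system attention vector -- which would then have to live in the
# embedding space (embed_dim).
attn_embed_space = rng.normal(size=embed_dim)
state_from_text = W_embed @ cell_output + attn_embed_space

# Reading 2 (Figure 2): add the system attention vector to the cell
# output first -- so it would live in the LSTM output space
# (lstm_dim) -- and then pass the sum through the embedding layer.
attn_lstm_space = rng.normal(size=lstm_dim)
state_from_figure = W_embed @ (cell_output + attn_lstm_space)

print(state_from_text.shape, state_from_figure.shape)  # both (embed_dim,)
```

Note that the two readings even imply different dimensions for the attention vector, so they are not just a notational difference.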
Am I misunderstanding something?
Thanks in advance.