Hi, my team is trying to replicate the TED model in PyTorch, but we are having some problems with the model's performance. Could someone tell us which hyperparameters were used for the experiments reported in the original paper? It seems like a good model, and we want to make a couple of modifications to it.
Hi @miguel-kjh, welcome to the Rasa community forum!
For more details on the TED experiments, I would point you to this repository, which contains experimental files and the commits that were used for the paper.
In order to find out the config for a particular run, you would have to combine information from various places though (see the linked files for examples):
In the relevant git commit, you can find the default hyperparameters of the TED algorithm, e.g. here.
In the config.yml file inside the experiment folder, you can see the custom parameters that overwrite the defaults.
Finally, the train.sh scripts show the command line arguments used for the experiments, pointing to the config file and datasets.
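As a rough illustration of how the first two layers combine, here is a minimal Python sketch, assuming the values in config.yml simply replace the code defaults key by key. The function name `resolve_config` and every value below are hypothetical placeholders, not the real TED defaults; read those from the policy source at the commit used for the paper.

```python
from typing import Any, Dict


def resolve_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config where every key in `overrides` replaces the default."""
    # Shallow merge: keys in `overrides` win on clashes; inputs are not mutated.
    return {**defaults, **overrides}


# Placeholder values for illustration only, NOT the actual TED defaults.
code_defaults = {"epochs": 1, "batch_size": [8, 32], "max_history": None}

# Hypothetical contents of an experiment's config.yml after parsing it into a dict.
config_yml = {"epochs": 2000, "max_history": 5}

final_config = resolve_config(code_defaults, config_yml)
print(final_config)
# {'epochs': 2000, 'batch_size': [8, 32], 'max_history': 5}
```

Keeping the merge explicit like this makes it easy to print the fully resolved config for each run and compare it side by side with the setup described in the paper.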
Thank you for the answer. I finished the TED model in PyTorch, but I have a problem with the transformer mask: I see that it is used in the encoder and in the loss calculation, but I don't understand why. Also, in the paper the features are composed of the intent, the entities, the slots, and the previous actions. Do you use anything else in the code?
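On the mask question: a common reason for masks in this kind of model is that dialogues in a batch have different numbers of turns and are padded to the same length. The mask keeps the encoder from attending to the padded turns and keeps those turns out of the loss average. The minimal PyTorch sketch below illustrates that general pattern, assuming padding at the end of each dialogue and a causal (unidirectional) encoder. It is not taken from the Rasa code; the sizes and tensors are hypothetical, and the cross-entropy loss is only a stand-in for TED's embedding-similarity loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

batch_size, max_turns, dim, num_actions = 2, 4, 8, 5

# Hypothetical batch: dialogue 0 has 4 real turns, dialogue 1 only 2,
# so dialogue 1 is padded with 2 dummy turns to reach max_turns.
lengths = torch.tensor([4, 2])
turn_features = torch.randn(batch_size, max_turns, dim)

# turn_mask[b, t] is True for real turns and False for padding.
turn_mask = torch.arange(max_turns).unsqueeze(0) < lengths.unsqueeze(1)

# Causal mask: True above the diagonal means "may not attend",
# so turn t only attends to turns 0..t (the dialogue so far).
causal_mask = torch.triu(torch.ones(max_turns, max_turns, dtype=torch.bool), diagonal=1)

layer = nn.TransformerEncoderLayer(d_model=dim, nhead=2, dropout=0.0, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)


def encode(features: torch.Tensor) -> torch.Tensor:
    # src_key_padding_mask expects True where a position is padding.
    return encoder(features, mask=causal_mask, src_key_padding_mask=~turn_mask)


out = encode(turn_features)

# Stand-in loss: cross-entropy over a linear scorer, NOT TED's similarity loss.
scorer = nn.Linear(dim, num_actions)
targets = torch.randint(0, num_actions, (batch_size, max_turns))
per_turn = F.cross_entropy(
    scorer(out).reshape(-1, num_actions), targets.reshape(-1), reduction="none"
).reshape(batch_size, max_turns)

# Average only over real turns; padded turns contribute nothing.
loss = (per_turn * turn_mask).sum() / turn_mask.sum()
print(f"loss over {int(turn_mask.sum())} real turns: {loss.item():.4f}")
```

With both masks in place, changing the padded turns leaves every real turn's output unchanged, and changing a later turn leaves earlier turns unchanged; without the loss mask, the model would also be trained to predict targets for turns that do not exist.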
For those who want to take a look at it, or in case someone wants to use it, here is the GitHub repository.