Does the DIET architecture work effectively with larger datasets?

Hi, I was training an NLU model on a dataset consisting of 60k utterances across 400+ intents. I was training with the DIET architecture on a p2.16xlarge AWS instance, using the diet-heavy.yml config provided here (DIET Benchmarks · GitHub), and started the training with the command below.

rasa test nlu --config configs/diet-heavy.yml --cross-validation --runs 1 --folds 2 --out results/diet-heavy

After running this command, I observed that the epoch progress is stuck at zero and doesn't advance for a long time.

Is the DIET architecture in Rasa compatible with larger datasets?

If possible, could you help me figure out my mistake?

Thank you.

Could you share your pipeline configuration as well as the output that you see from the terminal? You should still see something of a progress bar even if it is slow.

Also, did you run the lightweight variants as well before running the heavy one?
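
In case it helps, a lightweight variant is essentially sparse features feeding DIET directly, with no pretrained language model in front of it. A minimal sketch, assuming Rasa 2.x component names (the epoch count is an illustrative smoke-test value, not the benchmark's actual setting):

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer   # char n-grams add robustness to typos
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 25                     # keep low at first to confirm training progresses

If a config like this trains fine on your 60k utterances, the bottleneck is more likely the featurization step than DIET itself.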

The heavy settings that you are using involve more than just DIET. You're also running BERT, and that can certainly be heavy in production. DIET is designed to handle larger datasets too, but I would argue that 400+ intents is a lot. On the intents … just to check … what kind of use case do you have here? Frequently asked questions?
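
To make the contrast concrete: the heavy setup pairs those same sparse features with dense BERT embeddings, roughly along these lines (again only a sketch assuming Rasa 2.x components; the actual diet-heavy.yml in the benchmark repo may differ in its details):

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer  # loads a pretrained BERT model
    model_name: bert
    model_weights: bert-base-uncased
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 200

Note that featurizing all 60k utterances with BERT happens before the first epoch starts, so the progress bar can sit at zero for quite a while even when nothing is wrong.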