Learn how to make BERT smaller and faster

Hey everyone,

We’ve released a new blog post about compressing huge neural language models such as BERT: Learn how to make BERT smaller and faster.

Even though the blog post is aimed primarily at ML researchers and practitioners, the topic is very much relevant to everyone who wants to use today’s best-performing language models for tasks like intent classification.

If you’ve got any questions, ideas, comments, post them here! :slight_smile:


Will your next post be on pruning etc.? I would love to see how I can implement this directly in my Rasa tests. I have been learning a lot from using Rasa directly, fixing issues and learning as I go, so it would be cool to see how I can utilize this.

Great post!

Hey @FelixKing, thanks!

I have tried weight pruning and am currently exploring neuron pruning. The results look promising so far, though I still need to measure any inference-time improvements. If you’re not afraid of dirty code, you can just watch my branch of the rasa repo for live updates :wink:
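To give a rough idea of the difference between the two: weight pruning zeroes individual small weights (the matrix stays the same size, so you only get speedups with sparse kernels), while neuron pruning removes whole units, so the matrix genuinely shrinks. Here’s a minimal NumPy sketch of both, using simple magnitude/L2-norm criteria — this is just an illustration of the general idea, not the code from my branch:

```python
import numpy as np

def weight_prune(W, sparsity=0.5):
    """Zero out the `sparsity` fraction of smallest-magnitude weights.
    The matrix keeps its shape; it just becomes sparse."""
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return W * (np.abs(W) > threshold)

def neuron_prune(W, keep=0.5):
    """Keep only the `keep` fraction of output neurons (columns) with the
    largest L2 norm; the returned matrix is genuinely smaller."""
    n_keep = max(1, int(keep * W.shape[1]))
    norms = np.linalg.norm(W, axis=0)
    kept = np.sort(np.argsort(norms)[-n_keep:])  # strongest columns, in order
    return W[:, kept]
```

With `weight_prune` you’d typically re-apply the mask during fine-tuning so the zeros stay zero; with `neuron_prune` you also have to slice the matching rows of the next layer’s weight matrix so the shapes line up.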

What is your motivation when it comes to model compression? Do you just want to see if big models can be made faster, or is it something else that’s driving your interest?