A few years ago I went to a concert in what was then called The Hospital Club (now the h Club) in London. It’s got nothing to do with a hospital, other than that the site used to be a hospital back in the 18th century — nowadays it’s a members club for the creative industries. I didn’t really know what to expect, but something about the ticket told me that it wasn’t going to be an ordinary classical concert.
I can’t quite remember the name of the orchestra, unfortunately, but the setup was as follows: small stage…
“I cannot tell you how happy I am that I have taken up drawing again. I had been thinking about it for a long time, but always considered it impossible and beyond my abilities”
The quote above is taken from a letter written by Vincent van Gogh to his younger brother Theo in 1880,¹ the year van Gogh started his (short) painting career — he was 27 years old at the time.
I picked this quote because it really sums up what this article is about: whatever your age or background or what your Arts Teachers told…
Finetuning pretrained language models like BERT on downstream tasks has become ubiquitous in NLP research and applied NLP. That’s in part because one can save a lot of time and money by using pretrained models. They also often serve as strong baselines which, when finetuned, significantly outperform models trained from scratch.
While finetuning BERT is relatively straightforward in theory, it can be time-intensive and unrewarding in practice due to seemingly random outcomes of different training runs. In fact, even when finetuning a model with the same hyperparameters over and over again, there can be a great degree of variability…
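A minimal sketch of the kind of experiment behind this observation: repeat the same “training” procedure several times, changing only the random seed, and look at the spread of the final scores. The training run here is a pure-Python stand-in simulation, not an actual BERT finetuning; the `0.85` baseline and noise level are illustrative assumptions.

```python
import random
import statistics

def finetune_run(seed: int) -> float:
    """Stand-in for one finetuning run: only the random seed differs
    between calls, yet the final 'dev accuracy' varies."""
    rng = random.Random(seed)
    # Simulate the noise introduced by random initialization and
    # data ordering, which the seed controls in a real run.
    return 0.85 + rng.gauss(0, 0.02)

# Identical hyperparameters, ten different seeds.
scores = [finetune_run(seed) for seed in range(10)]
print(f"mean={statistics.mean(scores):.3f}  stdev={statistics.stdev(scores):.3f}")
print(f"best={max(scores):.3f}  worst={min(scores):.3f}")
```

Even in this toy setting, the gap between the best and worst seed can be large enough that reporting a single run is misleading — which is why variance across seeds matters when comparing finetuned models.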
For over 2 years now, transformer models pretrained on large corpora of text have been the state of the art in all things NLP. Researchers and practitioners continue to push boundaries by inventing better architectures or training larger models on more data. Indeed, few would disagree that, all else equal, training larger models on more data increases performance. But what if one is time- or resource-constrained?
Common wisdom is to take the hit in accuracy and train smaller models. Not only are smaller models faster to train and to run inference with, they’re also cheaper, right?
Machine Learning Engineer. Be curious, stay optimistic.