A brief history of standing on the shoulders of giants

Photo by Kwinten De Pauw on Unsplash

A few years ago I went to a concert in what was then called The Hospital Club (now the h Club) in London. It’s got nothing to do with a hospital, other than that the site used to be a hospital back in the 18th century; nowadays it’s a members’ club for the creative industries. I didn’t really know what to expect, but something about the ticket told me that it wasn’t going to be an ordinary classical concert.

I can’t quite remember what the name of the orchestra was, unfortunately, but the setup was as follows: small stage…

Why you think you can’t draw and what to do about it

Photo by Fabrizio Conti on Unsplash

“I cannot tell you how happy I am that I have taken up drawing again. I had been thinking about it for a long time, but always considered it impossible and beyond my abilities”

The above quote is taken from a letter written by Vincent van Gogh to his younger brother Theo in 1880.¹ 1880 marked the year when van Gogh started his (short) painting career — he was 27 years old at the time.

I picked this quote because it really sums up what this article is about: whatever your age or background, or whatever your art teachers told…

Resource Constrained BERT Finetuning

Weight Initializations, Data Orders, and Early Stopping

Photo by Drew Patrick Miller on Unsplash

Finetuning pretrained language models like BERT on downstream tasks has become ubiquitous in NLP research and applied NLP. That’s in part because using a pretrained model saves a lot of time and money. Pretrained models also often serve as strong baselines which, once finetuned, significantly outperform models trained from scratch.

While finetuning BERT is relatively straightforward in theory, it can be time-intensive and unrewarding in practice due to seemingly random outcomes of different training runs. In fact, even when finetuning a model with the same hyperparameters over and over again, there can be a great degree of variability…
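One practical way to study that run-to-run variability is to stop using a single global seed and instead give each source of randomness its own seed, so that, for example, the data order can be varied while the weight initialization is held fixed. The sketch below illustrates the idea with only the Python standard library; `init_classifier_weights` and `epoch_order` are hypothetical stand-ins (for initializing a new classification head and shuffling the training set), not part of any BERT library. In a PyTorch setup, separate `torch.Generator` objects could play the same role as the two `random.Random` instances here.

```python
import random


def make_run_rngs(init_seed: int, data_seed: int):
    """Return two independent RNGs: one for weight init, one for data order."""
    return random.Random(init_seed), random.Random(data_seed)


def init_classifier_weights(rng: random.Random, n_in: int, n_out: int, std: float = 0.02):
    # Hypothetical stand-in for randomly initializing a classification head
    # on top of a pretrained encoder; draws only from the init RNG.
    return [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]


def epoch_order(rng: random.Random, n_examples: int):
    # Shuffle example indices for one epoch using only the data-order RNG.
    order = list(range(n_examples))
    rng.shuffle(order)
    return order


# Two runs that share a weight-init seed but differ in data-order seed.
init_a, data_a = make_run_rngs(init_seed=1, data_seed=7)
init_b, data_b = make_run_rngs(init_seed=1, data_seed=8)

weights_a = init_classifier_weights(init_a, n_in=4, n_out=2)
weights_b = init_classifier_weights(init_b, n_in=4, n_out=2)
order_a = epoch_order(data_a, n_examples=10)
order_b = epoch_order(data_b, n_examples=10)
```

Because the two generators never share state, `weights_a` and `weights_b` are identical regardless of how many shuffles are drawn, so any difference between the two runs can be attributed to data order alone.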

Train Large, Then Compress

How to train faster, higher-performing transformers

Photo by Samule Sun on Unsplash

For over two years now, transformer models pretrained on large corpora of text have been the state of the art in all things NLP. Researchers and practitioners continue to push boundaries by inventing better architectures or training larger models on more data. Indeed, few would disagree that, all else being equal, training larger models on more data increases performance. But what if one is time- or resource-constrained?

Common wisdom is to take the hit in accuracy and train smaller models. Not only are smaller models faster to train and to run inference with, they’re also cheaper, right?

Recent research by Berkeley Artificial Intelligence Research (BAIR)¹ suggests…

Jonas Vetterle

Machine Learning Engineer. Be curious, stay optimistic.
