How to train faster, higher-performing transformers — For over two years now, transformer models pretrained on large corpora of text have been the state of the art in all things NLP. Researchers and practitioners continue to push the boundaries by inventing better architectures or training larger models on more data. Indeed, few would disagree that, all else equal, training larger models on…