Transformers were first introduced and quickly rose to prominence as the primary architecture in natural language processing. More lately, they have gained immense popularity in computer vision as well. Dosovitskiy et al. demonstrated how to create effective image classifiers that beat CNN-based architectures at high model and data scales by dividing pictures into sequences of…
