Zero Knowledge Notes on "Attention is All You Need"

The paper that could write itself.
published: 2024-05-17 | generated: 2024-06-04
ⓘ Note

This page is a WIP, and I’m still very much learning these things. If you’re here, take what you find below with a grain of salt. If, miraculously, my confidence in what’s on this page goes up, I’ll remove this note.

As I understand it, as a person with nearly zero knowledge, transformers are the ML/AI architecture that got us to LLMs. Also as I understand it, they originated as the primary contribution of the Google AI paper Attention is All You Need. This page will serve as the central place for anything that I (and therefore possibly you) need to know to understand their motivation, function, and caveats.


The abstract is straight-up offensive:

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder.

(emphasis mine; I have no idea what those words in that…sequence…mean)
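To have something concrete to hang onto, here’s my current (possibly wrong) reading of “sequence transduction”: a thing that maps an input sequence to an output sequence, like English tokens to German tokens. Here’s a toy Python sketch of just that shape; the lookup-table “model” is entirely made up and nothing like a real neural network:

```python
# Toy illustration of "sequence transduction": something that maps an input
# sequence of tokens to an output sequence of tokens. The lookup table below
# is a made-up stand-in for a model, purely to show the input/output shape.

TOY_LEXICON = {"the": "das", "cat": "katze", "sat": "sass"}  # hypothetical toy data

def transduce(tokens: list[str]) -> list[str]:
    """Map an input token sequence to an output token sequence."""
    return [TOY_LEXICON.get(tok, tok) for tok in tokens]

print(transduce(["the", "cat", "sat"]))  # -> ['das', 'katze', 'sass']
```

The real models in the paper obviously do something far more sophisticated inside that function, but the outer shape (sequence in, sequence out) is, as far as I can tell, what “transduction” refers to.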