Zero Knowledge Notes on "Attention is All You Need"

The paper that could write itself.
published: 2024-05-17 | generated: 2024-06-04
ⓘ Note

This page is a WIP, and I’m still very much learning these things. If you’re here, take what you find below with a grain of salt. If, miraculously, my confidence in what’s on this page goes up, I’ll remove this note.

As I understand it, as a person with nearly zero knowledge, transformers are the ML/AI architecture that got us to LLMs. Also as I understand it, they originated as the primary contribution of the Google AI paper Attention is All You Need. This page will serve as the central place for anything that I (and therefore possibly you) need to know to understand their motivation, function, and caveats.


The abstract is straight-up offensive:

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder.

(emphasis mine; I have no idea what those words in that…sequence…mean)
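To have something concrete to hang onto, here’s my current (possibly wrong) reading of “sequence transduction”: a thing that maps an input sequence to an output sequence, like English tokens to German tokens. Here’s a toy Python sketch of just that shape; the lookup-table “model” is entirely made up and nothing like a real neural network:

```python
# Toy illustration of "sequence transduction": something that maps an input
# sequence of tokens to an output sequence of tokens. The lookup table below
# is a made-up stand-in for a model, purely to show the input/output shape.

TOY_LEXICON = {"the": "das", "cat": "katze", "sat": "sass"}  # hypothetical toy data

def transduce(tokens: list[str]) -> list[str]:
    """Map an input token sequence to an output token sequence."""
    return [TOY_LEXICON.get(tok, tok) for tok in tokens]

print(transduce(["the", "cat", "sat"]))  # -> ['das', 'katze', 'sass']
```

The real models in the paper obviously do something far more sophisticated inside that function, but the outer shape (sequence in, sequence out) is, as far as I can tell, what “transduction” refers to.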