reading and exploring the data
port our code to a script
residual connections
feedforward layers of transformer block
scaling up the model! creating a few variables. adding dropout
training the bigram model
layernorm (and its relationship to our previous batchnorm)
from scratch, in code, spelled out.
conclusions
encoder vs. decoder vs. both (?) Transformers
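Several of the chapters above name the core ingredients of a transformer block: layernorm, a feedforward layer, and a residual connection around it. As a minimal illustration of how those pieces compose (a pre-norm residual block, x + FFN(LayerNorm(x))), here is a stdlib-only Python sketch; all function names are my own, the feedforward is a toy ReLU with no learned weights, and the layernorm omits the learned scale and shift:

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance.
    # (A real LayerNorm also applies a learned gain and bias.)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def feed_forward(x):
    # Toy position-wise feedforward: just a ReLU, purely illustrative.
    # (In a transformer this is two linear layers with a nonlinearity.)
    return [max(0.0, v) for v in x]

def residual_block(x):
    # Pre-norm residual connection: the input is added back to the
    # sublayer's output, giving gradients a direct path through the block.
    return [a + b for a, b in zip(x, feed_forward(layer_norm(x)))]

out = residual_block([1.0, 2.0, 3.0, 4.0])
```

The residual addition is why deep stacks of these blocks remain trainable: even if the sublayer contributes little at initialization, the identity path carries the signal (and the gradient) through unchanged.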