reading and exploring the data
port our code to a script
residual connections
feedforward layers of transformer block
scaling up the model! creating a few variables. adding dropout
training the bigram model
layernorm (and its relationship to our previous batchnorm)
from scratch, in code, spelled out.
conclusions
encoder vs. decoder vs. both (?) Transformers
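Several of the chapters above name the core ingredients of a transformer block: layernorm, a feedforward layer, and a residual connection around it. As a minimal illustration of how those pieces compose (a pre-norm residual block, x + FFN(LayerNorm(x))), here is a stdlib-only Python sketch; all function names are my own, the feedforward is a toy ReLU with no learned weights, and the layernorm omits the learned scale and shift:

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance.
    # (A real LayerNorm also applies a learned gain and bias.)
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def feed_forward(x):
    # Toy position-wise feedforward: just a ReLU, purely illustrative.
    # (In a transformer this is two linear layers with a nonlinearity.)
    return [max(0.0, v) for v in x]

def residual_block(x):
    # Pre-norm residual connection: the input is added back to the
    # sublayer's output, giving gradients a direct path through the block.
    return [a + b for a, b in zip(x, feed_forward(layer_norm(x)))]

out = residual_block([1.0, 2.0, 3.0, 4.0])
```

The residual addition is why deep stacks of these blocks remain trainable: even if the sublayer contributes little at initialization, the identity path carries the signal (and the gradient) through unchanged.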