The schedule is tentative and subject to change (e.g., snow days).
| Date | Topic | Notes | Additional readings/references |
| --- | --- | --- | --- |
| Class 1 (Sep 4) | Course overview, logistics | | |
| Class 2 (Sep 8) | Architectures, transformers, pipeline | | |
| Class 3 (Sep 11) | Attention, intro to GPU, FlashAttention | | |
| Class 4 (Sep 15) | Variants of attention for time/memory optimization | | |
| Class 5 (Sep 18) | Optimization, backpropagation for FFN, attention | | |
| Class 6 (Sep 22) | Stochastic gradient descent, AdaGrad, Adam | | |
| Class 7 (Sep 25) | Recent variants of optimizers (Muon, low rank) | | |
| Class 8 (Sep 29) | Parallel algorithms on GPU | | |
| Class 9 (Oct 2) | DeepSpeed/ZeRO, FSDP | | |
| Class 10 (Oct 6) | Locality-sensitive hashing | | |
| Class 11 (Oct 9) | Kernel density estimation | | |
| Oct 13 | Indigenous Peoples Day, no classes | | |
| Class 12 (Oct 16) | Graph-based nearest neighbor search, RAG | | |
| Class 13 (Oct 20) | Hashing-based attention approximation | | |
| Class 14 (Oct 23) | Mixture of experts | | |
| Class 15 (Oct 27) | State space models | | |
| Class 16 (Oct 30) | Fine-tuning, PEFT | | Project proposal is due |
| Class 17 (Nov 3) | Fast inference | | |
| Class 18 (Nov 6) | Quantization (clustering, hashing, e.g., RaBitQ) | | |
| Class 19 (Nov 10) | Quantization-aware training, low-bit quantization | | |
| Class 20 (Nov 13) | | | |
| Class 21 (Nov 17) | | | Project progress update due |
| Class 22 (Nov 20) | | | |
| Class 23 (Nov 24) | | | |
| Nov 27 | Thanksgiving, no classes | | |
| Class 24 (Dec 1) | | | |
| Class 25 (Dec 4) | | | |
| Dec 8 | Time for project, no classes | | Final project due |
| Dec 11 | No classes | | |