CMU 15-779: Advanced Topics in Machine Learning Systems (LLM Edition)
Course Overview
- University: Carnegie Mellon University
- Prerequisites: No strict prerequisites. An intro ML background and hands-on deep learning training experience are recommended; familiarity with PyTorch helps, and basic CUDA/GPU knowledge significantly flattens the learning curve
- Programming Language: Python (systems and kernel-level topics involve CUDA/hardware concepts)
- Course Difficulty: 🌟🌟🌟🌟
- Estimated Study Hours: 80-120 hours
This course takes a systems-first view of modern machine learning and LLM infrastructure. The core question it keeps returning to is: how does a model written in a high-level framework (e.g., PyTorch) get decomposed into low-level kernels, and how is it executed efficiently on heterogeneous accelerators (GPUs/TPUs) and in distributed environments? The syllabus covers GPU programming, ML compilers, graph-level optimizations, distributed training and auto-parallelization, and LLM serving and inference acceleration. It is a strong fit if you want to connect "framework-level experience" with "kernels, compilation, hardware, and cluster execution."
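To make that core question concrete, here is a toy, dependency-free sketch (all names are illustrative, not course code) of how a "framework-level" expression like `y = relu(x @ W + b)` can be lowered into a sequence of kernel-like calls that a runtime schedules in order:

```python
# Toy sketch (hypothetical names, not course code): one framework-level op,
# y = relu(x @ W + b), decomposed into three kernel-like functions,
# mimicking how a framework lowers a computation graph to kernels.

def matmul_kernel(x, w):
    # Naive dense matmul over nested lists: x is (m, k), w is (k, n).
    rows, inner, cols = len(x), len(w), len(w[0])
    return [[sum(x[i][k] * w[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def bias_add_kernel(y, b):
    # Broadcast-add a bias vector across rows.
    return [[y[i][j] + b[j] for j in range(len(b))] for i in range(len(y))]

def relu_kernel(y):
    # Elementwise max(0, v).
    return [[max(0.0, v) for v in row] for row in y]

def run_graph(x, w, b):
    # The "framework" executes the kernels in topological order.
    return relu_kernel(bias_add_kernel(matmul_kernel(x, w), b))

x = [[1.0, -2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.5, 0.5]
print(run_graph(x, w, b))  # → [[1.5, 0.0]]
```

A real framework does far more (fusion, memory planning, device placement), but the shape of the problem is the same: a graph of ops mapped onto a schedule of kernels.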
The workload is organized around recurring pre-lecture reading assignments (paper reviews) and a team-based final course project (proposal, presentation, report). For self-study, it is best to follow the schedule week by week rather than treating it as a slide-only course.
Topics Covered
The course is structured as lectures, with major themes including:
- ML systems fundamentals via TensorFlow/PyTorch (abstractions, execution models)
- GPU architecture and CUDA programming (memory, performance tuning)
- Transformer and attention case studies (FlashAttention and IO-aware attention)
- Advanced CUDA techniques (warp specialization, mega kernels)
- ML compilation (tile-based DSLs like Triton, kernel auto-tuning, graph-level optimizations, superoptimization such as Mirage)
- Parallelization and distributed training (ZeRO/FSDP, model/pipeline parallelism, auto-parallelization such as Alpa)
- LLM serving and inference (batching, PagedAttention, RadixAttention, speculative decoding)
- Post-training and architectures (PEFT like LoRA/QLoRA, MoE architectures/kernels/parallelism)
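To give one serving-side topic from the list some flavor, below is a minimal, dependency-free sketch of the core data structure behind PagedAttention: a block table mapping each sequence's logical KV-cache positions to fixed-size physical blocks drawn from a shared pool, so sequences can grow without contiguous preallocation. All names are illustrative assumptions, not vLLM code.

```python
# Minimal sketch (illustrative, not vLLM code) of a paged KV cache:
# each sequence's cache is a list of fixed-size physical block IDs,
# allocated on demand from a shared free pool (the PagedAttention idea).

BLOCK_SIZE = 4  # tokens per physical block

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # shared pool of block IDs
        self.block_tables = {}  # seq_id -> list of physical block IDs
        self.lengths = {}       # seq_id -> number of tokens cached

    def append_token(self, seq_id):
        """Reserve cache space for one new token of `seq_id`."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:                   # current block full (or none yet)
            table.append(self.free_blocks.pop())  # grab a fresh physical block
        self.lengths[seq_id] = n + 1

    def physical_slot(self, seq_id, pos):
        """Map a logical token position to (physical block ID, offset)."""
        table = self.block_tables[seq_id]
        return table[pos // BLOCK_SIZE], pos % BLOCK_SIZE

    def free(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):                         # cache 6 tokens for one sequence
    cache.append_token("seq0")
print(len(cache.block_tables["seq0"]))     # → 2 (blocks of 4 cover 6 tokens)
```

The design point this illustrates: because blocks are uniform and indirected through a per-sequence table, memory fragmentation is bounded and freed blocks are immediately reusable by other requests, which is what makes high-throughput continuous batching practical.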
Course Resources
- Course Website: https://www.cs.cmu.edu/~zhihaoj2/15-779/
- Schedule (slides and reading list per lecture): https://www.cs.cmu.edu/~zhihaoj2/15-779/schedule.html
- Slides (PDF): https://www.cs.cmu.edu/~zhihaoj2/15-779/slides/
- Logistics (grading, paper reviews, course project): https://www.cs.cmu.edu/~zhihaoj2/15-779/logistics.html
- Materials (intro deep learning materials): https://www.cs.cmu.edu/~zhihaoj2/15-779/materials.html