CMU 11-868: Large Language Model Systems
Course Overview
- University: Carnegie Mellon University
- Prerequisites: Strongly recommended to have taken Deep Learning (11-785) or Advanced NLP (11-611 or 11-711)
- Programming Language: Python
- Course Difficulty: 🌟🌟🌟🌟
- Estimated Workload: 120 hours
This graduate-level course focuses on the full stack of large language model (LLM) systems, from algorithms to engineering. The curriculum covers, among other topics:
- GPU Programming and Automatic Differentiation: Master CUDA kernel programming, fundamentals of parallel programming, and deep learning framework design (a minimal autodiff sketch follows this list).
- Model Training and Distributed Systems: Learn efficient training algorithms, memory and communication optimizations such as ZeRO, IO-aware attention kernels such as FlashAttention, and distributed training approaches like DDP, GPipe, and Megatron-LM.
- Model Compression and Acceleration: Study quantization (GPTQ), sparsity via mixture-of-experts (MoE), compiler stacks (JAX, Triton), and inference serving systems (vLLM, CacheGen).
- Cutting-Edge Topics and Systems Practice: Includes retrieval-augmented generation (RAG), multimodal LLMs, RLHF systems, and end-to-end deployment, monitoring, and maintenance.
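To make the autodiff topic concrete, here is a minimal reverse-mode automatic differentiation sketch in plain Python. The `Value` class is hypothetical and far simpler than the miniTorch framework used in the course; it only shows the core idea of recording a compute graph and propagating gradients in reverse topological order:

```python
# Minimal scalar reverse-mode autodiff sketch (hypothetical `Value` class,
# not the course's miniTorch API). Each node stores its local backward rule.
class Value:
    def __init__(self, data, parents=(), backward_fn=lambda grad: ()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # upstream nodes in the compute graph
        self.backward_fn = backward_fn  # maps incoming grad -> parent grads

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     lambda g: (g * other.data, g * self.data))

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     lambda g: (g, g))

    def backward(self):
        # Topologically order the graph, then propagate gradients in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, g in zip(v.parents, v.backward_fn(v.grad)):
                p.grad += g

x, y = Value(3.0), Value(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

Assignment 1 builds this idea out into a full tensor-level autograd framework.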
Compared with similar courses, this one stands out in three ways: tight integration with recent papers and open-source implementations (hands-on work extending CUDA support in the miniTorch framework), a project-driven assignment structure (five programming assignments plus a final project), and guest lectures from industry experts that give students real-world insight into LLM engineering challenges and solutions.
Self-Study Tips:
- Set up a CUDA-compatible environment in advance (NVIDIA GPU + CUDA Toolkit + PyTorch); a quick sanity check follows these tips.
- Review fundamentals of parallel computing and deep learning (autograd, tensor operations).
- Carefully read the assigned papers and slides before each lecture, and follow the assignments to extend the miniTorch framework from pure Python to real CUDA kernels.
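Before diving in, it is worth confirming that PyTorch actually sees your GPU. A minimal sanity check, assuming PyTorch is already installed:

```python
# Quick sanity check that PyTorch can see the GPU and run a CUDA op.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("CUDA (build) version:", torch.version.cuda)
    # Run one matmul on the GPU to confirm the runtime actually works.
    a = torch.randn(1024, 1024, device="cuda")
    b = torch.randn(1024, 1024, device="cuda")
    torch.cuda.synchronize()
    print("Matmul OK, result norm:", (a @ b).norm().item())
```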
This course assumes a solid understanding of deep learning and is not suitable for complete beginners. See the FAQ for more on prerequisites.
The assignments are fairly challenging and include:
- Assignment 1: Implement an autograd framework + custom CUDA ops + basic neural networks
- Assignment 2: Build a GPT-2 model from scratch
- Assignment 3: Accelerate training with custom CUDA kernels for Softmax and LayerNorm (a Triton sketch of a fused softmax follows this list)
- Assignment 4: Implement distributed model training (difficult to set up on your own as a self-learner)
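For a taste of what Assignment 3 involves, here is a fused row-wise softmax kernel written in Triton (a Python GPU DSL the course also covers). This is an illustrative sketch, not the assignment's required CUDA solution; it uses the standard numerically stable max-subtraction formulation:

```python
# Row-wise softmax as a single fused GPU kernel, written in Triton.
# Illustrative sketch only; the assignment itself targets raw CUDA kernels.
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, row_stride, n_cols, BLOCK_SIZE: tl.constexpr):
    row = tl.program_id(0)                       # one program per matrix row
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(in_ptr + row * row_stride + cols, mask=mask, other=-float("inf"))
    x = x - tl.max(x, axis=0)                    # subtract row max for stability
    num = tl.exp(x)
    y = num / tl.sum(num, axis=0)
    tl.store(out_ptr + row * row_stride + cols, y, mask=mask)

x = torch.randn(128, 500, device="cuda")
y = torch.empty_like(x)                          # same contiguous layout as x
BLOCK_SIZE = triton.next_power_of_2(x.shape[1])  # whole row fits in one block
softmax_kernel[(x.shape[0],)](y, x, x.stride(0), x.shape[1], BLOCK_SIZE=BLOCK_SIZE)
assert torch.allclose(y, torch.softmax(x, dim=1), atol=1e-6)
```

Fusing the max, exp, sum, and divide into one kernel avoids the extra global-memory round trips that a naive composition of framework ops would incur, which is the kind of speedup custom kernels aim for.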
Course Resources
- Course Website: https://llmsystem.github.io/llmsystem2025spring/
- Syllabus: https://llmsystem.github.io/llmsystem2025spring/docs/Syllabus/
- Assignments: https://llmsystem.github.io/llmsystem2025springhw/
- Course Texts: Selected research papers + selected chapters from Programming Massively Parallel Processors (4th Edition)