
CMU 11-868: Large Language Model Systems

Course Overview

  • University: Carnegie Mellon University
  • Prerequisites: Strongly recommended to have taken Deep Learning (11-785) or Advanced NLP (11-611 or 11-711)
  • Programming Language: Python
  • Course Difficulty: 🌟🌟🌟🌟
  • Estimated Workload: 120 hours

This graduate-level course focuses on the full stack of large language model (LLM) systems — from algorithms to engineering. The curriculum covers, but is not limited to:

  1. GPU Programming and Automatic Differentiation: Master CUDA kernel calls, fundamentals of parallel programming, and deep learning framework design (see the autograd sketch after this list).
  2. Model Training and Distributed Systems: Learn efficient training algorithms, memory and communication optimizations (e.g., ZeRO, FlashAttention), and distributed training approaches such as DDP, GPipe, and Megatron-LM.
  3. Model Compression and Acceleration: Study quantization (GPTQ), sparsity (MoE), compiler technologies (JAX, Triton), and inference-time serving systems (vLLM, CacheGen).
  4. Cutting-Edge Topics and Systems Practice: Includes retrieval-augmented generation (RAG), multimodal LLMs, RLHF systems, and end-to-end deployment, monitoring, and maintenance.
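
To make topic 1 concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python, in the spirit of what Assignment 1 builds out in miniTorch. The `Value` class and its methods are illustrative names, not the course's actual API:

```python
# A toy scalar autograd engine. Each operation records its parents and a
# closure that applies the chain rule when gradients flow backward.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            # Chain rule: d(out)/d(self) = other.data, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then propagate gradients in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0
```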

Compared to similar courses, this one stands out in three ways: tight integration with recent papers and open-source implementations (hands-on work extending CUDA support in the miniTorch framework), a project-driven assignment structure (five programming assignments plus a final project), and guest lectures from industry experts that give students real-world insight into LLM engineering challenges and solutions.

Self-Study Tips:

  • Set up a CUDA-compatible environment in advance (NVIDIA GPU + CUDA Toolkit + PyTorch); the short check script after this list verifies the setup.
  • Review fundamentals of parallel computing and deep learning (autograd, tensor operations).
  • Carefully read the assigned papers and slides before each lecture, and follow the assignments to extend the miniTorch framework from pure Python to real CUDA kernels.
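
A quick way to confirm the environment from the first tip works, using only standard PyTorch calls (nothing here is course-specific):

```python
import torch

print(torch.__version__)                  # installed PyTorch version
print(torch.cuda.is_available())          # True if the CUDA driver/runtime are usable
if torch.cuda.is_available():
    print(torch.version.cuda)             # CUDA version PyTorch was built against
    print(torch.cuda.get_device_name(0))  # name of your NVIDIA GPU
```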

This course assumes a solid understanding of deep learning and is not suitable for complete beginners. See the FAQ for more on prerequisites.

The assignments are fairly challenging and include:

  1. Assignment 1: Implement an autograd framework + custom CUDA ops + basic neural networks
  2. Assignment 2: Build a GPT-2 model from scratch
  3. Assignment 3: Accelerate training with custom CUDA kernels for Softmax and LayerNorm (a sketch of the fused-softmax idea follows this list)
  4. Assignment 4: Implement distributed model training (difficult to set up independently for self-study; a minimal data-parallel sketch also follows this list)
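
Assignment 3 implements its kernels in raw CUDA through miniTorch; as a rough illustration of the same idea (a fused, numerically stable row-wise softmax), here is a sketch in Triton, one of the compiler technologies the course covers. The kernel name, shapes, and tolerance are illustrative, and the input is assumed contiguous:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def softmax_kernel(out_ptr, in_ptr, n_cols, BLOCK_SIZE: tl.constexpr):
    # One program instance computes softmax over one row of the matrix.
    row = tl.program_id(0)
    offs = tl.arange(0, BLOCK_SIZE)
    mask = offs < n_cols
    x = tl.load(in_ptr + row * n_cols + offs, mask=mask, other=float("-inf"))
    x = x - tl.max(x, axis=0)   # subtract the row max for numerical stability
    num = tl.exp(x)
    den = tl.sum(num, axis=0)
    tl.store(out_ptr + row * n_cols + offs, num / den, mask=mask)

x = torch.randn(128, 512, device="cuda")
y = torch.empty_like(x)
softmax_kernel[(x.shape[0],)](y, x, x.shape[1],
                              BLOCK_SIZE=triton.next_power_of_2(x.shape[1]))
assert torch.allclose(y, torch.softmax(x, dim=1), atol=1e-5)
```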
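
For Assignment 4, a minimal PyTorch DistributedDataParallel skeleton shows the kind of setup involved; the model and launch command are placeholders, and the actual assignment goes well beyond this:

```python
# Launch with: torchrun --nproc_per_node=2 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")       # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])   # replicate and sync gradients
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)

    x = torch.randn(32, 1024, device=f"cuda:{local_rank}")
    loss = model(x).square().mean()
    loss.backward()    # DDP all-reduces gradients across ranks here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```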

Course Resources