CSE234: Data Systems for Machine Learning

Course Overview

  • University: UCSD
  • Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
  • Programming Languages: Python, Triton
  • Difficulty: 🌟🌟🌟
  • Estimated Workload: ~120 hours

This course focuses on the design of end-to-end large language model (LLM) systems and serves as an introduction to building efficient LLM systems in practice.

The course content can be divided into three parts (with several additional guest lectures):

Part 1. Foundations: modern deep learning and computational representations

  • Modern deep learning and computation graphs (framework and system fundamentals)
  • Automatic differentiation and an overview of ML system architectures (a minimal autodiff sketch follows this list)
  • Tensor formats, in-depth matrix multiplication, and hardware accelerators
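
For the autodiff item above, here is a minimal sketch of reverse-mode automatic differentiation over a dynamically built computation graph, in the spirit of micrograd (recommended further down). The Value class and its methods are illustrative, not the course's reference implementation.

```python
# Minimal reverse-mode autodiff: each Value node records its inputs and a
# local backward rule, so the computation graph is built implicitly as
# expressions are evaluated.
class Value:
    def __init__(self, data, parents=(), backward=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad           # d(out)/d(self) = 1
            other.grad += out.grad          # d(out)/d(other) = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Usage: z = x*y + x, so dz/dx = y + 1 = 4 and dz/dy = x = 2 at (2, 3).
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```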

Part 2. Systems and performance optimization: from GPU kernels to compilation and memory

  • GPUs and CUDA (including basic performance models)
  • GPU matrix multiplication and operator-level compilation
  • Triton programming, graph optimization, and compilation (see the kernel sketch after this list)
  • Memory management (including practical issues and techniques in training and inference)
  • Quantization methods and system-level deployment
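
As a taste of the kernel-level work above, here is roughly the canonical vector-add example from Triton's tutorials: a masked, block-parallel kernel plus its launch wrapper. It assumes a CUDA-capable GPU with torch and triton installed.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                     # 1D launch grid
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(add(x, y), x + y)
```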

Part 3. LLM systems: training and inference

  • Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
  • LLM fundamentals: Transformers, Attention, and MoE
  • LLM training optimizations (e.g., FlashAttention-style techniques)
  • LLM inference: continuous batching, paged attention, disaggregated prefill/decoding (see the KV-cache sketch after this list)
  • Scaling laws
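
For the inference item above, the following pure-Python sketch illustrates the block-table idea behind paged attention: KV-cache memory is carved into fixed-size physical blocks allocated on demand, so sequences do not reserve max-length buffers up front. All names and sizes are illustrative; this is not vLLM's actual data structure.

```python
import numpy as np

BLOCK_SIZE = 16        # tokens per KV block (illustrative)
NUM_BLOCKS = 1024      # size of the shared physical KV pool (illustrative)
HEAD_DIM = 64          # per-token KV width, flattened (illustrative)

kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float16)
free_blocks = list(range(NUM_BLOCKS))
block_tables: dict[int, list[int]] = {}   # seq_id -> its physical blocks
seq_lens: dict[int, int] = {}

def append_token(seq_id: int, kv: np.ndarray) -> None:
    """Write one token's KV, grabbing a new physical block only when needed."""
    table = block_tables.setdefault(seq_id, [])
    pos = seq_lens.get(seq_id, 0)
    if pos % BLOCK_SIZE == 0:             # current block full (or first token)
        table.append(free_blocks.pop())   # demand-allocate from the pool
    block = table[pos // BLOCK_SIZE]
    kv_pool[block, pos % BLOCK_SIZE] = kv
    seq_lens[seq_id] = pos + 1

def free_sequence(seq_id: int) -> None:
    """Return a finished sequence's blocks to the pool (no compaction)."""
    free_blocks.extend(block_tables.pop(seq_id, []))
    seq_lens.pop(seq_id, None)

# Two sequences share the pool; neither reserves max-length memory up front.
for t in range(20):
    append_token(0, np.full(HEAD_DIM, t, dtype=np.float16))
append_token(1, np.ones(HEAD_DIM, dtype=np.float16))
print(block_tables[0])   # two blocks: 20 tokens at 16 tokens per block
free_sequence(0)
```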

(Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)

The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting. The course emphasizes real-world system design trade-offs and engineering constraints rather than stopping at algorithms or API usage. Assignments often require students to confront performance bottlenecks directly, such as memory bandwidth limits, communication overheads, and missed kernel-fusion opportunities, and to address them through Triton or system-level optimizations.

Overall, the learning experience is fairly intensive, and a solid background in systems and parallel computing matters. For self-study, it is strongly recommended to prepare CUDA, parallel programming, and core systems knowledge in advance; otherwise the learning curve becomes noticeably steep in the later parts of the course, especially around LLM optimization and inference. That said, once you can keep up with the pace, the course offers strong long-term value for anyone pursuing work in LLM infrastructure, ML systems, or AI compilers.
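
As a concrete example of the bottleneck reasoning involved, here is a back-of-envelope roofline check that classifies a GEMM as compute- or memory-bound. The hardware constants are assumptions (roughly A100-class numbers); substitute your own.

```python
PEAK_FLOPS = 312e12        # bf16 tensor-core peak, FLOP/s (assumed)
PEAK_BW = 2.0e12           # HBM bandwidth, bytes/s (assumed)
BYTES_PER_ELEM = 2         # bf16

def gemm_intensity(M: int, N: int, K: int) -> float:
    """FLOPs per byte moved for C[M,N] = A[M,K] @ B[K,N], ignoring cache reuse."""
    flops = 2 * M * N * K
    bytes_moved = BYTES_PER_ELEM * (M * K + K * N + M * N)
    return flops / bytes_moved

ridge = PEAK_FLOPS / PEAK_BW   # intensity needed to saturate compute
# Big training-style GEMM vs. a decode-style GEMV (batch size 1):
for shape in [(4096, 4096, 4096), (1, 4096, 4096)]:
    ai = gemm_intensity(*shape)
    bound = "compute-bound" if ai > ridge else "memory-bound"
    print(f"{shape}: intensity {ai:.1f} FLOP/B vs ridge {ridge:.1f} -> {bound}")
```

The second shape is why single-request decoding is bandwidth-limited: at roughly 1 FLOP per byte it sits far below the ridge point, which is exactly the kind of observation techniques like continuous batching exploit.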

The course itself is relatively well-structured and progressive. However, for students without prior experience in systems and parallel computing, the transition into the second part of the course may feel somewhat steep. A key aspect of this course is spending significant time implementing and optimizing systems in practice. Therefore, it is highly recommended to explore relevant open-source projects on GitHub while reading papers, and to implement related systems or kernels hands-on to deepen understanding.

  • Foundations: consider studying alongside open-source projects such as micrograd
  • Systems & performance optimization and LLM systems: consider pairing with projects such as nanoGPT and nano-vllm

The course website itself provides a curated list of additional references and materials (book-related documentation and related courses); see the Reading Materials link under Course Resources below.

Course Resources

  • Course Website: https://hao-ai-lab.github.io/cse234-w25/
  • Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/
  • Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/
  • Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/

Resource Summary

All course materials are released in open-source form. However, the online grading infrastructure and reference solutions for assignments have not been made public.

Additional Resources / Further Reading

  • GPUMode: offers in-depth explanations of GPU kernels and systems. Topics referenced in the course, such as DistServe, FlashAttention, and Triton, all have excellent extended talks available there.