CSE234: Data Systems for Machine Learning

Course Overview

  • University: UCSD
  • Prerequisites: Linear Algebra, Deep Learning, Operating Systems, Computer Networks, Distributed Systems
  • Programming Languages: Python, Triton
  • Difficulty: 🌟🌟🌟
  • Estimated Workload: ~120 hours

This course focuses on the design of end-to-end large language model (LLM) systems and serves as an introduction to building efficient LLM systems in practice.

The course content can be divided into three parts (with several additional guest lectures):

Part 1. Foundations: modern deep learning and computational representations

  • Modern deep learning and computation graphs (framework and system fundamentals)
  • Automatic differentiation and an overview of ML system architectures (a minimal autodiff sketch follows this list)
  • Tensor formats, in-depth matrix multiplication, and hardware accelerators
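
For the autodiff item above, here is a minimal sketch of reverse-mode automatic differentiation over a dynamically built computation graph, in the spirit of micrograd (recommended further down). The Value class and its methods are illustrative, not the course's reference implementation.

```python
# Minimal reverse-mode autodiff: each Value node records its inputs and a
# local backward rule, so the computation graph is built implicitly as
# expressions are evaluated.
class Value:
    def __init__(self, data, parents=(), backward=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad           # d(out)/d(self) = 1
            other.grad += out.grad          # d(out)/d(other) = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Usage: z = x*y + x, so dz/dx = y + 1 = 4 and dz/dy = x = 2 at (2, 3).
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```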

Part 2. Systems and performance optimization: from GPU kernels to compilation and memory

  • GPUs and CUDA (including basic performance models)
  • GPU matrix multiplication and operator-level compilation
  • Triton programming, graph optimization, and compilation (see the kernel sketch after this list)
  • Memory management (including practical issues and techniques in training and inference)
  • Quantization methods and system-level deployment
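
As a taste of the kernel-level work above, here is roughly the canonical vector-add example from Triton's tutorials: a masked, block-parallel kernel plus its launch wrapper. It assumes a CUDA-capable GPU with torch and triton installed.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                     # 1D launch grid
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(add(x, y), x + y)
```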

Part 3. LLM systems: training and inference

  • Parallelization strategies: model parallelism, collective communication, intra-/inter-op parallelism, and auto-parallelization
  • LLM fundamentals: Transformers, Attention, and MoE
  • LLM training optimizations (e.g., FlashAttention-style techniques)
  • LLM inference: continuous batching, paged attention, disaggregated prefill/decoding (see the KV-cache sketch after this list)
  • Scaling laws
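
For the inference item above, the following pure-Python sketch illustrates the block-table idea behind paged attention: KV-cache memory is carved into fixed-size physical blocks allocated on demand, so sequences do not reserve max-length buffers up front. All names and sizes are illustrative; this is not vLLM's actual data structure.

```python
import numpy as np

BLOCK_SIZE = 16        # tokens per KV block (illustrative)
NUM_BLOCKS = 1024      # size of the shared physical KV pool (illustrative)
HEAD_DIM = 64          # per-token KV width, flattened (illustrative)

kv_pool = np.zeros((NUM_BLOCKS, BLOCK_SIZE, HEAD_DIM), dtype=np.float16)
free_blocks = list(range(NUM_BLOCKS))
block_tables: dict[int, list[int]] = {}   # seq_id -> its physical blocks
seq_lens: dict[int, int] = {}

def append_token(seq_id: int, kv: np.ndarray) -> None:
    """Write one token's KV, grabbing a new physical block only when needed."""
    table = block_tables.setdefault(seq_id, [])
    pos = seq_lens.get(seq_id, 0)
    if pos % BLOCK_SIZE == 0:             # current block full (or first token)
        table.append(free_blocks.pop())   # demand-allocate from the pool
    block = table[pos // BLOCK_SIZE]
    kv_pool[block, pos % BLOCK_SIZE] = kv
    seq_lens[seq_id] = pos + 1

def free_sequence(seq_id: int) -> None:
    """Return a finished sequence's blocks to the pool (no compaction)."""
    free_blocks.extend(block_tables.pop(seq_id, []))
    seq_lens.pop(seq_id, None)

# Two sequences share the pool; neither reserves max-length memory up front.
for t in range(20):
    append_token(0, np.full(HEAD_DIM, t, dtype=np.float16))
append_token(1, np.ones(HEAD_DIM, dtype=np.float16))
print(block_tables[0])   # two blocks: 20 tokens at 16 tokens per block
free_sequence(0)
```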

(Guest lectures cover topics such as ML compilers, LLM pretraining and open science, fast inference, and tool use and agents, serving as complementary extensions.)

The defining characteristic of CSE234 is its strong focus on LLM systems as the core application setting. The course emphasizes real-world system design trade-offs and engineering constraints rather than stopping at algorithms or API usage. Assignments often require students to confront performance bottlenecks directly, such as memory bandwidth limits, communication overheads, and missed kernel-fusion opportunities, and to address them through Triton or system-level optimizations.

Overall, the learning experience is fairly intensive, and a solid background in systems and parallel computing matters. For self-study, it is strongly recommended to prepare CUDA, parallel programming, and core systems knowledge in advance; otherwise the learning curve becomes noticeably steep in the later parts of the course, especially around LLM optimization and inference. That said, once you can keep up with the pace, the course offers strong long-term value for anyone pursuing work in LLM infrastructure, ML systems, or AI compilers.
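
As a concrete example of the bottleneck reasoning involved, here is a back-of-envelope roofline check that classifies a GEMM as compute- or memory-bound. The hardware constants are assumptions (roughly A100-class numbers); substitute your own.

```python
PEAK_FLOPS = 312e12        # bf16 tensor-core peak, FLOP/s (assumed)
PEAK_BW = 2.0e12           # HBM bandwidth, bytes/s (assumed)
BYTES_PER_ELEM = 2         # bf16

def gemm_intensity(M: int, N: int, K: int) -> float:
    """FLOPs per byte moved for C[M,N] = A[M,K] @ B[K,N], ignoring cache reuse."""
    flops = 2 * M * N * K
    bytes_moved = BYTES_PER_ELEM * (M * K + K * N + M * N)
    return flops / bytes_moved

ridge = PEAK_FLOPS / PEAK_BW   # intensity needed to saturate compute
# Big training-style GEMM vs. a decode-style GEMV (batch size 1):
for shape in [(4096, 4096, 4096), (1, 4096, 4096)]:
    ai = gemm_intensity(*shape)
    bound = "compute-bound" if ai > ridge else "memory-bound"
    print(f"{shape}: intensity {ai:.1f} FLOP/B vs ridge {ridge:.1f} -> {bound}")
```

The second shape is why single-request decoding is bandwidth-limited: at roughly 1 FLOP per byte it sits far below the ridge point, which is exactly the kind of observation techniques like continuous batching exploit.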

The course itself is relatively well-structured and progressive. However, for students without prior experience in systems and parallel computing, the transition into the second part of the course may feel somewhat steep. A key aspect of this course is spending significant time implementing and optimizing systems in practice. Therefore, it is highly recommended to explore relevant open-source projects on GitHub while reading papers, and to implement related systems or kernels hands-on to deepen understanding.

  • Foundations: consider studying alongside open-source projects such as micrograd
  • Systems & performance optimization and LLM systems: consider pairing with projects such as nanoGPT and nano-vllm

The course website itself provides a curated list of additional references and materials (book-related documentation and related courses); see the Reading Materials link under Course Resources below.

Course Resources

  • Course Website: https://hao-ai-lab.github.io/cse234-w25/
  • Lecture Videos: https://hao-ai-lab.github.io/cse234-w25/
  • Reading Materials: https://hao-ai-lab.github.io/cse234-w25/resources/
  • Assignments: https://hao-ai-lab.github.io/cse234-w25/assignments/

Resource Summary

All course materials are released in open-source form. However, the online grading infrastructure and reference solutions for assignments have not been made public.

Additional Resources / Further Reading

  • GPUMode: offers in-depth explanations of GPU kernels and systems. Topics referenced in the course, such as DistServe, FlashAttention, and Triton, all have excellent extended talks available there.