Skip to content

CMU 15-445: Database Systems

Descriptions

  • Offered by: CMU
  • Prerequisites: C++, Data Structures and Algorithms
  • Programming Languages: C++
  • Difficulty: 🌟🌟🌟🌟
  • Class Hour: 100 hours

As an introductory course to databases at CMU, this course is taught by Andy Pavlo, a leading figure in the database field (quoted as saying, "There are only two things I care about in this world, one is my wife, the second is the database").

This is a high-quality, resource-rich introductory course to Databases.

The faculty and the CMU Database Group behind the course have open-sourced all the corresponding infrastructure (Autograder, Discord) and course materials (Lectures, Notes, Homework), enabling any student who is willing to learn about databases to enjoy an experience almost equivalent to that of a CMU student.

One of the highlights of this course is the relational database Bustub, which was specifically developed by the CMU Database Group for teaching purposes. It requires you to modify various components of this database and implement their functionalities.

Specifically, in 15-445, you will need to implement some key components in Bustub, a traditional disk-oriented relational database, through the progression of four Projects.

These components include the Buffer Pool Manager (for memory management), B Plus Tree (storage engine), Query Executors & Query Optimizer (operators & optimizer), and Concurrency Control, corresponding to Project #1 through Project #4.

Worth mentioning is that, during the implementation process, students can compile bustub-shell through shell.cpp to observe in real-time whether their implemented components are correct. The feedback is very sufficient.

Furthermore, as a medium-sized project written in C++, bustub covers many requirements such as program construction, code standards, unit testing, etc., making it an excellent open-source project for learning.

Resources

In Fall 2019, Project #2 involved creating a hash index, and Project #4 focused on logging and recovery.

In Fall 2020, Project #2 was centered on B-trees, while Project #4 dealt with concurrency control.

In Fall 2021, Project #1 required the creation of a buffer pool manager, Project #2 involved a hash index, and Project #4 focused on concurrency control.

In Fall 2022, the curriculum was similar to that of Fall 2021, with the only change being that the hash index was replaced by a B+ tree index, and everything else remained the same.

In Spring 2023, the overall content was largely identical to Fall 2022 (buffer pool, B+ tree index, operators, concurrency control), except Project #0 shifted to Copy-On-Write Trie. Additionally, a fun task of registering uppercase and lowercase functions was introduced, which allows you to see the actual effects of the functions you write directly in the compiled bustub-shell, providing a great sense of achievement.

It's important to note that the versions of bustub prior to 2020 are no longer maintained.

The last Logging & Recovery Project in Fall 2019 is broken (it may still run on the git head from 2019, but Gradescope doesn't provide a public version, so it is not recommended to work on it, it is sufficient to just review the code and handout).

Perhaps in the Fall 2023 version, the recovery features will be fixed, and there may also be an entirely new Recovery Project. Let's wait and see 🤪.

If you have the energy, I highly recommend giving all of them a try, or if there's something in the book that you don't quite understand, attempting the corresponding project can deepen your understanding (I personally suggest completing all of them, as I believe it will definitely be beneficial).

Personal Resources

The unofficial Discord is a great platform for discussion. The chat history practically documents the challenges that other students have encountered. You can also raise your own questions or help answer others', which I believe will be a great reference.

For a guidance to get through Spring 2023, you can refer to this article by @xzhseh on Zhihu (Note: Since the article is originally written in Chinese, you may need a translator to read it :) ). It covers all the tools you need to succeed, along with guides and, most importantly, pitfalls that I've encountered, seen, or stepped into during the process of doing the Project.

All the resources and assignments used by @ysj1173886760 in this course are maintained in ysj1173886760/Learning:db - GitHub.

Due to Andy's request, the repository does not contain the source code for the project, only the solution for homework. In particular, for Homework1, @ysj1173886760 wrote a shell script to help you evaluate your solution automatically.

After the course, it is recommended to read the paper Architecture Of a Database System. This paper provides an overview of the overall architecture of database systems so that you can have a more comprehensive view of the database.

Advanced courses

CMU15-721 is a graduate-level course on advanced database system topics. It mainly focuses on the in-memory database, and each class has a corresponding paper to read. It is suitable for those who wish to do research in the field of databases. @ysj1173886760 is currently following up on this course and will create a pull request here after completing it to provide advanced guidance.