Skip to content

Caltech CS 122: Database System Implementation

Course Introduction

  • University: California Institute of Technology (Caltech)
  • Prerequisites: None
  • Programming Language: Java
  • Course Difficulty: 🌟🌟🌟🌟🌟
  • Estimated Study Time: 150 hours

Caltech's course, unlike CMU15-445 which does not offer SQL layer functionality, focuses on the implementation at the SQL layer in its CS122 course labs. It covers various modules of a query optimizer, such as SQL parsing, translation, implementation of joins, statistics and cost estimation, subquery implementation, and the implementation of aggregations and group by operations. Additionally, there are experiments related to B+ trees and Write-Ahead Logging (WAL). This course is suitable for students who have completed the CMU15-445 course and are interested in query optimization.

Below is an overview of the first three assignments or lab experiments of this course:

Assignment 1

  • Provide support for delete and update statements in NanoDB.
  • Add appropriate pin/unpin code to the Buffer Pool Manager.
  • Improve the performance of insert statements without excessively inflating the size of the database file.

Assignment 2

  • Implement a simple plan generator to convert various parsed SQL statements into executable plans.
  • Implement join plan nodes that support inner and outer joins using the nested-loop join algorithm.
  • Add unit tests to ensure the correct implementation of inner and outer joins.

Assignment 3

  • Complete the collection of table statistics.
  • Perform plan cost calculation for various plan nodes.
  • Calculate the selectivity of various predicates that may appear in the execution plan.
  • Update the tuple statistics of the plan nodes' outputs based on predicates.

For the remaining Assignments and Challenges, please refer to the course description. It is recommended to use IDEA to open the project and Maven for building, keeping in mind the log-related configurations.

Course Resources