Project #0 - C++ Primer

Do not post your project on a public Github repository.

Overview

All the programming projects this semester will be written on the BusTub database management system. This system is written in C++. To make sure that you have the necessary C++ background, you must complete a simple programming assignment to assess your knowledge of basic C++ features. You will not be given a grade for this project, but you must complete the project with a perfect score before being allowed to proceed in the course. Any student unable to complete this assignment before the deadline will be asked to drop the course.

All of the code in this programming assignment must be written in C++. The projects will be specifically written for C++17, but we have found that it is generally sufficient to know C++11. If you have not used C++ before, here are some resources to help:

If you are using VSCode, we recommend you to install CMake Tools, C/C++ Extension Pack and clangd. After that, follow this tutorial to learn how to use the visual debugger in VSCode: Debug a C++ project in VS Code.

If you are using CLion, we recommend you to follow this tutorial: CLion Debugger Fundamentals.

If you prefer to use gdb for debugging, there are many tutorials available to teach you how to use gdb. Here are some that we have found useful:

This is a single-person project that will be completed individually (i.e. no groups).

Project Specification

An ordered set is a useful abstract data type in computer science. It maintains keys in an ordered sequence while allowing efficient search, insert, and delete operations. In early computer science courses, you likely learned to implement this data type using balanced search trees (such as red-black trees, AVL trees, or splay trees). These balanced tree implementations use various rebalancing algorithms to maintain their structural properties and guarantee good performance. In this project, we will implement a skip list—an alternative way to implement the ordered set that doesn't require rebalancing during insertions and deletions.

Implementing an ordered set with a plain linked list results in linear-time operations, as we must traverse each node to reach our target position. To improve performance, we can add an upper level of links that skip every other node, reducing the number of nodes we examine to ceil(n/2) + 1, where n is the number of elements in the list. Adding another level that traverses every fourth node further reduces this to ceil(n/4) + 1 nodes. A skip list takes inspiration from this observation by using multiple levels of links that skip over elements for faster traversal. Unlike balanced trees that use rebalancing algorithms, a skip list maintains its structural property and achieves logorithmic search, insertion and deletion time by using a random number generator during insertion to determine the number of levels each node will have.

Learning Goals

By the end of this project, you will:

Resources

To understand more about the skip list data structure, we recommend the following materials:

Instructions

For each of the following components, we provide stub classes that contain the API that you must implement. You should not modify the signatures for the pre-defined functions in these classes. If you modify the signatures, our grading test code will not work and you will get no credit for the project. Specifically for this project, you should not modify the data members. You may (and should) add helper functions to correctly implement the required functionality.

You will have to complete the following tasks as part of this project:

Task #1 - Skip List

In this task, you will need to modify skiplist.h and skiplist.cpp to implement a skip list that loosely follows a ordered set interface. We recommend reading the "Skip List Algorithms" section in the original skip list paper to get a better context.

Each node in our skip list stores the key and a list of forward links pointing to the next nodes at different levels. The height of a node is determined by its number of forward links. We define links_[0] as the lowest level link and links_[height-1] as the highest level link in a node. For example, node 9 in the diagram has a height of 2 and two forward links pointing to node 12 on level 0 and pointing to node 17 on level 1. For more details, please refer to the SkipNode class definition in the starter code.

skiplist-basic

The height of a skip list is determined by the node with the greatest height value and is capped at a pre-determined value MaxHeight. The skip list begins with a special header node that contains MaxHeight number of forward pointers. The links at levels greater or equal to the height of the skip list will be set to nullptr. An empty skip list has a height of 1 and only have the header node whose links all point to nullptr.

skiplist-empty

Your skip list must support search, insert, delete operations in O(log n) time complexity. Please refer to the starter code comments in skiplist.{h,cpp} for the full specification of operations. You are required to implement all the public interface of the SkipList class and the SkipList::SkipNode class. If the method is marked UNIMPLEMENTED(P0), you should add your implementation. We will walk through some of these operations in the following sections.

Search

To search for an element in the skip list, we start from the header node and follow the highest-level forward links, and we keep traversing in the current level as long as we don't overshoot the key we are looking for. When no more progress in the current level is possible, we move down to a lower level. When no more progress can be made at level 0, we will either be in front the node we are looking for or the key does not exist in the skip list.

skiplist-search-19

In our example to search for key 19, we start from the header node at level 3 and iteratively search down the list until we reach node 12 at level 0. Then we check the next node to find out that the node with key 19 is in the skip list.

Note: Your search should start at the height of your skip list and not at MaxHeight. Make sure you understand the difference between MaxHeight and the value returned by Height().

Insert and Delete

Insertion and deletion is where the skip list shines compared to balanced trees. The first step is to perform the same search we discussed in the previous section. In addition, we record an array of previous links of length MaxHeight as we traverse down each level so we are ready to update these links when we reach the desired location in the list to perform an insertion or a deletion.

skiplist-insert-17

In our example to insert key 17, we first perform the search and find that the position after node 12 is the location for insertion. While performing the search, we record the previous links to update (arrows in dashed lines). With a random number generator, we determined the new node 17 should have a height of 2. To complete our insertion, we update the level 0 and level 1 links to include node 17 in the linked sequence (arrows in bold).

Note: While there are many ways to implement a geometric random number generator in C++, many implementations are not portable. For testing purposes, please use the pre-implemented RandomHeight() method when creating a new node. To avoid failing tests, do not call RandomHeight() unnecessarily, as this would needlessly advance the rng_ state machine.

Note: If the new node has a height that is greater than the height of the previous skip list, you should backfill the upper levels of the previous links arrays to be the header node.

skiplist-erase-6

In our example to remove key 6, we first perform the search and find that node 6 is right after node 3. While performing the search, we record the previous links to update (arrows in dashed lines). To complete our removal, we update the links at level 0 to 3 to exclude node 6 from the linked sequence (arrows in bold).

Note: Remember to check if we have deleted the node with the maximum height in the list. If so, we should decrease the height of the skip list. In our example, the height of the skip list is decreased from 4 to 3 after the removal.

Note: We strongly recommend you to implement the helper methods defined the SkipNode class or come up with your ones. You should also consider code sharing and implementing helper functions for the SkipList class. See skiplist.h for a non-comprehensive list of suggestions.

Note: Remember to keep the skip list metadata (e.g. height and size of the skip list) up to date in your operations.

Task #2 - Skip List Synchronization

After you have a skip list which can be used in a single-thread environment, you will now implement a synchronization mechanism to make it safe to access concurrently in a mutlithreaded environment. In a multithreaded environment, the skip list can be accessed by a mix of readers and writers where readers only need read access whereas writers need both read and write access. The synchronization requirement for this task is to make sure

  1. A writer must have exclusive access to the object.
  2. Unlimited number of readers can have shared access to the object.

The synchronization lectures [ basic | advanced ] from the prerequsite course should help you review synchronization concepts and primitives.

Note: We strongly recommend you use RAII-style lock wrappers like std::shared_lock and std::unique_lock to protect your skip list instead of directly calling lock or unlock on your std::shared_mutex. Check out this example to see how to use them together in the field.

Note: Locking the entire skip list with a coarse-grained reader-writer lock allows at most one writer to access the list at a given time and limits the overall throughput. Can you think of ways to improve the performance? You are not asked to implement more advanced synchronization techniques in this project. We will cover more on this topic later in the class.

Creating Your Own Project Repository

If the below git concepts (e.g., repository, merge, pull, fork) do not make sense to you, please spend some time learning git first.

Follow the instructions to setup your own PRIVATE repository and your own development branch. If you have previuosly forked the repository through the GitHub UI (by clicking Fork), PLEASE DO NOT PUSH ANY CODE TO YOUR PUBLIC FORKED REPOSITORY! Make sure your repository is PRIVATE before you git push any of your code.

If the instructor makes any changes to the code, you can merge the changes to your code by keeping your private repository connected to the CMU-DB master repository. Execute the following commands to add a remote source:

$ git remote add public https://github.com/cmu-db/bustub.git

You can then pull down the latest changes as needed during the semester:

$ git fetch public
$ git merge public/master

Setting Up Your Development Environment

First install the packages that BusTub requires:

# Linux
$ sudo build_support/packages.sh
# macOS
$ build_support/packages.sh

See the README for additional information on how to setup different OS environments.

To build the system from the commandline, execute the following commands:

$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make -j`nproc`

We recommend always configuring CMake in debug mode. This will enable you to output debug messages and check for memory leaks (more on this in below sections).

Testing

You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. You can disable tests in GTest by adding a DISABLED_ prefix to the test name. To run the tests from the command-line:

$ cd build
$ make -j$(nproc) skiplist_test
$ ./test/skiplist_test

In this project, there are no hidden tests. In the future, the provided tests in the starter code are only a subset of the all the tests that we will use to evaluate and grade your project. You should write additional test cases on your own to check the complete functionality of your implementation.

Make sure that you remove the DISABLED_ prefix from the test names otherwise they will not run!

Formatting

Your code must follow the Google C++ Style Guide. We use Clang to automatically check the quality of your source code. Your project grade will be zero if your submission fails any of these checks.

Execute the following commands to check your syntax. The format target will automatically correct your code. The check-lint and check-clang-tidy targets will print errors that you must manually fix to conform to our style guide.

$ make format
$ make check-clang-tidy-p0

Memory Leaks

For this project, we use LLVM Address Sanitizer (ASAN) and Leak Sanitizer (LSAN) to check for memory errors. To enable ASAN and LSAN, configure CMake in debug mode and run tests as you normally would. If there is memory error, you will see a memory error report. Note that macOS only supports address sanitizer without leak sanitizer.

In some cases, address sanitizer might affect the usability of the debugger. In this case, you might need to disable all sanitizers by configuring the CMake project with:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DBUSTUB_SANITIZER= ..

Development Hints

You can use BUSTUB_ASSERT for assertions in debug mode. Note that the statements within BUSTUB_ASSERT will NOT be executed in release mode. If you have something to assert in all cases, use BUSTUB_ENSURE instead.

We will test your implementation in release mode. To compile your solution in release mode,

$ mkdir build_rel && cd build_rel
$ cmake -DCMAKE_BUILD_TYPE=Release ..

Post all of your questions about this project on Piazza. Do not email the TAs directly with questions.

TAs will not look into your code or help you debug in this project.

Grading Rubric

In order to pass this project, you must ensure your code follows the following guidelines:

  1. Does the submission successfully execute all of the test cases and produce the correct answer?
  2. Does the submission execute without any memory leaks?
  3. Does the submission follow the code formatting and style policies?

Note that we will use additional test cases to grade your submission that are more complex than the sample test cases that we provide you in future projects.

Late Policy

There are no late days for this project.

Submission

You will submit your implementation to Gradescope:

Run this command in build directory and it will create a zip archive called project0-submission.zip that you can submit to Gradescope.

$ make submit-p0

Although you are allowed submit your answers as many times as you like, you should not treat Gradescope as your only debugging tool. Most students submit their projects near the deadline, and thus Gradescope will take longer to process the requests. You may not get feedback in a timely manner to help you debug problems. Furthermore, the output from Gradescope is unlikely to be as informative as the output from a debugger (like gdb), provided you invest some time in learning to use it.

CMU students should use the Gradescope course code announced on Piazza.

Collaboration Policy

WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.