Project #1 - Buffer Pool Manager

Do not post your project on a public Github repository.

Overview

This semester, you will build a disk-oriented database management system (DBMS) called BusTub. A disk-oriented architecture means that the DBMS's primary storage location is in persistent storage, like a hard drive (HDD) or flash storage (SSDs). This is different from an in-memory DBMS, where data is stored in volatile memory.

The first programming project is to implement the DBMS's buffer pool manager. The buffer pool is responsible for moving physical pages of data back and forth from buffers in main memory to persistent storage. It also behaves as a cache, keeping frequently used pages in memory for faster access, and evicting unused or cold pages back out to storage.

A page in BusTub is 8192 bytes (8 KB) of data, meaning the buffer pool manages data in 8 KB units. Since pages in BusTub are fixed size, the buffer pool manager stores these pages into fixed-size buffers called frames. The distinction between a page and a frame is somewhat subtle. A page is 8 KB of logical (virtual) data, and can be stored in memory, on disk, or both in memory and on disk. A frame, on the other hand, is a fixed-length 8 KB block of memory (i.e., a pointer to this memory) that stores a single page of data. The analogy here is storing (logical) pages inside (physical) fixed frames.
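
As a rough mental model (the aliases and constant below are illustrative; the handout code defines its own), a page is referred to by an ID while a frame is just a fixed-size block of memory:

#include <cstddef>
#include <cstdint>

// Illustrative sketch only; the real BusTub headers define their own aliases.
using page_id_t = int32_t;    // identifies a logical page (may live on disk, in memory, or both)
using frame_id_t = int32_t;   // indexes a physical in-memory buffer

static constexpr std::size_t kPageSize = 8192;  // 8 KB, as described above

// A frame is nothing more than a fixed 8 KB block of memory that can hold
// the bytes of exactly one page at a time.
struct Frame {
  char data[kPageSize];
};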

In addition to behaving as a cache, the buffer pool manager allows a DBMS to support databases that are larger than the amount of memory available to the system. Consider a computer with 1 GB of memory (RAM). If we want to manage a 2 GB database, a buffer pool manager gives us the ability to interact with this database without needing to fit its entire contents in memory.

The I/O operations that the buffer pool executes are abstracted away from other parts of the DBMS. For example, when one of the DBMS's components (e.g., execution engine) asks the buffer pool manager for a page of data using its unique identifier (page_id_t), that component does not need to know whether that page is already in memory or whether the system has to retrieve it from disk. Similarly, the buffer pool manager does not need to understand the contents of these pages, it only needs to know where the data is located.

Implementation

Your implementation of the buffer pool must be thread-safe. Multiple threads will concurrently access the internal data structures of your buffer pool, and you must make sure that critical sections are protected with latches (these are called "locks" in operating systems).
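
For example, a lookup into an internal map can be protected like this (a minimal sketch; the class and member names are made up for illustration):

#include <mutex>
#include <optional>
#include <unordered_map>

// Hypothetical example: a thread-safe lookup into an internal page table.
class Example {
 public:
  auto Lookup(int page_id) -> std::optional<int> {
    std::scoped_lock lock(latch_);          // latch protects the critical section
    auto it = page_table_.find(page_id);
    if (it == page_table_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::mutex latch_;                         // protects page_table_
  std::unordered_map<int, int> page_table_;  // page id -> frame id
};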

You must implement the following storage manager components:

  1. ARC Replacement Policy (Task #1)
  2. Disk Scheduler (Task #2)
  3. Buffer Pool Manager (Task #3)

This is a single-person project that must be completed individually (i.e., no groups).

Project Specification

Remember to pull the latest code from the BusTub repository.

For each of the following components, we have provided stub classes that contain the API that you must implement. You should not modify the signatures for the pre-defined functions in these classes. If you modify the signatures, our grading test code will not work and you will not get credit for this project.

If a class already contains data members, you should not remove them. For example, the BufferPoolManager class contains DiskScheduler and ArcReplacer members that are required to implement functionality needed by the rest of the system. You may add data members and helper functions to these classes to correctly implement the required functionality.

You may use any built-in C++17 containers in your project unless specified otherwise. It is up to you to decide which ones you want to use. Be warned that these containers are not thread-safe, and you will need to use latches to protect access to them. You may not use additional third-party libraries (e.g., Boost).

Task #1 - Adaptive Replacement Cache (ARC) Replacement Policy

This component is responsible for tracking page usage in the buffer pool in order to determine candidate pages / frames to evict out of memory and back to disk.

You will implement a class called ArcReplacer in src/include/buffer/arc_replacer.h and its corresponding implementation file in src/buffer/arc_replacer.cpp. Note that ArcReplacer is a standalone class and is not related to any of the other Replacer classes. You are only expected to implement the ARC replacement policy; you do not have to implement the LRU-K, LRU, or Clock replacement policies (even though there are corresponding files for them).

The ARC replacement policy, originally developed at IBM, is a replacement policy that adapts to the workload it observes. It maintains two lists that track the cached pages, two lists that track recently evicted pages, and a target size that adjusts to the workload. Because of this adaptivity, the ARC replacement policy generally performs better than LRU. Refer to the original paper for more details.

You will be implementing a variant of the ARC replacement policy for this project.

You will need to implement the following methods for ARC as defined in the header file (src/include/buffer/arc_replacer.h ) and in the source file (src/buffer/arc_replacer.cpp ):

ARC Replacement Algorithm

The ARC algorithm has the following parts. We start with two lists: the MRU (most recently used) list tracks the frames and their corresponding pages that were recently accessed exactly once, while the MFU (most frequently used) list tracks the frames and their corresponding pages that were recently accessed more than once. We also start with two ghost lists: an MRU ghost list and an MFU ghost list. These lists track pages that are no longer in the buffer pool but were recently evicted. Lastly, we maintain a target size for the MRU list, starting at 0, that adapts to changes in the workload. Note that the actual MRU list size may differ from the target, either smaller or larger; it is only a target.

When working with the ARC replacer, there are generally five size-related concepts that must be distinguished from one another:

Also, please make sure you understand the relationship between frames and pages here, so that it is clear why tracking page ids along with frame ids is necessary:

When performing RecordAccess on a frame and its corresponding page, exactly one of the following four cases applies (a sketch of this bookkeeping appears after the list):

  1. Page already exists in MRU/MFU: This is the case where the actual cache hits. Move the page to the front of MFU.
  2. Page already exists in MRU ghost: This is the case where the actual cache misses but we hit in the ghost list. We treat it as a pseudo-hit and adapt the target size. If the size of the MRU ghost list is greater than or equal to the size of the MFU ghost list, increase the MRU target size by one; otherwise increase it by MFU ghost size / MRU ghost size (rounded down). Do not increase the target size above replacer_size. Then move the page to the front of MFU. The rationale is that if the MRU list had been a little larger, the DBMS could have had a cache hit.
  3. Page already exists in MFU ghost: Similar to the previous case, the actual cache misses but we hit in the ghost list. If the size of the MFU ghost list is greater than or equal to the size of the MRU ghost list, decrease the MRU target size by one; otherwise decrease it by MRU ghost size / MFU ghost size (rounded down). Do not decrease the target size below 0. Then move the page to the front of MFU. The rationale is that if the MFU list had been a little larger, the DBMS could have had a cache hit.
  4. Page is not in the replacer: This is the case where both the actual cache and the ghost lists miss. Then exactly one of the following should happen.
    1. If MRU size + MRU ghost size = replacer size: Kill the last element in the MRU ghost list, then add the page to the front of MRU.
    2. Else MRU size + MRU ghost size should be smaller than replacer size (it should never be larger if you do things correctly). In this case
      • If MRU size + MRU ghost size + MFU size + MFU ghost size = 2 * replacer size: Kill the last element in the MFU ghost list, then add the page to the front of MRU.
      • Else simply add the page to the front of the MRU.

Consider why, in cases 4(a) and 4(b), the ghost list being trimmed must contain at least one entry.
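
To make the case analysis concrete, here is a minimal, single-threaded sketch of the four cases that tracks page ids only. It is not the required interface (your ArcReplacer must also track frame ids and evictability, and must be thread-safe); it only illustrates the bookkeeping described above:

#include <algorithm>
#include <cstddef>
#include <list>

// Simplified sketch with illustrative names; not the ArcReplacer API.
struct ArcSketch {
  std::list<int> mru, mfu;              // cached pages (most recent at the front)
  std::list<int> mru_ghost, mfu_ghost;  // recently evicted pages
  std::size_t target{0};                // adaptive target size for the MRU list
  std::size_t replacer_size;            // maximum number of cached entries

  explicit ArcSketch(std::size_t size) : replacer_size(size) {}

  static auto Contains(std::list<int> &l, int pid) -> bool {
    return std::find(l.begin(), l.end(), pid) != l.end();
  }

  void RecordAccess(int pid) {
    if (Contains(mru, pid) || Contains(mfu, pid)) {       // case 1: real cache hit
      mru.remove(pid);
      mfu.remove(pid);
      mfu.push_front(pid);
    } else if (Contains(mru_ghost, pid)) {                // case 2: MRU ghost hit
      std::size_t delta =
          mru_ghost.size() >= mfu_ghost.size() ? 1 : mfu_ghost.size() / mru_ghost.size();
      target = std::min(target + delta, replacer_size);   // grow the MRU target
      mru_ghost.remove(pid);
      mfu.push_front(pid);
    } else if (Contains(mfu_ghost, pid)) {                // case 3: MFU ghost hit
      std::size_t delta =
          mfu_ghost.size() >= mru_ghost.size() ? 1 : mru_ghost.size() / mfu_ghost.size();
      target = target > delta ? target - delta : 0;       // shrink the MRU target
      mfu_ghost.remove(pid);
      mfu.push_front(pid);
    } else {                                              // case 4: complete miss
      if (mru.size() + mru_ghost.size() == replacer_size) {
        mru_ghost.pop_back();                             // 4(a): trim the MRU ghost list
      } else if (mru.size() + mru_ghost.size() + mfu.size() + mfu_ghost.size() ==
                 2 * replacer_size) {
        mfu_ghost.pop_back();                             // 4(b): trim the MFU ghost list
      }
      mru.push_front(pid);
    }
  }
};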

Implementation

When you implement this algorithm, it is important to understand when a page should go to the MRU list and when it should go to the MFU list. It also helps to think about why each case takes the action it does and what that action is trying to accomplish, rather than mechanically transpiling the English description into C++. For eviction: if the MRU list size is smaller than the target size, try to evict from the MFU list; if the MRU list size is greater than or equal to the target size, try to evict from the MRU list. In either case, if nothing on the intended side is evictable, try evicting from the other list. If still nothing is evictable, the eviction fails and returns std::nullopt (see the sketch below).
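
Here is a sketch of that side-selection logic, assuming hypothetical helpers EvictFromMru / EvictFromMfu (not part of the provided API) that each remove and return the least recently used evictable entry from their list, or std::nullopt if none is evictable:

#include <cstddef>
#include <functional>
#include <optional>

// Sketch of eviction-side selection only; the helpers passed in do the actual list work.
auto EvictSketch(std::size_t mru_size, std::size_t mru_target,
                 const std::function<std::optional<int>()> &EvictFromMru,
                 const std::function<std::optional<int>()> &EvictFromMfu)
    -> std::optional<int> {
  bool prefer_mfu = mru_size < mru_target;                   // MRU is under target: take from MFU
  auto victim = prefer_mfu ? EvictFromMfu() : EvictFromMru();
  if (!victim.has_value()) {                                 // intended side had nothing evictable
    victim = prefer_mfu ? EvictFromMru() : EvictFromMfu();
  }
  return victim;                                             // std::nullopt if both sides failed
}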

The implementation details are up to you. You are allowed to use built-in STL containers. You may assume that you will not run out of memory for these data structures (you cannot assume the same for the buffer pool in Task #3, you will run out of available frames). You must make sure that your implementation is thread-safe.

You might notice there is a test that checks the performance of your RecordAccess implementation. If your implementation fails or times out on that test, think about what makes RecordAccess slow and how you could fix it. As a reminder, you will modify the data structures and member variables we provide in the header file, but you may also add additional data structures to speed up operations.

If you would like to read more about the ARC replacement algorithm, refer to this paper. This project does not require you to implement the original algorithm exactly. You are also welcome to think about how the requirements here go beyond what the original algorithm achieves.

Task #2 - Disk Scheduler

This component is responsible for scheduling read and write operations on the DiskManager. You will implement a class called DiskScheduler in src/include/storage/disk/disk_scheduler.h and its corresponding implementation file in src/storage/disk/disk_scheduler.cpp .

The disk scheduler can be used by other components (in this case, your BufferPoolManager in Task #3) to queue disk requests, represented by a DiskRequest struct (already defined in src/include/storage/disk/disk_scheduler.h ). The disk scheduler will maintain a background worker thread which is responsible for processing scheduled requests.

The disk scheduler will utilize a shared queue (channel) to schedule and process the DiskRequests. One thread will add a request to the queue, and the disk scheduler's background worker will process the queued requests. We have provided a Channel class in src/include/common/channel.h to facilitate the thread-safe sharing of data between threads, but feel free to use your own implementation if you find it necessary.
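
To illustrate the pattern (not the required design), here is a self-contained sketch of a channel plus a background worker using only the standard library; the provided Channel class plays a similar role, and SimpleChannel/Request are made-up names. A common convention, assumed here, is to treat an empty std::optional as the shutdown signal:

#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

struct Request {
  int page_id;
};

// A minimal thread-safe queue, standing in for the provided Channel class.
class SimpleChannel {
 public:
  void Put(std::optional<Request> r) {
    {
      std::scoped_lock lock(mu_);
      queue_.push(std::move(r));
    }
    cv_.notify_one();
  }
  auto Get() -> std::optional<Request> {
    std::unique_lock lock(mu_);
    cv_.wait(lock, [&] { return !queue_.empty(); });  // block until a request arrives
    auto r = std::move(queue_.front());
    queue_.pop();
    return r;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::optional<Request>> queue_;
};

int main() {
  SimpleChannel channel;
  // Background worker: drain the queue until it sees the shutdown signal (nullopt).
  std::thread worker([&] {
    while (auto request = channel.Get()) {
      // ... issue the actual disk I/O for *request here ...
    }
  });
  channel.Put(Request{42});   // another thread schedules a request
  channel.Put(std::nullopt);  // shutdown signal
  worker.join();
}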

The DiskScheduler constructor and destructor are already implemented and are responsible for creating and joining the background worker thread. You will only need to implement the following methods as defined in the header file (src/include/storage/disk/disk_scheduler.h ) and in the source file (src/storage/disk/disk_scheduler.cpp ):

We mentioned that one of the fields of a DiskRequest is a std::promise. If you are unfamiliar with C++ promises and futures, you can check out the documentation here. For the purposes of this project, they essentially provide a callback mechanism for a thread to know when their scheduled request is completed. To see an example of how they might be used, check out disk_scheduler_test.cpp.
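
For example, using only the standard library, a worker thread can fulfill a promise when it finishes, and the scheduling thread can block on the corresponding future:

#include <future>
#include <iostream>
#include <thread>

int main() {
  std::promise<bool> callback;                      // completed by the worker
  std::future<bool> done = callback.get_future();   // waited on by the caller

  std::thread worker([p = std::move(callback)]() mutable {
    // ... perform the disk read/write here ...
    p.set_value(true);                              // signal completion
  });

  std::cout << "request completed: " << done.get() << std::endl;  // blocks until set_value
  worker.join();
}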

Again, the implementation details are up to you. You must make sure that your implementation is thread-safe.

Disk Manager

The header containing the DiskManager class is located at src/include/storage/disk/disk_manager.h. The DiskManager reads page data from disk and writes page data to disk. Your disk scheduler will use DiskManager::ReadPage() and DiskManager::WritePage() while processing read and write requests.

Task #3 - Buffer Pool Manager

Finally, you must implement the buffer pool manager (BufferPoolManager)! Echoing the beginning of this page, the BufferPoolManager is responsible for fetching database pages from disk with the DiskScheduler and storing them in memory. The BufferPoolManager can also schedule writes of dirty pages out to disk when it is either explicitly instructed to do so or when it needs to evict a page to make space for a new page.

Your BufferPoolManager implementation will use the ArcReplacer and DiskScheduler classes that you created in the previous steps of this assignment. The ArcReplacer will keep track of when pages are accessed so that it can decide which frame to evict when it must make room for a new page. The DiskScheduler will schedule writes and reads to disk on the DiskManager.

We have provided a helper class called FrameHeader, which helps manage the in-memory frames. All access to page data should be through FrameHeaders. FrameHeader has a method called GetData that returns a raw pointer to its frame's memory, and the DiskScheduler / DiskManager will use this pointer to copy the contents of a physical page on disk into memory.
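
As a rough illustration (FrameHeaderSketch below is a stand-in, not the real FrameHeader interface), copying a page's bytes from disk into a frame is just a raw memory copy into the pointer that GetData returns:

#include <array>
#include <cstring>

// Stand-in for FrameHeader: owns a fixed 8 KB buffer and exposes a raw pointer to it.
struct FrameHeaderSketch {
  std::array<char, 8192> data{};
  auto GetData() -> char * { return data.data(); }
};

// The disk layer does not interpret the bytes; it only copies them in or out.
void CopyPageIntoFrame(const char *page_bytes_from_disk, FrameHeaderSketch &frame) {
  std::memcpy(frame.GetData(), page_bytes_from_disk, frame.data.size());
}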

As a reminder, the buffer pool manager does not need to understand the contents of these pages. The only information that the BufferPoolManager knows about pages are the page IDs (page_id_t) and the FrameHeaders they are stored inside of. Also, the BufferPoolManager will reuse the same FrameHeader object to store data as it moves back and forth between disk and memory. In other words, all FrameHeaders will store many different pages throughout the lifetime of the system.

Concurrency

When implementing a multi-threaded buffer pool manager, we must take care to synchronize data access. In particular, we do not want multiple copies of the same page in different frames of the buffer pool. If we allowed this, two threads could modify different copies of the same page, and one thread's updates could be silently lost when the copies are written back to disk.

Thus, we keep only 1 version of a page in memory at a time to prevent data synchronization races. Additionally, to prevent us from evicting a page while threads are accessing it, we maintain a reference count / pin count on the frame that stores it. Finally, in order to keep track of which pages are stored in which frames, we also maintain a page table using a hash map that maps page IDs to frames.

The pin count of a frame is the number of threads that have access to the page's data. As long as the pin count on a frame is greater than 0 (implying there is at least 1 thread accessing the page's data), the buffer pool manager is not allowed to evict the page being stored. You can maintain the pin count using the atomic field pin_count_ in the FrameHeader class. Keep in mind that pin_count_ is separate from ArcReplacer::SetEvictable, so you will need to make sure those are synced properly. You will also have to update the is_dirty_ flag of the FrameHeader when you think it is necessary. If this flag is set when you want to evict a page, you will have to act accordingly to maintain data synchronization between memory and disk.
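
A minimal sketch of this discipline, with illustrative names (Pin/Unpin are not the required API), might look like the following; the commented-out calls show where the replacer's SetEvictable state would be kept in sync:

#include <atomic>
#include <cstddef>

// Sketch only: pin when a thread starts using a frame's data, unpin when it is done,
// and only mark the frame evictable once the count drops back to zero.
struct FrameSketch {
  std::atomic<std::size_t> pin_count{0};
};

void Pin(FrameSketch &frame /*, replacer, frame_id */) {
  frame.pin_count.fetch_add(1);
  // replacer.SetEvictable(frame_id, false);   // a pinned frame must not be evicted
}

void Unpin(FrameSketch &frame /*, replacer, frame_id */) {
  if (frame.pin_count.fetch_sub(1) == 1) {     // this was the last reader/writer
    // replacer.SetEvictable(frame_id, true);  // now the replacer may evict it
  }
}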

Lastly, you will have to implement both ReadPageGuard and WritePageGuard. These classes are RAII objects that provide thread-safe read / write access to the underlying pages. See the implementation section below for more information. You will probably need to implement this in tandem with the BufferPoolManager methods CheckedReadPage and CheckedWritePage. However, if you want to make sure your page guard implementations are correct, you may choose to implement BufferPoolManager::GetPinCount first and then stitch together something that will pass the page guard tests.

Implementation

You will need to implement the following page guard methods defined in the header file (src/include/storage/page/page_guard.h ) and in the source file (src/storage/page/page_guard.cpp ):

You do not have to implement these methods before the BufferPoolManager methods. You should probably work on them at the same time.

These methods implement move semantics and RAII for the page guards. If you are unfamiliar with these concepts, please study the learning materials available online; there are many great resources (articles, Microsoft tutorials, YouTube videos) that explain them in depth. You should not attempt to implement these methods without a solid understanding of how RAII and move semantics work.

There will likely be a lot of code duplication here (i.e. the two guards should be identical except for a handful of lines). If you want to derive these classes based on a class you create, you are welcome to do so. Just make sure that no interfaces and method signatures are changed!
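
If RAII and move semantics are new to you, the following generic sketch shows the pattern the guards follow. GuardSketch is not the real ReadPageGuard / WritePageGuard interface; it only illustrates how a move-only RAII class transfers ownership and releases its resource exactly once:

#include <utility>

class GuardSketch {
 public:
  GuardSketch() = default;

  // Move constructor: steal the other guard's state and leave it empty,
  // so the underlying resource is released exactly once.
  GuardSketch(GuardSketch &&other) noexcept : valid_(std::exchange(other.valid_, false)) {}

  // Move assignment: release whatever we currently hold, then take over.
  auto operator=(GuardSketch &&other) noexcept -> GuardSketch & {
    if (this != &other) {
      Drop();
      valid_ = std::exchange(other.valid_, false);
    }
    return *this;
  }

  GuardSketch(const GuardSketch &) = delete;                          // guards are move-only
  auto operator=(const GuardSketch &) -> GuardSketch & = delete;

  ~GuardSketch() { Drop(); }                                          // RAII: always release

  void Drop() {
    if (valid_) {
      // ... unpin the frame, release the latch, etc. ...
      valid_ = false;                                                 // make Drop idempotent
    }
  }

 private:
  bool valid_{false};
};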

You will also need to implement the following BufferPoolManager methods defined in the header file (src/include/buffer/buffer_pool_manager.h ) and in the source file (src/buffer/buffer_pool_manager.cpp ):

All of these methods have detailed documentation comments in the source file. Make sure to read all of these in their entirety because they contain many useful hints!

You do not need to make your buffer pool manager super efficient. For each of the public BufferPoolManager methods, holding the buffer pool latch from beginning to end should be enough (except when you need to release it early to prevent deadlocks). However, you do need to ensure that your buffer pool manager has reasonable performance; otherwise there will be problems in future projects. You can compare your benchmark results (QPS.1 and QPS.2) with other students to see whether your implementation is too slow.

Please refer to the source files (src/storage/page/page_guard.cpp and src/buffer/buffer_pool_manager.cpp ) for significantly more detailed specifications and documentation.

Leaderboard Task (Optional)

For this project's leaderboard challenge, we benchmark your buffer pool manager with a special storage backend.

Optimizing for the leaderboard is optional (i.e., you can get a perfect score on the project after finishing all previous tasks). However, your solution must finish the leaderboard test with a correct result, without deadlocks, and without segfaults.

The leaderboard test is compiled with the release profile:

mkdir cmake-build-relwithdebinfo
cd cmake-build-relwithdebinfo
cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo
make -j `nproc` bpm-bench
# The command below is just for illustrating how the bpm-bench test works.
# We are NOT running tests with the same parameter for the leaderboard test.
./bin/bustub-bpm-bench --duration 5000 --latency 1

We strongly recommend that you checkpoint your code before optimizing for the leaderboard tests, so that if these optimizations cause problems in future projects, you can always revert them.

In the leaderboard test, we will have multiple threads accessing the pages on the disk. There are two types of threads running in the benchmark:

  1. Scan threads. Each scan thread will read pages on the disk sequentially. There will be 8 scan threads.
  2. Get threads. Each get thread will randomly select and update a page using the zipfian distribution. There will be 8 get threads.

We will run the benchmark three times on the in-memory storage backend, each time for 30 seconds. The first and second runs use different buffer pool and replacer settings. For the third run, we add a 1-millisecond latency to each random read/write operation and a 0.1-millisecond latency to sequential/local read/write operations. See the DiskManagerUnlimitedMemory class for more information.

The final score is computed as a weighted QPS of scan and get operations, with and without latency respectively:

scan_qps_large / 1000 + get_qps_large / 1000 + scan_qps_small / 1000 + get_qps_small / 1000 + scan_qps_1ms + get_qps_1ms

Recommended Optimizations

  1. Better replacer algorithm. Given that the get workload is skewed (i.e., some pages are more frequently accessed than others), you can design your ARC replacer to take the page access type into consideration, so as to reduce page misses. Consider how you can queue multiple requests and pre-fetch data.
  2. Parallel I/O operations. Instead of processing one request at a time in your disk scheduler, you can issue multiple requests to the disk manager at the same time. This optimization is very useful on modern storage devices, where concurrent access to the disk can make better use of the disk bandwidth. You should handle the case where multiple operations to the same page are in the queue: the end result of these requests must be as if they were processed in order, and within a single thread they should have read-after-write consistency.
  3. To achieve true parallelism in the disk scheduler, you will also need to allow your buffer pool manager to handle multiple ReadPage and WritePage requests, and to evict multiple pages, at the same time. You might need to introduce a condition variable in your buffer pool manager to manage free frames (see the sketch after this list).
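
For optimization #3, one possible way to block until a frame becomes free is a condition variable, as in this self-contained sketch (FreeFrameList and its methods are illustrative names, not part of the provided API):

#include <condition_variable>
#include <mutex>
#include <vector>

class FreeFrameList {
 public:
  auto Take() -> int {
    std::unique_lock lock(mu_);
    cv_.wait(lock, [&] { return !free_.empty(); });  // sleep until a frame is available
    int frame_id = free_.back();
    free_.pop_back();
    return frame_id;
  }

  void Give(int frame_id) {
    {
      std::scoped_lock lock(mu_);
      free_.push_back(frame_id);
    }
    cv_.notify_one();                                 // wake one waiting thread
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<int> free_;
};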

Leaderboard Policy

Instructions

See the Project #0 instructions on how to create your private repository and setup your development environment.

Testing

You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. There are three separate files that contain tests for each component:

You can compile and run each test individually from the command-line:

$ make arc_replacer_test -j `nproc`
$ ./test/arc_replacer_test

You can also run make check-tests to run ALL of the test cases. Note that some tests are disabled as you have not implemented future projects. You can disable tests in GTest by adding a DISABLED_ prefix to the test name.

Important: These tests are only a subset of all the tests that we will use to evaluate and grade your project. You should write additional test cases on your own to check the complete functionality of your implementation.

Formatting

Your code must follow the Google C++ Style Guide. We use Clang to automatically check the quality of your source code. Your project grade will be zero if your submission fails any of these checks.

Execute the following commands to check your syntax. The format target will automatically correct your code. The check-lint and check-clang-tidy-p1 targets will print errors and instruct you on how to fix them to conform to our style guide.

$ make format
$ make check-clang-tidy-p1

Memory Leaks

For this project, we use LLVM AddressSanitizer (ASAN) and LeakSanitizer (LSAN) to check for memory errors. To enable ASAN and LSAN, configure CMake in debug mode and run the tests as you normally would. If there is a memory error, you will see a memory error report. Note that macOS only supports AddressSanitizer without LeakSanitizer.

In some cases, address sanitizer might affect the usability of the debugger. In this case, you might need to disable all sanitizers by configuring the CMake project with:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DBUSTUB_SANITIZER= ..

Development Hints

You can use BUSTUB_ASSERT for assertions in debug mode. Note that the statements within BUSTUB_ASSERT will NOT be executed in release mode. If you have something to assert in all cases, use BUSTUB_ENSURE instead.

Post all of your questions about this project on Piazza. Do not email the TAs directly with questions.

We encourage you to use a graphical debugger to debug your project if you are having problems.

If you are having compilation problems, running make clean does not completely reset the compilation process. You will need to delete your build directory and run cmake .. again before you rerun make.

Grading Rubric

Each project submission will be graded based on the following criteria:

  1. Does the submission successfully execute all of the test cases and produce the correct answer?
  2. Does the submission execute without any memory leaks?
  3. Does the submission follow the code formatting and style policies?

Note that we will use additional test cases to grade your submission that are more complex than the sample test cases that we provide you.

Late Policy

See the late policy in the syllabus.

Submission

After completing the assignment, you can submit your implementation to Gradescope:

Running make submit-p1 in your build/ directory will generate a zip archive called project1-submission.zip under your project root directory that you can submit to Gradescope.

You can submit your answers as many times as you like and get immediate feedback.

Notes on Gradescope and Autograder

  1. If you are timing out on Gradescope, it is likely because you have a deadlock in your code or your code is too slow and does not finish within 60 seconds. If your code is too slow, it may be because your ArcReplacer is not efficient enough.
  2. The autograder will not work if you are printing too many logs in your submissions.
  3. If the autograder did not work properly, make sure that your formatting commands work and that you are submitting the right files.
  4. The leaderboard benchmark score will be calculated by stress testing your buffer_pool_manager implementation.

CMU students should use the Gradescope course code announced on Piazza.

Collaboration Policy

WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.