Project #4 - Logging & Recovery

Do not post your project on a public Github repository.

Overview

The fourth programming project is to implement Logging, Recovery, and Checkpoints in your database system. The first task is to implement write ahead logging (WAL) under No-Force/Steal buffering policy and log every single page-level write operation and transaction command. The second task is to implement a recovery mechanism, which will allow the the database to reconstruct itself from the log. The final task is to implement a simple, blocking checkpointing mechanism that generates fully consistent checkpoints.

The project is comprised of the following two tasks:

This is a single-person project that will be completed individually (i.e., no groups).

Please post all of your questions about this project on Piazza. Do not email the TAs directly with questions. The instructor and TAs will not debug your code. You may verbally describe test cases on Piazza, but do not post an excessive amount of testing code on Piazza either.

  • Release Date: Nov 15, 2019
  • Due Date: Dec 10, 2019 @ 11:59pm

Project Specification

Like the previous project, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the pre-defined functions in these classes. If you do this, then it will break the test code that we will use to grade your assignment you end up getting no credit for the project. If a class already contains certain member variables, you should not remove them. You may however add private helper functions and member variables to these classes. You can also add new public helper functions, e.g. in the Log Manager.

The correctness of this project depends on the correctness of your implementation of previous projects, we will not give solutions or binary files. However, this project may differ from the last ones in several aspects. We have already made the necessary modifications to the insert/delete/search operations in the TableHeap (include/storage/table/table_heap.h), as well as interactions between the log manager, buffer pool manager, transaction manager, and checkpoint manager.

Task #1 - Log Manager

To achieve the goal of atomicity and durability, the database system writes information describing the modifications made by any transaction to stable storage. This information ensures that all modifications performed by committed transactions are reflected in the database (perhaps during the course of recovery actions after a crash). It can also help the DBMS ensure that no modifications made by an aborted or failed transaction persist in the database after a crash/restart. The most widely used structure for recording database modifications is the write-ahead log. The log is a sequence of log records, recording all the update activities in the database. We are providing you with the basic structure of a log record (include/recovery/log_record.h) and corresponding helper methods.

In your implementation, there is a global LogManager object for the entire system (similar to your BufferPoolManager). The TablePage class will explicitly create a log record (before any update operations) and invoke the AppendLogRecord method of LogManager to write it into log buffer when the global variable enable_logging (include/common/config.h) flag is set to be true. We recommend that you to read this article to refresh your C++ concurrency knowledge. More detailed documentation about condition variable is available here.

Implementation Details for Logging/Recovery

This list points out some implementation details that have been set up in advance to enable your logging/recovery system.

  • Transaction Manager: There is a ReaderWriterLatch instance in the TransactionManager called global_txn_latch_. This latch is used in the two helper functions for stopping the execution of transactions (BlockAllTransactions, ResumeTransactions).
  • Global Variables: There is a global variable called enable_logging within include/common/config.h. Logging functionality should only be active when this is set to true. You may need to explicitly set it to false in order to pass some of the previous test cases. The constant log_timeout is defined within config.h, for every log_timeout seconds, your LogManager implementation needs to execute a flush operation (more details in the next section).
  • Buffer Pool Manager: BufferPoolManager's constructor accepts a DiskManager and LogManager as input parameters. The default value for log_manager is nullptr, and log_manager is only valid when enable_logging is set to true. We have also added two simple getter functions: Page* GetPages() and size_t GetPoolSize(). These are used to test your CheckpointManager.
  • Disk Manager: The DiskManager creates the database file as well as the log file for its constructor. We have also provided separate helper functions WriteLog and ReadLog to support accessing the log file.
  • HashTableHeaderPage: There is a member variable lsn_ to record the page's log sequence number. You do not need to update lsn_ within your index implementation, we only test logging and recovery functionalities on table page level.
  • Page: The Page object has GetLSN and SetLSN helper methods. Remember that the log sequence number is a 4-byte int32_t stored in the page's data_ array.
  • TablePage: We extended TablePage to make appropriate calls to the LogManager on insert, update, and delete operations. Please make sure to review these changes and make sure you understand what they're doing.
  • Transaction: The Transaction object has the GetPrevLSN and SetPrevLSN helper methods. Each transaction is responsible for maintaining the previous log sequence number to be used in the undo phase. Please consult Chapter 16.8 in the textbook for more detailed information.
  • LogRecord: We have provided the implementation of physiological log record that supports different type of write operations within database system. Each log type corresponds to a write operation within the TablePage class (storage/page/table_page.h, so please make sure you understand the record structure.

REQUIREMENTS AND HINTS

The files you need to modify for this task are:

  • LogManager: recovery/log_manager.cpp, include/recovery/log_manager.h)
  • LogRecovery: recovery/log_recovery.cpp, include/recovery/log_recovery.h
  • TransactionManaer: concurrency/transaction_manager.cpp, include/concurrency/transaction_manager.h
  • CheckpointManager: recovery/checkpoint_manager.cpp, include/recovery/checkpoint_manager.h
  • BufferPoolManager: buffer/buffer_pool_manager.cpp, include/buffer/buffer_pool_manager.h

You will need to implement the following functionality:

  • Inside the RunFlushThread function in the LogManager, you need to start a separate background thread which is responsible for flushing the logs to disk. The thread runs forever until system shutdown/StopFlushThread. The thread is triggered every log_timeout seconds or when the log buffer is full. Since your LogManager needs to perform asynchronous I/O operations, you will maintain two log buffers, one for flushing (called the flush_buffer) one for concurrently appending log records (called the log_buffer). You should swap buffers under any of the following three situations. (1) When the log buffer is full, (2) when log_timeout seconds have passed, and (3) When the buffer pool is going to evict a dirty page from the LRU replacer.
  • The StopFlushThread function is used to stop logging and join the flush thread
  • Your LogManager will integrate the group commit feature. The motivation behind group commit is to amortize the costs of each fsync() over multiple commits from multiple parallel transactions. If there are, say, 10 transactions in parallel trying to commit, we can force all of them to disk at once with a single fsync() call, rather than do one fsync() for each. This can greatly reduce the need for fsync() calls, and consequently greatly improves the commits-per-second throughput. Within TransactionManager, whenever you call the Commit or Abort method, you need to make sure your log records are permanently stored on disk file before releasing the locks. But instead of forcing a flush, you need to wait for log_timeout or other operations to implicitly trigger the flush operations.
  • In your BufferPoolManager, when a new page is created, if there is already an entry in the page_table_ mapping for the given page id, you should make sure you explicitly overwrite it with the frame id of the new page that was just created. This is a minor behavioral change in the buffer pool manager, but it is a requirement to pass the test cases.
  • Before your buffer pool manager evicts a dirty page from the clock replacer and writes this page back to the disk, it needs to flush all logs up to pageLSN. You need to compare persistent_lsn_ (a member variable in LogManager) with your pageLSN. However, unlike with group commit, the buffer pool can force the log manager to flush the log buffer (but still needs to wait for the logs to be stored before continuing).
  • The TablePage class methods have some portions that deal with runtime write-ahead logging. They (1) create a log record, (2) invoke the LogManager's AppendLogRecord method to write it to the log buffer when the global variable enable_logging (include/common/config.h) is true, (3) update the prevLSN for the current transaction, and (4) update the LSN for the current page. Your job is to implement AppendLogRecord in the LogManager class so that the log records are properly serialized to the log buffer.

Important: You should first take a look at the files we mention in this and previous sections to become familiar with the APIs and member variables we provide. You should consult with Chapter 16.8 in the textbook and make sure that your implementation follows the ARIES protocol (except for the checkpointing part). Since the LogManager needs to deal with background threads and thread synchronization, we recommend you to take a look at Future and Promise.

Testing

There is a sample test in test/recovery/recovery_test.cpp.

Task #2 - System Recovery

The next part of the project is to implement the ability for the DBMS to recover its state from the log file.

REQUIREMENTS AND HINTS

The recovery process for our database system is pretty straightforward. Since we do not support fuzzy checkpoints, there is no need for an analysis pass. The only file you need to modify for this task is the LogRecovery class (recovery/log_recovery.cpp and include/recovery/log_recovery.h). You will need to implement the following functions:

  • DeserializeLogRecord: Deserialize and reconstruct a log record from the log buffer. If the return value is true then the deserialization step was successful, otherwise the log buffer may contain incomplete log record.
  • Redo: Redo pass on the TABLE PAGE level (include/storage/page/table_page.h). Read the log file from the beginning to end (you must prefetch log records into the buffer to reduce unnecessary I/O operations), and for each log record, redo the action unless the page is already more up-to-date than this record. Also make sure to maintain the active_txn_ table and lsn_mapping_ table along the way.
  • Undo: Undo pass on the TABLE PAGE level (include/table/table_page.h). Iterate through the active_txn_ table and undo operations within each transaction. You do not need to worry about a crash during the recovery process, meaning no complementary logging is needed.

Important: We recommend that you read Chapter 16.8 in the textbook first to make sure that you understand the whole process of recovery and what the redo/undo pass is trying to do. The basic idea is that during recovery, redo does the exact operation (MARKDELETE -> MarkDelete) whereas undo does the complementary operation (MARKDELETE -> RollBackDelete). Then, figure out each write operation within the TablePage class and what the corresponding complementary operation is. For example, to undo an Insert operation, you need to call ApplyDelete function instead of MarkDelete. The complementary operations are: Insert <-> ApplyDelete, MarkDelete <-> RollBackDelete, Update <-> Update, NewPage <-> (do nothing)

Testing

There is a sample test in test/recovery/recovery_test.cpp.

Task #3 - Checkpoints

For the final task, you will implement a simple mechanism to make fully consistent checkpoints. We've added a new module called CheckpointManager that is responsible for handling this. It should work directly with the TransactionManager, BufferPoolManager, and LogManager to block any new transactions, flush the WAL to disk, and flush all dirty buffer pool pages to disk. Finally, it should allow new transactions to begin.

REQUIREMENTS AND HINTS

The API for the CheckpointManager has two functions:

  • BeginCheckpoint: Use the TransactionManager to block any new transactions from starting, then use the LogManager to ensure the contents of the WAL in memory is flushed to disk. Lastly, use the BufferPoolManager to flush all dirty pages to disk. Note that the DBMS must keep all transactions blocked when this method returns (this is necessary for grading).
  • EndCheckpoint: Use the TransactionManager to unblock transactions and allow new transactions to begin.

Adding checkpointing to your DBMS should be the simplest of the three tasks as it does not require a lot of code. You just need to extend your BufferPoolManager to flush all dirty pages to disk and mark them clean. With this in place, your BeginCheckpoint and EndCheckpoint methods should be short.

Testing

There is a sample test in test/recovery/recovery_test.cpp.

Grading Rubric

Each project submission will be graded based on the following criteria:

  1. Does the submission successfully execute all of the test cases and produce the correct answer?
  2. Does the submission execute without any memory leaks?
  3. Does the submission follow the code formatting and style policies?

Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.

Late Policy

See the late policy in the syllabus.

Submission

After completing the assignment, you can submit your implementation to Gradescope.

Minimally, we expect the following files to be submitted:

  • src/recovery/log_manager.cpp
  • src/include/recovery/log_manager.h
  • src/recovery/log_recovery.cpp
  • src/include/recovery/log_recovery.h
  • src/recovery/checkpoint_manager.cpp
  • src/include/recovery/checkpoint_manager.h
  • src/concurrency/transaction_manager.cpp
  • src/include/concurrency/transaction_manager.h

...in addition to files from Project #1:

  • src/include/buffer/clock_replacer.h
  • src/buffer/clock_replacer.cpp
  • src/include/buffer/buffer_pool_manager.h
  • src/buffer/buffer_pool_manager.cpp

You may add additional files as long as they are also under the src/ directory.

You can submit your answers as many times as you like and get immediate feedback.

Collaboration Policy

  • Every student has to work individually on this assignment.
  • Students are allowed to discuss high-level details about the project with others.
  • Students are not allowed to copy the contents of a white-board after a group meeting with other students.
  • Students are not allowed to copy the solutions from another colleague.

WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.