Project #4 - Logging & Recovery
Do not post your project on a public Github repository.
Overview
The fourth programming project is to implement Logging, Recovery, and Checkpoints in your database system. The first task is to implement write ahead logging (WAL) under No-Force/Steal buffering policy and log every single page-level write operation and transaction command. The second task is to implement a recovery mechanism, which will allow the the database to reconstruct itself from the log. The final task is to implement a simple, blocking checkpointing mechanism which will generate fully consistent checkpoints.
The project is comprised of the following two tasks:
This is a single-person project that will be completed individually (i.e., no groups).
Please post all of your questions about this project on Piazza. Do not email the TAs directly with questions. The instructor and TAs will not debug your code.
- Release Date: Nov 22, 2018
- Due Date: Dec 10, 2018 @ 11:59pm
Project Specification
Like the previous project, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the pre-defined functions in these classes. If you do this, then it will break the test code that we will use to grade your assignment you end up getting no credit for the project. If a class already contains certain member variables, you should not remove them. But you may add private helper functions/member variables to these classes in order to correctly realize the functionality.
The correctness of this project depends on the correctness of your implementation of previous projects, we will not give solutions or binary files. However, this project may differ from the last ones in several aspects. We have already made the necessary modifications to the insert/delete/search operations in the TableHeap (include/table/table_heap.h), as well as interactions between the log manager, buffer manager, transaction manager, and checkpoint manager.
Task #1 - Log Manager
To achieve the goal of atomicity and durability, the database system must output to stable storage information describing the modifications made by any transaction. This information ensures that all modifications performed by committed transactions are reflected in the database (perhaps during the course of recovery actions after a crash). It can also help the DBMS ensure that no modifications made by an aborted or crashed transaction persist in the database. The most widely used structure for recording database modifications is the write-ahead log. The log is a sequence of log records, recording all the update activities in the database. We have provided with you the basic structure of log record(include/logging/log_record.h) and corresponding helper methods.
In your implementation, there is a global LogManager
object for the entire system (similar to your BufferPoolManager
). The TablePage
class will explicitly create a log record (before any update operations) and invoke SerializeLogRecord
method of LogManager
to write it into log buffer when the global variable ENABLE_LOGGING
(include/common/config.h) flag is set to be true.
We recommend you to read this article to refresh your C++ concurrency knowledge. More detailed documentation about condition variable is available here.
Implementation Details for Logging/Recovery
This list points out some implementation details that have been set up in advance to enable your logging/recovery system.
- Transaction Manager:
(NEW) We have added a RWLatch in the
TransactionManager
calledglobal_txn_latch_
. This is used in the two helper functions for stopping the execution of transactions (BlockAllTransactions
,ResumeTransactions
). - Global Variables
There is a global variable called
ENABLE_LOGGING
within config.h file. Only when the variable is set to be true can you enable all the logging functionalities. You may need to explicitly set it to be false in order to pass some of the previous test cases. TheLOG_TIMEOUT
is a constant variable defined within config.h file, for everyLOG_TIMEOUT
seconds, yourLogManager
implementation needs to execute a flush operation (more details in the next section). - Buffer Pool Manager:
BufferPoolManager's constructor accepts a DiskManager and LogManager as input parameters. The default value for log_manager is
nullptr
, and log_manager is only valid whenENABLE_LOGGING
is set to be true. We've also added two simple getter functions: Page* GetPages() and size_t GetPoolSize(). These are used to test your CheckpointManager. - Disk Manager:
The DiskManager creates the database file as well as log file inside its constructor. We have also provided separate helper functions
WriteLog
andReadLog
to support accessing the log file. - BPlusTreePage:
There is a member variable
lsn_
to record page log sequence number. You do not need to update lsn within your index implementation, we only test logging and recovery functionalities on table page level. - Page:
The
Page
object hasGetLSN
andSetLSN
helper functions. Remember log sequence number is a 4-byteint32_t
variable that is stored within the Pages'sdata_
array. - TablePage:
We extended
TablePage
to make appropriate calls to the LogManager on insert, update, and delete operations. Please make sure to review these changes and make sure you understand what they're doing. - Transaction:
The
Transaction
object has theGetPrevLSN
andSetPrevLSN
helper functions. Each transaction is responsible for maintaining previous log sequence number for undo purpose in recovery. Please consult Chapters 16.8 in the textbook for more detailed information. - LogRecord:
We have provided the implementation of physiological LogRecord that support different type of write operations within database system. Each log type corresponds to a write operation within
TablePage
class (page/table_page.h, please make sure you understand the record structure before implementation.
REQUIREMENTS AND HINTS
The files you need to modify for this task are the LogManager
class (logging/log_manager.cpp and include/logging/log_manager.h), LogRecovery
class (logging/log_recovery.cpp and include/logging/log_recovery.h), TransactionManaer
class (concurrency/transaction_manager.cpp and include/concurrency/transaction_manager.h), CheckpointManager
class (logging/checkpoint_manager.cpp and include/logging/checkpoint_manager.h, and your original implementation of BufferPoolManager
class(buffer/buffer_pool_manager.cpp and include/buffer/buffer_pool_manager.h).
You will need to complete the following functionalities:
- Inside
RunFlushThread
function inLogManager
, you need to start a separate background thread which is responsible for flushing the logs into disk file. The thread is triggered everyLOG_TIMEOUT
seconds or when the log buffer is full. Since yourLogManager
need to perform asynchronous I/O operations, you will maintain two log buffers, one for flushing (We will call it asflush_buffer
) one for concurrently appending log records (We will call it aslog_buffer
). And you may consider swap buffers under following three situations. (1) When log_buffer is full. (2) WhenLOG_TIMEOUT
is triggered. (3) When buffer pool is going to evict a dirty page from LRU replacer. - Your
LogManager
will integrate the group commit feature. Motivation behind group commit is to amortize the costs of each fsync() over multiple commits from multiple parallel transactions. If there are say 10 transactions in parallel trying to commit, we can force all of them to disk at once with a singlefsync()
call, rather than do one fsync() for each. This can greatly reduce the need forfsync()
calls, and consequently greatly improve the commits-per-second throughput. WithinTransactionManager
, whenever you callCommit
orAbort
method, you need to make sure your log records are permanently stored on disk file before release the locks. But instead of forcing flush, you need to wait forLOG_TIMEOUT
or other operations to implicitly trigger the flush operations. - Before your buffer pool manager evicts a dirty page from LRU replacer and write this page back to db file, it needs to flush logs up to
pageLSN
. You need to comparepersistent_lsn_
(a member variable maintained inLogManager
) with yourpageLSN
. However unlike group commit, buffer pool can force the log manager to flush log buffer, but still needs to wait for logs to be permanently stored before continue. - Add corresponding logics within
TablePage
class methods to deal with run-time WAL logging. You need to (1) explicitly create a log record (include/logging/log_record.h) (2) invokeSerializeLogRecord
method ofLogManager
to write it into log_buffer when the global variableENABLE_LOGGING
(include/common/config.h) is set to be true. (3) Update prevLSN for current transaction. (4) Update LSN for current page
Important: You should first take a look at the files we mention in this and previous sections to become familiar with the APIs and member variables we provide. You should consult with Chapters 16.8 in the textbook and make sure that your implementation follows the ARIES.(Except for the check-pointing part). Since LogManager
need to deal with background thread and thread synchronization stuff, we recommend you to take a look at Future and Promise.
Task #2 - System Recovery
The next part of the project is to implement the ability for the DBMS to recover its state from the log file.
REQUIREMENTS AND HINTS
The recovery process for our database system is pretty straightforward. Since we do not support fuzzy checkpoints, there is no need for analysis pass. The only file you need to modify for this task is the LogRecovery
class (logging/log_recovey.cpp and include/logging/log_recovey.h). You will need to implement the following functions:
DeserializeLogRecord
: Deserialize and reconstruct a log record from log buffer. If the return value is true then the deserialization step was successful, otherwise log buffer may contain incomplete log record.Redo
: Redo pass on TABLE PAGE level(include/table/table_page.h). Read the log file from the beginning to end (you must prefetch log records into buffer to reduce unnecessary I/O operations), for each log record, redo the action unless page is already more up-to-date than this record. Also buildactive_txn_ table
andlsn_mapping_ table
along the way.Undo
: Undo pass on TABLE PAGE level(include/table/table_page.h). Iterate throughactive_txn_ table
and undo every operations within each transaction. You do not need to worry about crash during recovery process, thus no complementary logging is needed.
Important: We recommend that you read Chapters 16.8 in the textbook first to make sure that you understand the whole process of recovery and what is redo/undo pass trying to do. Then figure out each write operation within class TablePage
and what is the corresponding complementary operation. For example, to undo an Insert operation, you need to call ApplyDelete
function instead of MarkDelete
.
Task #3 - Checkpoints
For the final task, you will implement a simple mechanism to make fully consistent checkpoints. We've added a new module called CheckpointManager that is responsible for handling this. It should work directly with the TransactionManager, BufferPoolManager, and LogManager to block any new transactions, flush the WAL to disk, and flush all dirty buffer pool pages to disk. Finally, it should allow new transactions to begin.
REQUIREMENTS AND HINTS
The API for the CheckpointManager has two functions:
BeginCheckpoint
: Use theTransactionManager
to block any new transactions from starting, then use theLogManager
to ensure the contents of the WAL in memory is flushed to disk. Lastly, use theBufferPoolManager
to flush all dirty pages to disk. Note that the DBMS must keep all transactions blocked when this method returns (this is necessary for grading).EndCheckpoint
: Use theTransactionManager
to unblock transactions and allow new transactions to begin.
Adding checkpointing to your DBMS should be the simplest of the three tasks as it does not require a lot of code. You just need to extend your BufferPoolManager
to flush all dirty pages to disk and mark them clean. With this in place, your BeginCheckpoint
and EndCheckpoint
methods should be short.
Instructions
Setting Up Your Development Environment
You must be on campus or use the CMU VPN to download the source code.
Download the project source code here. This version has additional files that were added after project #3 so you need to update your local copy.
Make sure that your source code has the following VERSION.txt file:
15-445/645 Project Source Code ------------------------------------------------------------ Created: Nov 26 2018 @ 05:40:47 Last Commit: f906fe48c7495cb1a4a50eaf8e4f2fc47842422e
Testing
You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. You can compile and run each test individually from the command-line:
cd build make log_manager_test ./test/log_manager_test
In the log_manager_test, we will first start the system, create a table, populate several tuples and then shut down the system. When the system restarts and has completed the recovery phases, it should be back to the consistent state before crash.
Important: These tests are only a subset of the all the tests that we will use to evaluate and grade your project. You should write additional test cases on your own to check the complete functionality of your implementation. Given the simplicity of our checkpointing scheme, we will not release any local tests for it.
Development Hints
To ensure your implementation does not have memory leak, you can run the test with Valgrind.
cd build make lock_manager_test valgrind --trace-children=yes \ --leak-check=full \ --track-origins=yes \ --soname-synonyms=somalloc=jemalloc \ --error-exitcode=1 \ --suppressions=../third_party/valgrind/valgrind.supp \ ./test/log_manager_test
Instead of using printf
statements for debugging, use the LOG_*
macros for logging information like this:
LOG_INFO("# RID: %s", rid->ToString().c_str()); LOG_DEBUG("Evict page %d", page_id);
To enable logging in your project, you will need to reconfigure it like this:
cd build cmake -DCMAKE_BUILD_TYPE=DEBUG .. make
The different logging levels are defined in src/include/common/logger.h. After enabling logging, the logging level defaults to LOG_LEVEL_INFO
. Any logging method with a level that is equal to or higher than LOG_LEVEL_INFO
(e.g., LOG_INFO
, LOG_WARN
, LOG_ERROR
) will emit logging information.
Using assert to force check the correctness of your implementation. For example, when you try to delete a page, the page_count
must be equal to 0. And when you try to unpin a page, the page_count
must be larger than 0.
Using a relatively small value of page size at the beginning test stage, it would be easier for you to check whether you have done the logging & recovery in the correct way. You can change the page size in configuration file (src/include/common/config.h).
Grading Rubric
Each project submission will be graded based on the following criteria:
- Does the submission successfully execute all of the test cases and produce the correct answer?
- Does the submission execute without any memory leaks?
Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.
Late Policy
See the late policy in the syllabus.
Submission
After completing the assignment, you can submit your implementation to Gradescope:
You only need to include the following files:
- Every file for Project 1 (6 in total) (this includes buffer_pool_manager.h/cpp)
- Every file for Project 2 (10 in total)
- Every file for Project 3 (2 in total)
- src/logging/log_manager.cpp
- src/include/logging/log_manager.h
- src/logging/log_recovery.cpp
- src/include/logging/log_recovery.h
- src/logging/checkpoint_manager.cpp
- src/include/logging/checkpoint_manager.h
- src/concurrency/transaction_manager.cpp
- src/include/concurrency/transaction_manager.h
You can submit your answers as many times as you like and get immediate feedback. Your score will be sent via email to your Andrew account within a few minutes after your submission.
Collaboration Policy
- Every student has to work individually on this assignment.
- Students are allowed to discuss high-level details about the project with others.
- Students are not allowed to copy the contents of a white-board after a group meeting with other students.
- Students are not allowed to copy the solutions from another colleague.
WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.