Project #3 - Concurrency Control

Do not post your project on a public Github repository.

Overview

The third programming project is to implement a lock manager in your database system. A lock manager is responsible for keeping track of the tuple-level locks issued to transactions and supporting shared & exclusive lock grant and release.

The project is comprised of the following three tasks:

Task #1 - Lock Manager
Task #2 - Deadlock Prevention
Task #3 - Deadlock Detection

This is a single-person project that will be completed individually (i.e., no groups).

Please post all of your questions about this project on Piazza. Do not email the TAs directly with questions. The instructor and TAs will not debug your code. Particularly with this project, we will not debug deadlocks.

Release Date: Oct 24, 2018
Due Date: Nov 19, 2018 @ 11:59pm

Project Specification

Like the first project, we are providing you with stub classes that contain the API that you need to implement. You should not modify the signatures for the pre-defined functions in these classes. If you do this, then it will break the test code that we will use to grade your assignment you end up getting no credit for the project. If a class already contains certain member variables, you should not remove them. But you may add private helper functions/member variables to these classes in order to correctly realize the functionality.

The correctness of this project depends on the correctness of your implementation of previous projects, we will not provide solutions or binary files.

Task #1 - Lock Manager

To ensure correct interleaving of transactions' operations, the DBMS will use a lock manager (LM) to control when transactions are allowed to access data items. The basic idea of a LM is that it maintains an internal data structure about the locks currently held by active transactions. Transactions then issue lock requests to the LM before they are allowed to access a data item. The LM will either grant the lock to the calling transaction, block that transaction, or abort it.

In your implementation, there will be a global LM for the entire system (similar to your buffer pool manager). The TableHeap class will use your LM to acquire locks on tuple records (by record id RID) whenever a transaction wants to access/modify a tuple. The DBMS will only support REPEATABLE READ isolation level (i.e., you do not need to support range-locks so it cannot support SERIALIZABLE isolation).

This task requires you to implement a tuple-level LM that supports both two-phase locking (2PL) and strict two-phase locking (S2PL), as well as deadlock prevention and detection. The system will pass in a flag to the LM to say what version of 2PL or deadlock handling it should use during its initialization. NOTE behavior for shared and exclusive locks vary based for S2PL. 15.1.3 dives more into this in the textbook. Once you have built a basic lock manager, you will add on deadlock prevention in Task #2 and deadlock detection in Task #3. While we wont test this version of your lock manager (no deadlock handling), it helps to build the infrastructure first and then add on the deadlock handling after. In the repository, we are providing you with a Transaction context handle (include/concurrency/transaction.h) that maintains the current state of the transaction (i.e., GROWING, SHRINKING, ABORTED, COMMITTED) and information about its acquired locks. The LM will need to check/update the state of transaction correctly. Any failed lock operation should return false and change the transaction state to ABORTED (implicit abort).

We recommend you to read this article to refresh your C++ concurrency knowledge. More detailed documentation is available here.

REQUIREMENTS AND HINTS

The only file you need to modify for this task is the LockManager class (concurrency/lock_manager.cpp and concurrency/lock_manager.h). You will need to implement the following functions:

LockShared(Transaction, RID): Transaction txn tries to take a shared lock on record id rid. This should be blocked on waiting and should return true when granted. Return false if transaction is rolled back (aborts).
LockExclusive(Transaction, RID): Transaction txn tries to take an exclusive lock on record id rid. This should be blocked on waiting and should return true when granted. Return false if transaction is rolled back (aborts).
LockUpgrade(Transaction, RID): Transaction txn tries to upgrade a shared to exclusive lock on record id rid. This should be blocked on waiting and should return true when granted. Return false if transaction is rolled back (aborts). This should also abort the transaction and return false if another transaction is already waiting to upgrade their lock.
Unlock(Transaction, RID): Unlock the record identified by the given record id that is held by the transaction. There is only a tiny corner case you will fail on Unlock method (return false). (Hint: Think about the differences between 2PL and strict 2PL)

You should first take a look at the transaction.h and lock_manager.h to become familiar with the API and member variables we provide. You have the freedom of adding any necessary data structures in lock_manager.h. You should consult with Chapters 15.1-15.2 in the textbook and make sure that your implementation follows the 2PL protocol carefully.

Advice from the TAs

While your Lock Manager needs to use either deadlock prevention or detection, we recommend implementing your lock manager first without any deadlock handling, and then adding detection and prevention after you verify it correctly locks and unlocks when no deadlocks occur.
You will need some way to keep track of which transactions are waiting on a lock. We recommend using an ordered data structure of your choice.
You will need some way to notify transactions that are waiting when they may be up to grab the lock. We recommend using std::condition_variable

COMMON PITFALLS

The LM's constructor will be told whether it should follow strict 2PL or not, and whether to use deadlock detection or prevention.
You should also maintain state of transaction. For example, the states of transaction may be changed from GROWING phase to SHRINKING phase due to unlock operation (Hint: Look at the methods in transaction.h)
You should also keep track of the shared/exclusive lock acquired using shared_lock_set_ and exclusive_lock_set_ so that when the TransactionManager wants to commit/abort a transaction, the LM can release them properly.
Setting a transaction's state to ABORTED implicitly aborts it, but it is not explicitly aborted until TransactionManager::Abort is called. You should read through this function to understand what it does, and how your lock manager is used in the abort process. Your lock manager should only implicitly abort the transaction. When testing your code, your test cases should explicitly abort the transaction (using the transaction manager) after the lock manager has implicitly aborted it.

Task #2 - Deadlock Prevention

If your lock manager is told to use prevention, your lock manager should use the WOUND-WAIT algorithm for deciding which transactions to abort.

ADVICE

Read through the slides carefully on how Wound-Wait is implemented.
When you abort a transaction, make sure to properly set its state using SetState

Task #3 - Deadlock Detection

If your lock manager is told to use deadlock detection, your lock manager should run deadlock detection in the background to abort blocking transactions. More precisely, this means that a background thread should periodically build a waits-for graph on the fly and break any cycles. This means that you should not actively update the waits-for graph as transactions acquire and release locks.

The graph API you must implement and use for your cycle detection along with testing is the following:

AddEdge(txn_id_t t1, txn_id_t t2): Adds an edge in your graph from t1 to t2. If the edge already exists, you don't have to do anything.
RemoveEdge(txn_id_t t1, txn_id_t t2): Removes edge t1 to t2 from your graph. If no such edge exists, you don't have to do anything.
HasCycle(txn_id_t& txn_id): Looks for a cycle by using the Depth First Search (DFS) algorithm. If it finds a cycle, HasCycle should store the transaction id of the youngest transaction in the cycle in txn_id and return true. Your function should return the first cycle it finds. If your graph has no cycles, HasCycle should return false.
GetEdgeList(): Returns a list of tuples representing the edges in your graph. We will use this to test correctness of your graph. A pair (t1,t2) corresponds to an edge from t1 to t2.
RunCycleDetection(): Contains skeleton code for running cycle detection in the background. You should implement your cycle detection logic in here.

How you choose to implement the graph is up to you as long as your implementation supports the above API. Those functions are what we will use to test your code.

REQUIREMENTS AND HINTS

Your background thread should build the graph on the fly every time it wakes up. You should not be maintaining a graph, it should be built and destroyed every time the thread wakes up.
Your DFS Cycle detection algorithm must be deterministic. In order to do achieve this, you must always choose to explore the lowest transaction id first. This means when choosing which unexplored node to run DFS from, always choose the node with the lowest transaction id. This also means when exploring neighbors, explore them in sorted order from lowest to highest.
When you find a cycle, you should abort the youngest transaction to break the cycle by setting that transactions state to ABORTED.
When your detection thread wakes up, it is responsible for breaking all cycles that exist. If you follow the above requirements, you will always find the cycles in a deterministic order. This also means that when you are building your graph, you should not add nodes for aborted transactions or draw edges to aborted transactions.
Your background cycle detection algorithm may need to get a pointer to a transaction using a txn_id. We have added a static method Transaction* GetTransaction(txn_id_t txn_id) to let you do that.
You can use std::this_thread::sleep_for to order threads to write test cases. You can also tweak CYCLE_DETECTION_INTERVAL in common/config.h in your test cases.

COMMON PITFALLS

Remember that a waits-for graph is a directed graph.
A waits-for graph draws edges when a transaction is waiting for another transaction. Remember that if multiple transactions hold a shared lock, a single transaction may be waiting on multiple transactions.
When a transaction's state is set to aborted, make sure to call TransactionManager::Abort in your test code to explicitly abort the transaction.
A transaction waiting for a lock may be aborted by the background cycle detection thread. You must have a way to notify waiting transactions that they've been aborted.

Instructions

Setting Up Your Development Environment

You must be on campus or use the CMU VPN to download the source code.

Download the project source code here. This version has additional files that were added or modified after project #2 so you need to update your local copy.

Make sure that your source code has the following VERSION.txt file:

Created: Oct 30 2018 @ 01:13:38
Last Commit: ac6f5f861941fcae92df066633c64f8155baef4a

Testing

You can test the individual components of this assignment using our testing framework. We use GTest for unit test cases. You can compile and run each test individually from the command-line:

cd build
make lock_manager_test
./test/lock_manager_test

Important: These tests are only a subset of the all the tests that we will use to evaluate and grade your project. You should write additional test cases on your own to check the complete functionality of your implementation.

Development Hints

To ensure your implementation does not have memory leak, you can run the test with Valgrind.

cd build
make lock_manager_test
valgrind --trace-children=yes \
   --leak-check=full \
   --track-origins=yes \
   --soname-synonyms=somalloc=jemalloc \
   --error-exitcode=1 \
   --suppressions=../third_party/valgrind/valgrind.supp \
   ./test/lock_manager_test

Instead of using printf statements for debugging, use the LOG_* macros for logging information like this:

LOG_DEBUG("Set Transaction %d's state to ABORTED", txn_id);

To enable logging in your project, you will need to reconfigure it like this:

cd build
cmake -DCMAKE_BUILD_TYPE=DEBUG ..
make

The different logging levels are defined in src/include/common/logger.h. After enabling logging, the logging level defaults to LOG_LEVEL_INFO. Any logging method with a level that is equal to or higher than LOG_LEVEL_INFO (e.g., LOG_INFO, LOG_WARN, LOG_ERROR) will emit logging information.

Use assert to force check the correctness of your implementation.

Grading Rubric

Each project submission will be graded based on the following criteria:

Does the submission successfully execute all of the test cases and produce the correct answer?
Does the submission execute without any memory leaks?

Note that we will use additional test cases that are more complex and go beyond the sample test cases that we provide you.

Late Policy

See the late policy in the syllabus.

Submission

After completing the assignment, you can submit your implementation to Gradescope. You only need to include the following files:

Every file for Project 1 (6 in total)
Every file for Project 2 (10 in total)
src/concurrency/lock_manager.cpp
src/include/concurrency/lock_manager.h

You can submit your answers as many times as you like and get immediate feedback.

Collaboration Policy

Every student has to work individually on this assignment.
Students are allowed to discuss high-level details about the project with others.
Students are not allowed to copy the contents of a white-board after a group meeting with other students.
Students are not allowed to copy the solutions from another colleague.

WARNING: All of the code for this project must be your own. You may not copy source code from other students or other sources that you find on the web. Plagiarism will not be tolerated. See CMU's Policy on Academic Integrity for additional information.

Last Updated: Nov 22, 2018