The idea of ACID transactions is a powerful way to design database- using application programs, but this depends on having correct and efficient concurrency control algorithms in the dbms engine. Traditional engines use strict two-phase locking, which guarantees correct (serializable) execution, but often suffers from low throughput. Snapshot Isolation (SI) is a multiversion concurrency control mechanism that is widely available (for example, in Oracle, PostgreSQL and MSSQL Server). It generally gets good performance, and it avoids the classical interleaving anomalies such as Lost Update or Inconsistent Reads. However, unlike traditional two-phase locking or timestamp concurrency control, SI does not guarantee that arbitrary executions are serializable. In particular, it is possible for a concurrent execution to violate undeclared data integrity constraints, even though each application program preserves the constraints when run alone in the system.
In this talk we present a body of work that studies different ways how to ensure serializable executions, when executing on a platform offering SI. The key to this is a theory that shows how a particular set of transactions may have a pattern of data-conflicts which prevents non-serializable execution. This pattern can be detected by drawing a graph (a variant of the usual serializability graph for an execution) and checking for a special feature, called a dangerous structure. If a set of programs does not have a dangerous structure in its graph, then all executions will be serializable.
Given a set of programs, there are a number of ways we can modify the programs to remove all the dangerous structures. These modifications introduce extra conflicts between certain pairs of transactions while not changing the meaning of any program, for example, by adding SQL statements to update a table not used for any other purpose. We have studied the performance of these modifications, and we find that some choices give performance almost as good as for the unmodified programs, as well as removing all possibility of concurrency anomalies.
Another approach is to change the dbms engine, to introduce dynamic checks that prevent anomalies. We find that despite these checks, the variant engine can keep most of the performance of SI, while guaranteeing true serializable execution. In contrast to program modification, this doesn't require pre-knowledge of the transactions that will run in the system, but it does require access to the source of the engine.
This talk is based on a series of papers which appeared in TODS, PODS'05, VLDB'07, DASFAA'08, AICCSA'08, ICDE'08 and SIGMOD'08 (this latter paper was given a "Best Paper" award at the conference). The coauthors of these papers are Mohammad Alomari, Sudhir Jorwekar, Dimitrios Liarokapis, Patrick O'Neil, Elizabeth O'Neil, Krithi Ramamritham, Uwe Roehm, Dennis Shasha and S. Sudarshan (from Sydney University, UMass Boston, IIT Bombay and NYU).