Distributed Transactions

Distributed Transactions in YugabyteDB.

Overview of transaction control

This section explains how a distributed transaction works in YugabyteDB. We will use the example table below to describe the control flow of a simple transaction.

CREATE TABLE accounts (
  account_name varchar,
  account_type varchar,
  balance float,
  PRIMARY KEY (account_name, account_type)
);

Insert some sample data into the table.

INSERT INTO accounts VALUES ('John', 'savings', 1000);
INSERT INTO accounts VALUES ('John', 'checking', 100);
INSERT INTO accounts VALUES ('Smith', 'savings', 2000);
INSERT INTO accounts VALUES ('Smith', 'checking', 50);

The table should look as follows:

yugabyte=# select * from accounts;
 account_name | account_type | balance
--------------+--------------+---------
 John         | checking     |     100
 John         | savings      |    1000
 Smith        | checking     |      50
 Smith        | savings      |    2000
(4 rows)

Now, we will run the following transaction and explain what happens at each step.

BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 200
      WHERE account_name='John' AND account_type='savings';
  UPDATE accounts SET balance = balance + 200
      WHERE account_name='John' AND account_type='checking';
COMMIT;
Command Description

BEGIN TRANSACTION;
    
The node that receives this statement becomes the transaction coordinator. A new transaction record is created in the transaction status table for the current transaction. It has a unique transaction id with the state PENDING. Note that in practice, these records are pre-created to achieve high performance.

UPDATE accounts SET balance = balance - 200
  WHERE account_name='John'
  AND account_type='savings';
    
The transaction coordinator writes a *provisional record* to the tablet that contains this row. The provisional record consists of the transaction id, so the state of the transaction can be determined. If there already exists a provisional record written by another transaction, then the current transaction would use the transaction id that is present in the provisional record to fetch details and check if there is a potential conflict.

UPDATE accounts SET balance = balance + 200
  WHERE account_name='John'
  AND account_type='checking';
    
This step is largely the same as the previous step. Note that the rows being accessed can live on different nodes. The transaction coordinator would need to perform a provisional write RPC to the appropriate node for each row.

COMMIT;
    
Note that in order to COMMIT, all the provisional writes must have successfully completed. The COMMIT statement causes the transaction coordinator to update the transaction status in the transaction status table to COMMITED, at which point it is assigned the commit timestamp (which is a *hybrid timestamp* to be precise). At this point, the transaction is completed. In the background, the COMMIT record along with the commit timestamp is applied to each of the rows that participated to make future lookups of these rows efficient.

This is shown diagrammatically below. distributed_txn_write_path

After the above transaction succeeds, the table should look as follows.

yugabyte=# select * from accounts;
 account_name | account_type | balance
--------------+--------------+---------
 John         | checking     |     300
 John         | savings      |     800
 Smith        | checking     |      50
 Smith        | savings      |    2000
(4 rows)

Scalability

Since all nodes of the cluster can process transactions by becoming transaction coordinators, horizontal scalability can simply be achieved by distributing the queries evenly across the nodes of the cluster.

Resilience

Each update performed as a part of the transaction is replicated across multiple nodes for resilience and persisted on disk. This ensures that the updates made as a part of a transaction that has been acknowledged as successful to the end user is resilient even if failures occur.

Concurrency control

Concurrency control in databases ensures that multiple transactions can execute concurrently while preserving data integrity. Concurrency control is essential for correctness in environments where two or more transactions can access the same data at the same time. The two primary mechanisms to achieve concurrency control are optimistic and pessimistic.

Note

YugabyteDB currently supports optimistic concurrency control, with pessimistic concurrency control being worked on actively.

Transaction Options

You can see the various options supported by transactions by running the \h BEGIN statement, as shown below.

yugabyte=# \h BEGIN
Command:     BEGIN
Description: start a transaction block
Syntax:
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]

where transaction_mode is one of:

  ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED }
  READ WRITE | READ ONLY
  [ NOT ] DEFERRABLE

transaction_mode

The transaction_mode can be set to one of the following options:

  1. READ WRITE – Users can perform read or write operations.
  2. READ ONLY – Users can perform read operations only.

As an example, trying to do a write operation such as creating a table or inserting a row in a READ ONLY transaction would result in an error as shown below.

yugabyte=# BEGIN READ ONLY;
BEGIN

yugabyte=# CREATE TABLE example(k INT PRIMARY KEY);
ERROR: 25P02: current transaction is aborted, commands ignored until end of
              transaction block

DEFERRABLE transactions

The DEFERRABLE transaction property in YSQL is similar to PostgreSQL in that has no effect unless the transaction is also SERIALIZABLE and READ ONLY.

When all three of these properties (SERIALIZABLE, DEFERRABLE and READ ONLY) are set for a transaction, the transaction may block when first acquiring its snapshot, after which it is able to run without the normal overhead of a SERIALIZABLE transaction and without any risk of contributing to or being canceled by a serialization failure.

Tip

This mode is well suited for long-running reports or backups without being impacting or impacted by other transactions.