ORMs Should Update “Changed” Values, Not Just “Modified” Ones

In this article, I will establish how the SQL language and its implementations distinguish between changed values and modified values, where a changed value is a value that has been “touched”, but not necessarily modified, i.e. the value might be the same before and after the change.

Many ORMs, unfortunately, either update all of a record’s values or only the modified ones. The former can be inefficient, and the latter can be wrong. Updating the changed values would be correct.

Note that you may have a different definition of changed and modified. For this article, let’s just assume that the above definition is as valid as it is useful.

Introduction

A very interesting discussion was triggered recently by Vlad Mihalcea, who was looking for an answer to this question:

What’s the overhead of updating all columns, even the ones that haven’t changed?

Apart from being very interesting from a performance perspective, the question also touches on the functional aspects of the distinction between updating all columns and updating only some columns, which I’ll summarise in this article.

What’s the Problem?

The problem is one that all ORM vendors need to solve: ORMs have a client side representation of the relational model, and that representation is cached (or “out of sync”) for a user to change and then persist again. The problem is now how to re-synchronise the client side representation with the server side representation in a consistent and correct way.

Sidenote: By ORM I understand any tool that maps from a client side representation of your database schema to the database schema itself, regardless of whether the product supports full-fledged JPA-style object graph persistence, or “merely” implements an “active record” pattern, such as jOOQ 3.x (I find that distinction a bit academic).

All such ORMs have a client side representation of a database record, for instance given the following table (I’m going to be using PostgreSQL syntax):

CREATE TABLE customer (
  customer_id SERIAL8     NOT NULL PRIMARY KEY,
  first_name  VARCHAR(50) NOT NULL,
  last_name   VARCHAR(50) NOT NULL
)

You’re going to have a client side representation like the following (using Java, e.g. jOOQ or JPA):

// jOOQ generated UpdatableRecord
public class CustomerRecord 
extends UpdatableRecordImpl<CustomerRecord> {

  public CustomerRecord setCustomerId(Long customerId) { ... }
  public Long getCustomerId() { ... }
  public CustomerRecord setFirstName(String firstName) { ... }
  public String getFirstName() { ... }

  ...
}

// JPA annotated entity
@Entity
public class Customer {

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  public long customerId;

  @Column
  public String firstName;

  ...
}

In principle, these two approaches are the same thing with the distinction that jOOQ explicitly governs all UpdatableRecord interactions through type inheritance, whereas JPA makes this dependency more implicit through annotations:

  • jOOQ – explicit behavioural dependency between entity and jOOQ logic
  • JPA – implicit behavioural dependency between entity and JPA entity manager

In principle, the distinction is just a matter of taste, a programming style: Explicit vs. declarative.

But from a practical perspective, the JPA implementation lacks an important feature when it comes to synching the state back to the database. It cannot reflect change, only modification.

How to synch the state back to the database?

Let’s assume we have a customer called John Doe:

INSERT INTO customer (first_name, last_name)
VALUES ('John', 'Doe');

And that customer now changes their name to John Smith. We have several options of sending that update to the database, through “PATCH” or “PUT” semantics – terminology used by Morgan Tocker in another tweet in that discussion:

-- PATCH
UPDATE customer SET last_name = 'Smith' WHERE customer_id = ?

-- PUT
UPDATE customer 
SET first_name = 'John',
    last_name = 'Smith'
WHERE customer_id = ? 

A “PATCH” operation sends only the changed values back to the server, whereas a “PUT” operation sends the entire entity back to the server.

Discussion of the Semantics

In favour of PUT

The two operations are semantically very different. If another session attempts to rename this customer to Jane Doe concurrently (and without optimistic locking being in place), then the PATCH operation might result in an inconsistent outcome (Jane Smith), whereas the PUT operation would still produce one of the expected results, depending on the order in which the writes are executed:

-- PATCH result: Jane Smith
-- PATCH 1
UPDATE customer SET last_name = 'Smith' WHERE customer_id = ? 

-- PATCH 2
UPDATE customer SET first_name = 'Jane' WHERE customer_id = ? 

-- PUT result: Jane Doe
-- PUT 1
UPDATE customer 
SET first_name = 'John',
    last_name = 'Smith'
WHERE customer_id = ? 

-- PUT 2
UPDATE customer 
SET first_name = 'Jane',
    last_name = 'Doe'
WHERE customer_id = ? 

This is one of the reasons why Hibernate, as a JPA implementation, implements PUT semantics by default, sending all the columns at once. You can opt out of this by using the @DynamicUpdate annotation, which will only update modified values (not “changed” values – I’ll explain this distinction later).

This makes perfect sense in such a trivial setup, but it becomes a short-sighted solution when the table has many more columns. We’ll see why right away:

In favour of PATCH

One size doesn’t fit all. Sometimes, you do want concurrent updates to happen, and you do want to implement PATCH semantics, because sometimes, two concurrent updates do not work against each other. Take the following example using an enhancement of the customer table.

Business is asking us to collect some aggregate metrics for each customer. The number of clicks they made on our website, as well as the number of purchases they made:

CREATE TABLE customer (
  customer_id SERIAL8     NOT NULL PRIMARY KEY,
  first_name  VARCHAR(50) NOT NULL,
  last_name   VARCHAR(50) NOT NULL,

  clicks      BIGINT      NOT NULL DEFAULT 0,
  purchases   BIGINT      NOT NULL DEFAULT 0
)

And, of course, once you agree that the above design is a suitable one, you’ll immediately agree that here, PATCH semantics is more desirable than PUT semantics:

-- Updating clicks
UPDATE customer SET clicks = clicks+1 WHERE customer_id = ? 

-- Updating purchases
UPDATE customer SET purchases = purchases+1 WHERE customer_id = ? 

Not only do we update only an individual column, we’re doing it entirely in SQL, including the calculation. With this approach, we do not even need optimistic locking to guarantee update correctness, as we’re not using any client side cached version of the customer record, which could be out of date and would need optimistic (or worse: pessimistic) locking.

If we implemented this differently, using client side calculation of the updated clicks / purchases counters…

-- Updating clicks
UPDATE customer 
SET clicks = ? 
WHERE customer_id = ? 

-- Updating purchases
UPDATE customer 
SET purchases = ? 
WHERE customer_id = ? 

… then we’d need one of these techniques:

  • Pessimistic locking: Nope, won’t work. The update is still computed from a client side cached value, so we could still get incorrect updates
  • Optimistic locking: Indeed, any update would need to be done on a versioned customer record, so if there are two concurrent updates, one of them will fail and can be retried. This guarantees data integrity, but will probably make this functionality very painful, because a lot of click updates are likely done in a short amount of time, and they would need to be repeated until they succeed (see the sketch after this list)!
  • Client side synchronisation: Of course, we could prevent concurrency for these updates on the client side, making sure that only one concurrent process ever updates click counts (for a given customer). We could implement a click count update queue for this.

All of the above options have significant drawbacks; the easiest solution is really to just increment the counter directly in the database.

And don’t forget, if you choose a bind-variable based solution, and opt for updating ALL the columns, rather than just the changed one, your first_name / last_name updates might conflict with these counter updates as well, making things even more complicated.

Partial PUT (or compound PATCH)

In fact, from a semantics perspective, if you do want to use an ORM to update an entity, you should think about a “partial PUT” semantics, which separates the different entity elements into “sub entities”. From a relational perspective, of course, no such thing as a sub entity exists. The above example should be normalised into the following, and we would have far fewer concurrency issues:

CREATE TABLE customer (
  customer_id SERIAL8     NOT NULL PRIMARY KEY,
  first_name  VARCHAR(50) NOT NULL,
  last_name   VARCHAR(50) NOT NULL
);

CREATE TABLE customer_clicks (
  customer_id BIGINT NOT NULL PRIMARY KEY REFERENCES customer,
  clicks      BIGINT NOT NULL DEFAULT 0
);

CREATE TABLE customer_purchases (
  customer_id BIGINT NOT NULL PRIMARY KEY REFERENCES customer,
  purchases   BIGINT NOT NULL DEFAULT 0
);

This way, the previously mentioned PUT semantics would not create situations where individual, semantically unrelated updates (updates to names, updates to clicks) would interfere with each other. We would only need to make sure that e.g. two competing updates to clicks are correctly serialised.

Practically, we often don’t design our databases this way, be it for convenience reasons, for optimised storage, or for optimised querying (see also our article about when normalisation and surrogate keys hurt performance).

jOOQ’s “changed” value semantics

So that “sub entity” is really just a logical thing, which can be represented as a logically separate entity in JPA – or we can use jOOQ, which works a bit differently here. In jOOQ, we can change an UpdatableRecord only partially, and that partial change is sent to the server:

CustomerRecord customer = ctx
    .selectFrom(CUSTOMER)
    .where(CUSTOMER.CUSTOMER_ID.eq(customerId))
    .fetchOne();

customer.setFirstName("John");
customer.setLastName("Smith");

assertTrue(customer.changed(CUSTOMER.FIRST_NAME));
assertTrue(customer.changed(CUSTOMER.LAST_NAME));
assertFalse(customer.changed(CUSTOMER.CLICKS));
assertFalse(customer.changed(CUSTOMER.PURCHASES));

customer.store();

assertFalse(customer.changed(CUSTOMER.FIRST_NAME));
assertFalse(customer.changed(CUSTOMER.LAST_NAME));
assertFalse(customer.changed(CUSTOMER.CLICKS));
assertFalse(customer.changed(CUSTOMER.PURCHASES));

This will send the following statement to the server:

UPDATE customer
SET first_name = ?,
    last_name = ?
WHERE customer_id = ?

Optionally, just as with JPA, you can turn on optimistic locking on this statement. The important thing here is that the clicks and purchases columns are left untouched, because they were not changed by the client code. This is different from JPA, which either sends all the values by default or, if you specify @DynamicUpdate in Hibernate, sends only the last_name column, because while first_name was changed, it was not modified.

My definition:

  • changed: The value is “touched”, its state is “dirty” and the state needs to be synched to the database, regardless of modification.
  • modified: The value is different from its previously known value. By necessity, a modified value is always changed.

As you can see, these are different things, and it is quite hard for a JPA-based API like Hibernate to implement changed semantics, because of the annotation-based, declarative nature of how entities are defined. We’d need some sophisticated instrumentation to intercept all data changes, even when the values have not been modified (I didn’t make those attributes public by accident).

Without this distinction, however, it is unreasonable to use @DynamicUpdate in Hibernate, as we might run into the situation we didn’t want to run into, where we get a customer called “Jane Smith” – or we use optimistic locking, in which case there’s not much point in using @DynamicUpdate.

The database perspective

From a database perspective, it is also important to distinguish between change and modification semantics. In the answer I gave on Stack Exchange, I’ve illustrated two situations:

INSERTs and DEFAULT values

Thus far, we’ve discussed only UPDATE statements, but similar reasoning applies to INSERT as well. These two statements are the same:

INSERT INTO t (a, b)    VALUES (?, ?);
INSERT INTO t (a, b, c) VALUES (?, ?, DEFAULT);

This one, however, is different:

INSERT INTO t (a, b, c) VALUES (?, ?, ?);

In the first two statements, a DEFAULT clause (e.g. timestamp generation, identity generation, trigger value generation, etc.) may apply to the column c. In the third statement, the value of c is provided explicitly by the client.

Languages like Java do not have any way to represent this distinction between

  • NULL in SQL (which is usually, but not always, the DEFAULT)
  • an actual DEFAULT

This can only be achieved when an ORM implements changed semantics, like jOOQ does. When you create a customer with jOOQ, then clicks and purchases will have their DEFAULT applied:

CustomerRecord c1 = ctx.newRecord(CUSTOMER);
c1.setFirstName("John");
c1.setLastName("Doe");
c1.store();

CustomerRecord c2 = ctx.newRecord(CUSTOMER);
c2.setFirstName("Jane");
c2.setLastName("Smith");
c2.setClicks(1L);
c2.setPurchases(1L);
c2.store();

Resulting SQL:

-- c1.store();
INSERT INTO customer (first_name, last_name)
VALUES (?, ?);

-- c2.store();
INSERT INTO customer (first_name, last_name, clicks, purchases)
VALUES (?, ?, ?, ?);

In both cases, that’s what the user tells jOOQ to do, so jOOQ will generate a query accordingly.

Back to UPDATE statements

Consider the following example using Oracle triggers:

CREATE TABLE x (a INT PRIMARY KEY, b INT, c INT, d INT);

INSERT INTO x VALUES (1, 1, 1, 1);

CREATE OR REPLACE TRIGGER t
  BEFORE UPDATE OF c, d -- Doesn't fire on UPDATE OF b!
  ON x
BEGIN
  IF updating('c') THEN
    dbms_output.put_line('Updating c');
  END IF;
  IF updating('d') THEN
    dbms_output.put_line('Updating d');
  END IF;
END;
/

SET SERVEROUTPUT ON
UPDATE x SET b = 1 WHERE a = 1;
UPDATE x SET c = 1 WHERE a = 1;
UPDATE x SET d = 1 WHERE a = 1;
UPDATE x SET b = 1, c = 1, d = 1 WHERE a = 1;

It results in the following output:

table X created.
1 rows inserted.
TRIGGER T compiled
1 rows updated.
1 rows updated.
Updating c

1 rows updated.
Updating d

1 rows updated.
Updating c
Updating d

As you can see, the trigger doesn’t fire when we update only column b, which it is not interested in. Again, this goes in the direction of distinguishing between changed and modified values, where a trigger fires only when a value is changed (but not necessarily modified).

Now, if an ORM always updates all the columns, this trigger will not work correctly. Sure, we can compare :OLD.b and :NEW.b, but that would check for modification, not change, and it might be costly to do so for large strings!

Speaking of costs…

Performance

Statement caching: Weakly in favour of PUT

While one of the reasons the Hibernate team mentioned in favour of updating all the columns is improved cursor cache performance (fewer distinct SQL statements need to be parsed by the database, as there are fewer distinct update configurations), I suggest that this “premature optimisation” is negligible. If a client application runs dynamic updates (in the jOOQ sense, where changed values are updated, not just modified values), then the chances that the number of possible SQL statements that need to be parsed will explode are slim to non-existent.

I would definitely like to see real-world benchmarks on this topic!

Batching: Weakly in favour of PUT

When you want to batch tons of update statements from JDBC, then indeed, you will need to ensure that they all have the exact same SQL string. However, this is not a good argument in favour of using PUT semantics and updating all columns.

I’m saying “not good”, because such a batched update should still only consider a subset of the columns for update, not all the columns. And that subset should be determined from the aggregated changed flags, not from data modification.

Index updates: In favour of PATCH (depending on the database)

Most databases optimise index updates to ignore indexes whose columns have not been changed. Oracle also doesn’t update indexes whose columns have not been modified, in which case PUT and PATCH semantics both work the same way from an indexing perspective. Other databases may not work this way, in which case PATCH semantics is favourable.

But even if the optimisation is in place, the old and the new values have to be compared for equality (i.e. to see whether a modification took place). You don’t want to compare millions of strings per second if there’s no need to do so! Check out Morgan Tocker’s interesting answer on Stack Exchange, from a MySQL perspective.

So, why not just prevent expensive modification checks by telling the database what has changed, instead?

UNDO overhead: In favour of PATCH

Every statement has a footprint on the UNDO / REDO logs. As I’ve shown above, the statements are semantically different in many ways, so if your statement is bigger (more columns are updated), then the impact on the UNDO / REDO log is bigger as well. This can have drastic effects depending on the size of your table / columns.

Don’t forget that this can also affect backup performance!

More performance related information in this blog post:

https://jonathanlewis.wordpress.com/2007/01/02/superfluous-updates

Note: While these bits of information were mostly Oracle-specific, common sense dictates that other RDBMS will behave in similar ways.

Conclusion

With all these negative aspects of including unnecessary columns in updates through an ORM, compared to the almost negligible benefits, I’d say that users should move forward and completely avoid this mess. Here’s how:

  • jOOQ optimises this out of the box, if users set the changed values explicitly. Beware that when you “load” a POJO into a Record, it will set all the columns to changed, which may or may not be the desired effect (see the sketch after this list)!
  • Hibernate allows for @DynamicUpdate, which may work incorrectly because it implements minimal “PATCH” semantics based on modified values, not on changed values. However, JPA allows for declaring more than one entity per table, which might certainly be a valid option for this kind of problem
  • Normalisation is always an option, with its own trade offs. The clicks and purchases columns could be externalised in separate tables, if this benefits the overall design.
  • More often than not, writing an UPDATE with SQL directly is the best choice. As we’ve seen in this article, the counters should be updated with expressions of the form clicks = clicks + 1, which circumvents most of the problems exposed here.

In short, as Michael Simons said:

And we all do feel very dirty when we write SELECT *, right? So we should at least be wary of updating all the columns as well.

How to Avoid the Dreaded Deadlock when Pessimistic Locking – And some Awesome Java 8 Usage!

Sometimes you simply cannot avoid it: Pessimistic locking via SQL. In fact, it’s an awesome tool when you want to synchronise several applications on a shared, global lock.

Some may think this is abusing the database. We think: use the tools you have if they can solve the problem you have. For instance, the RDBMS can be the perfect implementation for a message queue.

Let’s assume you do have that pessimistic locking use-case and you do want to choose the RDBMS. Now, how to get it right? Because it is really easy to produce a deadlock. Imagine the following setup (and I’m using Oracle for this):

CREATE TABLE locks (v NUMBER(18));

INSERT INTO locks
SELECT level
FROM dual
CONNECT BY level <= 10;

This generates 10 records, which we’ll use as 10 distinct row-level locks.

Now, let’s connect to the database from two sqlplus clients:

Instance 1

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 1
  4  FOR UPDATE;

         V
----------
         1

Instance 2

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 2
  4  FOR UPDATE;

         V
----------
         2

We’ve now acquired two different locks from two different sessions.

And then, let’s reverse things:

Instance 1

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 2
  4  FOR UPDATE;

Instance 2

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 1
  4  FOR UPDATE;

Both sessions are now deadlocked, and luckily, Oracle will detect this and fail one of the sessions:

ORA-00060: deadlock detected while waiting for resource

Avoiding deadlocks

This is a very explicit example where it is easy to see why it happens, and potentially, how to avoid it. A simple way to avoid deadlocks is to establish a rule that all locks always have to be acquired in ascending order. If you know you need locks number 1 and 2, you must acquire them in that order. This way, you will still produce locking and thus contention, but at least the contention will eventually (probably) get resolved once load decreases. Here’s an example that shows what happens when you have more clients, this time written as Java threads.

In the example, we’re using jOOλ for simpler lambda expressions (e.g. lambdas throwing checked exceptions). And of course, we’ll be abusing Java 8, heavily!

Class.forName("oracle.jdbc.OracleDriver");

// We want a collection of 4 threads and their
// associated execution counters
List<Tuple2<Thread, AtomicLong>> list =
IntStream
    .range(0, 4)

    // Let's use jOOλ here to wrap checked exceptions
    // we'll map the thread index to the actual tuple
    .mapToObj(Unchecked.intFunction(i -> {
        final Connection con = DriverManager.getConnection(
            "jdbc:oracle:thin:@localhost:1521:xe",
            "TEST", "TEST");

        // Hold all acquired locks until we explicitly commit
        con.setAutoCommit(false);

        final AtomicLong counter = new AtomicLong();
        final Random rnd = new Random();

        return Tuple.tuple(

            // Each thread acquires a random number of
            // locks in ascending order
            new Thread(Unchecked.runnable(() -> {
                for (;;) {
                    String sql =
                      " SELECT *"
                    + " FROM locks"
                    + " WHERE v BETWEEN ? AND ?"
                    + " ORDER BY v"
                    + " FOR UPDATE";

                    try (PreparedStatement stmt = 
                             con.prepareStatement(sql)) {
                        stmt.setInt(1, rnd.nextInt(10));
                        stmt.setInt(2, rnd.nextInt(10));
                        stmt.executeUpdate();

                        counter.incrementAndGet();
                        con.commit();
                    }
                }
            })),
            counter
        );
    }))
    .collect(Collectors.toList());

// Starting each thread
list.forEach(tuple -> tuple.v1.start());

// Printing execution counts
for (;;) {
    list.forEach(tuple -> {
        System.out.print(String.format(
            "%1s:%2$-10s",
            tuple.v1.getName(),
            tuple.v2.get()
        ));
    });

    System.out.println();
    Thread.sleep(1000);
}

As the program runs, you can see that it continues progressively, with each thread taking approximately the same load as the other threads:

Thread-1:0         Thread-2:0         Thread-3:0         Thread-4:0
Thread-1:941       Thread-2:966       Thread-3:978       Thread-4:979
Thread-1:2215      Thread-2:2206      Thread-3:2244      Thread-4:2253
Thread-1:3422      Thread-2:3400      Thread-3:3466      Thread-4:3418
Thread-1:4756      Thread-2:4720      Thread-3:4855      Thread-4:4847
Thread-1:6095      Thread-2:5987      Thread-3:6250      Thread-4:6173
Thread-1:7537      Thread-2:7377      Thread-3:7644      Thread-4:7503
Thread-1:9122      Thread-2:8884      Thread-3:9176      Thread-4:9155

Now, for the sake of the argument, let’s do the forbidden thing and ORDER BY DBMS_RANDOM.VALUE:

String sql =
  " SELECT *"
+ " FROM locks"
+ " WHERE v BETWEEN ? AND ?"
+ " ORDER BY DBMS_RANDOM.VALUE"
+ " FOR UPDATE";

It won’t take long before your application explodes:

Thread-1:0         Thread-2:0         Thread-3:0         Thread-4:0         
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:90        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:90        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:90        
Exception in thread "Thread-3" org.jooq.lambda.UncheckedException: 
java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource

Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:93        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:93        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:93        
Exception in thread "Thread-1" org.jooq.lambda.UncheckedException: 
java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource

Thread-1:72        Thread-2:1268      Thread-3:79        Thread-4:1330      
Thread-1:72        Thread-2:3332      Thread-3:79        Thread-4:3455      
Thread-1:72        Thread-2:5691      Thread-3:79        Thread-4:5841      
Thread-1:72        Thread-2:8663      Thread-3:79        Thread-4:8811      
Thread-1:72        Thread-2:11307     Thread-3:79        Thread-4:11426     
Thread-1:72        Thread-2:12231     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:12231     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:12231     Thread-3:79        Thread-4:12348     
Exception in thread "Thread-4" org.jooq.lambda.UncheckedException: 
java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource

Thread-1:72        Thread-2:13888     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:17037     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:20234     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:23495     Thread-3:79        Thread-4:12348     

And in the end, all but one of your threads have been killed (at least in our example) because of deadlock exceptions.

Beware of execution plans

The above example worked because, in this particular case, the execution plan applied locking AFTER ordering, as can be seen here:

SQL_ID  bcyyxqyubp4v8, child number 0
-------------------------------------
SELECT * FROM locks WHERE v BETWEEN :v1 AND :v2 ORDER BY v FOR UPDATE
 
Plan hash value: 2944215640
 
--------------------------------------
| Id  | Operation            | Name  |
--------------------------------------
|   0 | SELECT STATEMENT     |       |
|   1 |  FOR UPDATE          |       |
|   2 |   SORT ORDER BY      |       | <-- happens before FOR UPDATE
|*  3 |    FILTER            |       |
|*  4 |     TABLE ACCESS FULL| LOCKS |
--------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(TO_NUMBER(:V1)<=TO_NUMBER(:V2))
   4 - filter(("V"=TO_NUMBER(:V1)))

(see this article to learn how to get Oracle execution plans like the above)

You should obviously not rely on this in a more real-world scenario.

Beware of contention, though

The above examples have also been impressive in terms of displaying the other negative side-effect of pessimistic locking (or locking in general): contention. The single thread that continued executing in the “bad example” was almost as fast as the four threads before. Our silly example, where we used random lock ranges, led to the fact that, on average, almost every attempt to acquire locks did at least some blocking. How can you figure this out? By looking out for enq: TX - row lock contention events in your sessions. For instance:

SELECT blocking_session, event
FROM v$session
WHERE username = 'TEST'

Here, the above query returns the catastrophic result:

BLOCKING_SESSION   EVENT
-------------------------------------
48                 enq: TX - row lock contention
54                 enq: TX - row lock contention
11                 enq: TX - row lock contention
11                 enq: TX - row lock contention

Conclusion

The conclusion can only be: Use pessimistic locking sparingly and always expect the unexpected. When doing pessimistic locking, both deadlocks and heavy contention are quite possible problems that you can run into. As a general rule of thumb, follow these rules (in order):

  • Avoid pessimistic locking if you can
  • Avoid locking more than one row per session if you can
  • Avoid locking rows in random order if you can
  • Avoid going to work to see what happened

FOR UPDATE simulation in SQL Server and CUBRID

Pessimistic locking is useful every now and then when you explicitly want to lock a given set of rows for a subsequent UPDATE, or for database-based inter-process synchronisation and locking. Normally, you would use the SQL Standard FOR UPDATE clause, which is supported in most databases.

A SQL standard example

-- This statement will produce a row-lock on three rows
-- ... or fail / timeout if the row-lock cannot be acquired
SELECT *
  FROM author
 WHERE id IN (3, 4, 5)
   FOR UPDATE

-- This statement will produce cell-locks on two cells in three rows
-- ... or fail / timeout if the cell-lock cannot be acquired
SELECT *
  FROM author
 WHERE id IN (3, 4, 5)
   FOR UPDATE OF first_name, last_name

An Oracle syntax extension example

Oracle ships with some useful extensions to the FOR UPDATE clause:

-- This statement will produce a row-lock on three rows
-- skipping all locked records. This is very nice for queue tables where
-- you never want to fail on locks, but just skip to the next record
SELECT *
  FROM author
 WHERE id IN (3, 4, 5)
   FOR UPDATE SKIP LOCKED

Simulation of the FOR UPDATE clause

SQL Server supports FOR UPDATE only in cursors (as defined in the SQL standard). It also ships with proprietary SQL syntax extensions, such as WITH (UPDLOCK), which have some very peculiar “features”, such as locking the whole page containing a record instead of just the record itself. The CUBRID database, on the other hand, doesn’t support pessimistic locking in SQL at all.

But this can be simulated with JDBC, as JDBC allows for specifying the flags TYPE_SCROLL_SENSITIVE and CONCUR_UPDATABLE for any statement, and then using the ResultSet.updateXXX() methods to produce a cell-lock / row-lock. Here’s a simplified example in JDBC:

PreparedStatement stmt = connection.prepareStatement(
  "SELECT * FROM author WHERE id IN (3, 4, 5)",
  ResultSet.TYPE_SCROLL_SENSITIVE, 
  ResultSet.CONCUR_UPDATABLE);
ResultSet rs = stmt.executeQuery();

while (rs.next()) {

  // UPDATE the primary key for row-locks, or any other
  // columns for cell-locks
  rs.updateObject(1, rs.getObject(1));
  rs.updateRow();

  // Do more stuff with this record
}

Drawbacks of this approach

The main drawback of this approach is the fact that the database has to maintain a scrollable cursor, whose records are locked one by one. This can cause a major risk of deadlocks (or of race conditions, if the JDBC driver can recover from the unsuccessful locking), for example when two Java threads execute the following statements:

-- thread 1
SELECT * FROM author ORDER BY id ASC;

-- thread 2
SELECT * FROM author ORDER BY id DESC;

So use this technique with care, possibly only ever locking single rows!

An example using jOOQ to abstract this simulation

The following example shows how jOOQ integration tests test pessimistic locking for all SQL dialects, including SQL Server and CUBRID:

// Two factories with different connections
Factory create1 = // ...
Factory create2 = // ...

// A synchronized list to check for correct execution order
final Vector<String> execOrder = new Vector<String>();

try {
    // This is the test case:
    // 0.0s: Both threads start
    // 0.0s: t1 sleeps for 2s
    // 0.0s: t2 locks the AUTHOR table
    // 0.1s: t2 sleeps for 4s
    // 2.0s: t1 blocks on the AUTHOR table (potentially failing here)
    // 4.0s: t2 commits and unlocks AUTHOR
    // 4.1s: t1 (if blocked) now locks the AUTHOR table

    // For simplicity, no ReentrantLocks and signals are used for
    // synchronisation, but simple Thread.sleep() calls.

    final Thread t1 = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                Thread.sleep(2000);
            }
            catch (Exception ignore) {
            }
            execOrder.add("t1-block");
            try {
                create1
                    .select(AUTHOR.ID)
                    .from(AUTHOR)
                    .forUpdate()
                    .fetch();
            }

            // Some databases fail on locking, others lock for a while
            catch (Exception ignore) {
            }
            finally {
                execOrder.add("t1-fail-or-t2-commit");
            }
        }
    });

    final Thread t2 = new Thread(new Runnable() {
        @Override
        public void run() {
            execOrder.add("t2-exec");
            Result<?> result2 = create2
                .selectFrom(AUTHOR)
                .forUpdate()
                .fetch();
            assertEquals(2, result2.size());

            execOrder.add("t2-signal");

            try {
                Thread.sleep(4000);
            }
            catch (Exception ignore) {
            }
 
            execOrder.add("t1-fail-or-t2-commit");

            try {
                create2.getConnection().commit();
            }
            catch (Exception e) {}
        }
    });

    t1.start();
    t2.start();

    t1.join();
    t2.join();

    assertEquals(asList(
      "t2-exec", 
      "t2-signal", 
      "t1-block", 
      "t1-fail-or-t2-commit", 
      "t1-fail-or-t2-commit"), execOrder);
}
finally {
    // Connection clean-up omitted for brevity
}

The above test shows how the FOR UPDATE clause is implemented or simulated for most databases in jOOQ. Some databases use a timeout on lock acquisition before failing, while others fail immediately (Oracle allows for specifying this with the FOR UPDATE WAIT / NOWAIT clauses).