How to Avoid the Dreaded Dead Lock when Pessimistic Locking – And some Awesome Java 8 Usage!

Sometimes you simply cannot avoid it: Pessimistic locking via SQL. In fact, it’s an awesome tool when you want to synchronise several applications on a shared, global lock. Some may think this is abusing the database. We think use the tools you have if they can solve the problem you have. For instance, the RDBMS can be the perfect implementation for a message queue. Let’s assume you do have that pessimistic locking use-case and you do want to choose the RDBMS. Now, how to get it right? Because it is really easy to produce a deadlock. Imagine the following setup (and I’m using Oracle for this):

CREATE TABLE locks (v NUMBER(18));

INSERT INTO locks
SELECT level
FROM dual
CONNECT BY level <= 10;

This generates 10 records, which we’ll use as 10 distinct row-level locks. Now, let’s connect to the database from two sqlplus clients: Instance 1

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 1
  4  FOR UPDATE;

         V
----------
         1

Instance 2

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 2
  4  FOR UPDATE;

         V
----------
         2

We’ve now acquired two different locks from two different sessions. And then, let’s inverse things: Instance 1

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 2
  4  FOR UPDATE;

Instance 2

SQL> SELECT *
  2  FROM locks
  3  WHERE v = 1
  4  FOR UPDATE;

Both sessions are now locked and luckily, Oracle will detect this and fail one of the sessions:
ORA-00060: deadlock detected while waiting for resource

Avoiding deadlocks

This is a very explicit example where it is easy to see why it happens, and potentially, how to avoid it. A simple way to avoid deadlocks is to establish a rule that all locks will always have to be acquired in ascending order. If you know you need lock number 1 and 2, you must acquire them in that order. This way, you will still produce locking and thus contention, but at least the contention will eventually (probably) get resolved once load decreases. Here’s an example that shows what happens when you have more clients. This time, written as Java threads. In the example, we’re using jOOλ for simpler lambda expressions (e.g. lambdas throwing checked exceptions). And of course, we’ll be abusing Java 8, heavily!

Class.forName("oracle.jdbc.OracleDriver");

// We want a collection of 4 threads and their
// associated execution counters
List<Tuple2<Thread, AtomicLong>> list =
IntStream
    .range(0, 4)

    // Let's use jOOλ here to wrap checked exceptions
    // we'll map the thread index to the actual tuple
    .mapToObj(Unchecked.intFunction(i -> {
        final Connection con = DriverManager.getConnection(
            "jdbc:oracle:thin:@localhost:1521:xe", 
            "TEST", "TEST");

        final AtomicLong counter = new AtomicLong();
        final Random rnd = new Random();

        return Tuple.tuple(

            // Each thread acquires a random number of
            // locks in ascending order
            new Thread(Unchecked.runnable(() -> {
                for (;;) {
                    String sql =
                      " SELECT *"
                    + " FROM locks"
                    + " WHERE v BETWEEN ? AND ?"
                    + " ORDER BY v"
                    + " FOR UPDATE";

                    try (PreparedStatement stmt = 
                             con.prepareStatement(sql)) {
                        stmt.setInt(1, rnd.nextInt(10));
                        stmt.setInt(2, rnd.nextInt(10));
                        stmt.executeUpdate();

                        counter.incrementAndGet();
                        con.commit();
                    }
                }
            })),
            counter
        );
    }))
    .collect(Collectors.toList());

// Starting each thread
list.forEach(tuple -> tuple.v1.start());

// Printing execution counts
for (;;) {
    list.forEach(tuple -> {
        System.out.print(String.format(
            "%1s:%2$-10s",
            tuple.v1.getName(),
            tuple.v2.get()
        ));
    });

    System.out.println();
    Thread.sleep(1000);
}

As the program runs, you can see that it continues progressively, with each thread taking approximately the same load as the other threads:
Thread-1:0         Thread-2:0         Thread-3:0         Thread-4:0
Thread-1:941       Thread-2:966       Thread-3:978       Thread-4:979
Thread-1:2215      Thread-2:2206      Thread-3:2244      Thread-4:2253
Thread-1:3422      Thread-2:3400      Thread-3:3466      Thread-4:3418
Thread-1:4756      Thread-2:4720      Thread-3:4855      Thread-4:4847
Thread-1:6095      Thread-2:5987      Thread-3:6250      Thread-4:6173
Thread-1:7537      Thread-2:7377      Thread-3:7644      Thread-4:7503
Thread-1:9122      Thread-2:8884      Thread-3:9176      Thread-4:9155
Now, for the sake of the argument, let’s do the forbidden thing and ORDER BY DBMS_RANDOM.VALUE

String sql =
  " SELECT *"
+ " FROM locks"
+ " WHERE v BETWEEN ? AND ?"
+ " ORDER BY DBMS_RANDOM.VALUE"
+ " FOR UPDATE";

It won’t take long and your application explodes:
Thread-1:0         Thread-2:0         Thread-3:0         Thread-4:0         
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:90        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:90        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:90        
Exception in thread "Thread-3" org.jooq.lambda.UncheckedException: 
java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource

Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:93        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:93        
Thread-1:72        Thread-2:79        Thread-3:79        Thread-4:93        
Exception in thread "Thread-1" org.jooq.lambda.UncheckedException: 
java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource

Thread-1:72        Thread-2:1268      Thread-3:79        Thread-4:1330      
Thread-1:72        Thread-2:3332      Thread-3:79        Thread-4:3455      
Thread-1:72        Thread-2:5691      Thread-3:79        Thread-4:5841      
Thread-1:72        Thread-2:8663      Thread-3:79        Thread-4:8811      
Thread-1:72        Thread-2:11307     Thread-3:79        Thread-4:11426     
Thread-1:72        Thread-2:12231     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:12231     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:12231     Thread-3:79        Thread-4:12348     
Exception in thread "Thread-4" org.jooq.lambda.UncheckedException: 
java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource

Thread-1:72        Thread-2:13888     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:17037     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:20234     Thread-3:79        Thread-4:12348     
Thread-1:72        Thread-2:23495     Thread-3:79        Thread-4:12348     
And in the end, all but one of your threads have been killed (at least in our example) because of deadlock exceptions.

Beware of execution plans

The above example has worked, because in the given example, the execution plan applied locking AFTER ordering as can be seen here:
SQL_ID  bcyyxqyubp4v8, child number 0
-------------------------------------
SELECT * FROM locks WHERE v BETWEEN :v1 AND :v2 ORDER BY v FOR UPDATE
 
Plan hash value: 2944215640
 
--------------------------------------
| Id  | Operation            | Name  |
--------------------------------------
|   0 | SELECT STATEMENT     |       |
|   1 |  FOR UPDATE          |       |
|   2 |   SORT ORDER BY      |       | <-- happens before FOR UPDATE
|*  3 |    FILTER            |       |
|*  4 |     TABLE ACCESS FULL| LOCKS |
--------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(TO_NUMBER(:V1)<=TO_NUMBER(:V2))
   4 - filter(("V"=TO_NUMBER(:V1)))
(see this article to learn how to get Oracle execution plans like the above) You should obviously not rely on this in a more real world scenario.

Beware of contention, though

The above examples have also been impressive in terms of displaying the other negative side-effects of pessimistic locking (or locking in general): Contention. The single thread that continued executing in the “bad example” was almost as fast as the four threads before. Our silly example where we used random lock ranges led to the fact that on average, almost every attempt to acquire locks did at least some blocking. How can you figure this out? By looking out for enq: TX – row lock contention events in your sessions. For instance:

SELECT blocking_session, event
FROM v$session
WHERE username = 'TEST'

The above query returns the catastrophic result, here:
BLOCKING_SESSION   EVENT
-------------------------------------
48                 enq: TX - row lock contention
54                 enq: TX - row lock contention
11                 enq: TX - row lock contention
11                 enq: TX - row lock contention

Conclusion

The conclusion can only be: Use pessimistic locking sparingly and always expect the unexpected. When doing pessimistic locking, both deadlocks and heavy contention are quite possible problems that you can run into. As a general rule of thumb, follow these rules (in order):
  • Avoid pessimistic locking if you can
  • Avoid locking more than one row per session if you can
  • Avoid locking rows in random order if you can
  • Avoid going to work to see what happened

4 thoughts on “How to Avoid the Dreaded Dead Lock when Pessimistic Locking – And some Awesome Java 8 Usage!

  1. I feel that your rules of thumb lack a certain pragmatism:

    * “Avoid pessimistic locking if you can” …you usually can’t.
    * “Avoid locking more than one row per session if you can” …you usually can’t.
    * “Avoid locking rows in random order if you can” …you usually can’t.
    * “Avoid going to work to see what happened” …you usually can’t.

    In any non-toy application, the above are extremely hard to achieve. I’ve come closest to satisfying these rules using the following approach:

    * Partition your data into independent subsets.
    * Use a single designated writer per data partition.

    But of course, the above only works reliably if you

    * Avoid locking across multiple data partitions … you usually can’t :(

    1. That’s a rather pessimistic view on pessimistic locking ;-)

      I’ve seen systems where this worked pretty smoothly. Well… they were read-heavy systems with easily partitioned data sets where (mostly) only one writer could write to the same partition. I was spoiled.

        1. I agree that things get very hairy when joins and locking are mixed as in your linked Stack Overflow question. The article here tried to avoid that topic – I have actually never seen a SELECT .. FOR UPDATE with joins in the wild – outside of jOOQ integration tests.

          You’re right that the experienced ordering of locking is not something that should be relied upon. While it is guaranteed to work in Oracle’s FOR UPDATE SKIP LOCKED clause – it might not be wise to rely on it in a FOR UPDATE [ WAIT ] clause as in the article. I’ll add a comment.

Leave a Reply