Fun with PostGIS: Mandelbrot Set, Game of Life, and More

The upcoming jOOQ 3.16 will finally offer support for the various RDBMS GIS extensions via issue #982. This is great news per se, and will be covered in a future blog post, when the integration is ready. This post here is about something else.

Adding support for such a feature is a great source of procrastination. You see, when you develop a backend library, all you ever do is implement algorithms, APIs, and other abstract stuff. But GIS is different. With GIS, a SQL developer like me suddenly has access to a “UI”, and having access to a “UI” woke the inner programmer child in me.

I was pleasantly surprised that DBeaver, my preferred SQL editor, has out of the box support for GIS’s WKT format. E.g. run a query like this:

SELECT st_geomfromtext('polygon((0 0, 2 3, 0 6, -2 3, 0 0))');

The output being

Let’s play around with GIS

So, naturally, what other thing to do than play around with it? The most obvious next thing to generate would be your favourite logo, which happens to be very easy to map to polygons. Let’s look at a few GIS features step by step. For this blog post, I’ll be using PostGIS, which you can get very easily from Docker Hub.
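For example, a quick way to spin up a local instance (the postgis/postgis image is the one published on Docker Hub; the port mapping and password below are placeholders to adapt to your setup):

docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgis/postgis

With that running, let’s start with two squares: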

WITH
  -- These "sprites" are just reusable 2D polygons, which I declared
  -- to avoid having to deal with the lengthy WKT format
  sprites AS (
    SELECT 

      -- We project the original 1x1 square, and a scaled 4x4 version
      s AS square,
      st_scale(s, 4, 4) AS square4

    -- Here, we're creating a square called "s"
    FROM (VALUES 
      (st_polygonfromtext('polygon ((0 0, 1 0, 1 1, 0 1, 0 0))'))
    ) t (s)
  )

-- st_union combines 2 polygons into a multipolygon
SELECT st_union(square, st_translate(square4, 2, 0))
FROM sprites

The output of the above query is

MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)), ((2 0, 6 0, 6 4, 2 4, 2 0)))

Or, using DBeaver’s visualisation utility:

The st_union isn’t much different from any other set union. Note that I’ve translated the larger square 2 units to the right, so they don’t overlap. Otherwise, the union would have just been the bigger square.

A polygon describes a set of points in the 2D number plane. Creating a union of those two sets of points is natural. We can also create a difference of the two squares, instead:

WITH
  sprites AS (
    SELECT 
      s AS square,
      st_scale(s, 4, 4) AS square4
    FROM (VALUES 
      (st_polygonfromtext('polygon ((0 0, 1 0, 1 1, 0 1, 0 0))'))
    ) t (s)
  )
SELECT st_difference(square4, square)
FROM sprites

Resulting in:

POLYGON ((0 4, 4 4, 4 0, 1 0, 1 1, 0 1, 0 4))

Or, visually:

With these simple tools, we can now create all 4 letters of the jOOQ logo. As a reminder, the tools were:

  • st_polygonfromtext: Create a polygon from a WKT text representation
  • st_scale: Scale any geometry to a new size
  • st_translate: Translate any geometry to a new position
  • st_union: Combine two geometries (or more, if used as an aggregate function)
  • st_difference: Remove one geometry from another

The GIS API is vast, both in PostGIS as well as in the ISO/IEC 13249-3:2016 standard version, but these simple tools shall suffice for now. Let’s look at this query:

WITH
  -- Creating the two squares to work with
  sprites AS (
    SELECT 
      s AS square,
      st_scale(s, 4, 4) AS square4
    FROM (VALUES 
      (st_polygonfromtext('polygon ((0 0, 1 0, 1 1, 0 1, 0 0))'))
    ) t (s)
  ),
  
  -- All the 4 letters j, o1, o2, q of the jOOQ logo
  letters AS (
    SELECT
    
      -- The letter j
      st_difference(
        st_difference(
          st_difference(
            square4, 
            st_translate(st_scale(square, 2, 3), 1, 1)
          ),
          st_translate(st_scale(square, 1, 2), 0, 2)
        ),
        st_translate(st_scale(square, 1, 0.5), 3, 2.5)
      ) AS j,
      
      -- The letter o
      st_difference(square4, st_translate(square, 1, 1)) AS o1,
      
      -- The letter o
      st_difference(square4, st_translate(square, 2, 2)) AS o2,
      
      -- The letter q
      st_union(
        st_difference(
          square4, 
          st_translate(st_scale(square, 2, 2), 1, 1)
        ),
        st_translate(st_scale(square, 1, 2.5), 1.5, -1)
      ) as q
    FROM sprites
  )
SELECT

  -- Combine all the 4 letters
  st_union(v)
FROM
  letters,
  
  -- Arrange the 4 letters next to each other
  LATERAL (VALUES
    (st_translate(j, 0, 5)),
    (st_translate(o1, 5, 5)),
    (o2),
    (st_translate(q, 5, 0))
  ) t (v);

This produces:

Neat, huh?

Next step: The Mandelbrot Set

A natural next step for any procrastinator is to generate the Mandelbrot set. To prepare ourselves for what’s behind the Mandelbrot set, have a look at this neat video (more procrastination):

There are different ways to calculate it with SQL; here’s one that I came up with:

WITH RECURSIVE

  -- These are the dimensions that you can play around with
  dims (r1, r2, i1, i2, s, it, p) AS (
    VALUES (
      
      -- The dimensions of the real axis
      -2::float, 1::float, 
      
      -- The dimensions of the imaginary axis
      -1.5::float, 1.5::float, 
      
      -- The step size on each axis, per pixel or sprite
      0.01::float, 
      
      -- The maximum number of iterations
      100, 
      
      -- "Infinity", i.e. when to stop
      256.0::float
    )
  ),
  
  -- The square again, as before
  sprites (s) AS (VALUES 
    (st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))'))
  ),
  
  -- The number plane, as ints
  n1 (r, i) AS (
    SELECT r, i 
    FROM 
      dims, 
      generate_series((r1 / s)::int, (r2 / s)::int) r,
      generate_series((i1 / s)::int, (i2 / s)::int) i
  ),
  
  -- The number plane as scaled floats
  n2 (r, i) AS (
    SELECT r::float * s::float, i::float * s::float
    FROM dims, n1
  ),
  
  -- The recursive calculation of the Mandelbrot formula
  -- zn = (zn-1)^2 + c
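  -- Expanding z^2 for z = zr + zi*i yields the real part zr*zr - zi*zi
  -- and the imaginary part 2*zr*zi, which is what the recursive
  -- step below computes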
  l (cr, ci, zr, zi, g, it, p) AS (
    SELECT r, i, 0::float, 0::float, 0, it, p FROM n2, dims
    UNION ALL
    SELECT cr, ci, zr*zr - zi*zi + cr, 2*zr*zi + ci, g + 1, it, p 
    FROM l
    
    -- The recursion stops when we reach the maximum
    WHERE g < it
    
    -- Or, when we reach "infinity"
    AND zr*zr + zi*zi < p
  ),
  
  -- Find the last calculated value per point in the
  -- complex number plane c (cr, ci), discard the others
  x (cr, ci, zr, zi, g) AS (
    SELECT DISTINCT ON (cr, ci) cr, ci, zr, zi, g
    FROM l
    ORDER BY cr, ci, g DESC
  )
  
-- Turn every calculated point into a square
SELECT 
  st_union(
    st_translate(sprites.s, round(cr / dims.s), round(ci / dims.s))
  )
FROM x, sprites, dims

-- Retain only the points *not* belonging to the Mandelbrot set
-- You can also inverse the equation to retain the points that
-- belong to the set
WHERE zr*zr + zi*zi > p;

Note, I’m using an artificial “infinity” here, because:

  1. That speeds things up with little noticeable difference at this zoom scale
  2. I couldn’t figure out how to make PostgreSQL overflow float operations to float infinities, like Java or CockroachDB or other IEEE 754 float implementations do. Any help appreciated, see this Stack Overflow question.

The output is the well known shape

You can play around with this and use different values for “dims” to zoom in, e.g.

dims (r1, r2, i1, i2, s, it, p) as (values (
  (-0.925-0.032)::float, (-0.925+0.032)::float, 
  (0.266-0.032)::float, (0.266+0.032)::float, 
  0.00005::float, 100, 256.0::float)
)

This will generate some really neat zooms, all with SQL:

Granted, it’s not the most efficient way to calculate these things, but that wasn’t the point here, was it?

Game of Life

I can be nerd-sniped:

And of course, I had to accept that challenge too! Now, the Game of Life is a simple “game” where we have the x,y natural number plane (e.g. “pixels”), and with each coordinate, we associate whether that “cell” is dead or alive. Then, we establish the following set of rules:

  1. Any live cell with two or three live neighbours survives.
  2. Any dead cell with three live neighbours becomes a live cell.
  3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.

Now, it’s hard to “animate” things in SQL using spatial extensions, so as a workaround, I’ll just display iterations of 100×100 pixel tiles of the Game of Life next to each other. The first iteration is just the starting point, e.g. a random number of “alive” cells:

WITH RECURSIVE

  -- Again, generate a "sprite" from a square polygon
  sprites (s) AS (VALUES (
    st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
  )),

  -- Generate the x, y number plane and associate a boolean value
  -- with each coordinate, generated randomly.
  -- 10% of all cells are alive
  m (x, y, b) AS (
    SELECT x, y, 
      CASE WHEN random() > 0.9 THEN 1 ELSE 0 END 
    FROM 
      generate_series(1, 100) x, 
      generate_series(1, 100) y
  )
SELECT st_union(st_translate(sprites.s, x::float, y::float))
FROM m, sprites

-- Display only "alive" cells
WHERE b = 1

This will produce something like:

Now, all we have to do is iterate the game formula. Recursive SQL has quite a few limitations, though. E.g. I couldn’t get PostgreSQL to aggregate recursive data, nor self-join the recursive table to find the nearest neighbours of any cell. But with window functions, this is definitely possible. So here goes:

WITH RECURSIVE
  sprites (s) AS (VALUES (
    st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
  )),
  m (x, y, b) AS (
    SELECT x, y, 
      CASE WHEN random() > 0.9 THEN 1 ELSE 0 END
    FROM 
      generate_series(1, 100) x, 
      generate_series(1, 100) y
  ),
  
  -- This is the recursion of the Game of Life
  e (x, y, b, g) AS (
  
    -- The recursion starts with the above initial data
    SELECT x, y, b, 1 FROM m
    UNION ALL
    SELECT
      e.x, e.y,
      CASE
      
        -- If this cell is alive, and 2 or 3
        -- neighbors are alive, too, then this
        -- cell stays alive
        WHEN e.b = 1 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 IN (2, 3)
          THEN 1
          
        -- If this cell is dead, and 3 neighbors
        -- are alive, then this cell becomes alive
        WHEN e.b = 0 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 = 3
          THEN 1
          
        -- Otherwise, this cell dies or stays dead
        ELSE 0
      END,
      e.g + 1
    FROM e
    
    -- The recursion stops after 100 iterations
    WHERE e.g < 100
    WINDOW
    
      -- We order the data set by x, y. In SQL, there isn't 
      -- a notion of a 2-dimensional number plane. We only
      -- have 1 dimension.
      o AS (ORDER BY x, y),
      
      -- w1 is all the neighbors of the previous row in the x, y
      -- plane, which is all the SQL rows that are 101-99 rows
      -- before the current row.
      w1 AS (o ROWS BETWEEN 101 PRECEDING AND  99 PRECEDING),
      
      -- w2 is all the neighbors of the current row in the x, y
      -- number plane, excluding the current cell
      w2 AS (o ROWS BETWEEN   1 PRECEDING AND   1 FOLLOWING 
               EXCLUDE CURRENT ROW),
               
      -- w3 is all the neighbors of the next row
      w3 AS (o ROWS BETWEEN  99 FOLLOWING AND 101 FOLLOWING)
  )

-- Finally, combine all the iterations  
SELECT st_union(st_translate(
  sprites.s, 
  (x + (g - 1) % 10 * 120)::float, 
  (y - (g - 1) / 10 * 120)::float
))
FROM e, sprites
WHERE b = 1
;

Think of the window specifications as follows:

----+-------------+-------------+-------------+-------------+----
... | 105: (1, 5) | 106: (1, 6) | 107: (1, 7) | 108: (1, 8) | ...
... | 205: (2, 5) | 206: (2, 6) | 207: (2, 7) | 208: (2, 8) | ...
... | 305: (3, 5) | 306: (3, 6) | 307: (3, 7) | 308: (3, 8) | ...
... | 405: (4, 5) | 406: (4, 6) | 407: (4, 7) | 408: (4, 8) | ...

So, in a 100×100 grid, x=3,y=7 can be encoded as 307, and its neighbors are

  • w1: 206, 207, 208, i.e. between 101 preceding and 99 preceding of 307
  • w2: 306, 308, i.e. between 1 preceding and 1 following of 307
  • w3: 406, 407, 408, i.e. between 99 following and 101 following of 307

The output looks like this:

Or, zooming in on iterations 1-4:

Or 21-24:

That’s really cool, huh!

Glider Gun

The nerd-snipe was to animate the glider gun pattern, which just means we’ll have to replace the random generation of the first iteration by something constant.

WITH RECURSIVE
  sprites (s) AS (VALUES (
    st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
  )),
  m (x, y, b) AS (
    SELECT x, y, 

      -- Initial data here
      CASE WHEN (x, y) IN (
        (2, 6), (2, 7), (3, 6), (3, 7), (12, 6), (12, 7), (12, 8), 
        (13, 5), (13, 9), (14, 4), (14, 10), (15, 4), (15, 10),
        (16, 7), (17, 5), (17, 9), (18, 6), (18, 7), (18, 8), 
        (19, 7), (22, 4), (22, 5), (22, 6), (23, 4), (23, 5), 
        (23, 6), (24, 3), (24, 7), (26, 2), (26, 3), (26, 7), 
        (26, 8), (36, 4), (36, 5), (37, 4), (37, 5)
      ) THEN 1 ELSE 0 END 
    FROM 
      generate_series(1, 100) x, 
      generate_series(1, 100) y
  ),
  e (x, y, b, g) AS (
    SELECT x, y, b, 1 FROM m
    UNION ALL
    SELECT
      e.x, e.y,
      CASE
        WHEN e.b = 1 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 IN (2, 3)
          THEN 1
        WHEN e.b = 0 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 = 3
          THEN 1
        ELSE 0
      END,
      e.g + 1
    FROM e
    WHERE e.g < 100
    WINDOW
      o AS (ORDER BY x, y),
      w1 AS (o ROWS BETWEEN 101 PRECEDING AND  99 PRECEDING),
      w2 AS (o ROWS BETWEEN   1 PRECEDING AND   1 FOLLOWING 
               EXCLUDE CURRENT ROW),
      w3 AS (o ROWS BETWEEN  99 FOLLOWING AND 101 FOLLOWING)
  )
SELECT st_union(st_translate(
  sprites.s, 
  (x + (g - 1) % 10 * 120)::float, 
  (y - (g - 1) / 10 * 120)::float
))
FROM e, sprites
WHERE b = 1
;

And, as you can see, the gun works:

It started with this:

And eventually produces the well known gliders:

PostgreSQL 14’s enable_memoize For Improved Performance of Nested Loop Joins

I’ve recently discovered a pleasant new addition to PostgreSQL 14, the new enable_memoize flag that improves the performance of some nested loop joins where statistics hint at this being appropriate. I mean, who can resist this temptation:

Improving query speed by 1000x hints at something very suboptimal having been going on before, and a tool like memoization can be of great help. But will it also help with an “ordinary” join? I wanted to try myself.

What’s memoization?

In a perfect world free of side effects (and SQL is such a perfect world, in theory), memoization means that we can substitute y for f(x) in any computation, given that y = f(x). For example, no matter how many times you calculate UPPER('x'), you’ll always get 'X'. If the calculation of such a function is costly, and there are only a few possible input values, then why not just maintain a hash map that maps all previous input values to their results, and use that to look up known (or at least frequent) values instead of computing them again?
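In Java terms, memoization is just a hash map based cache around a pure function. A minimal sketch (the class and method names are made up for illustration):

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class Memo {

    // Wrap a pure function in a cache of previously computed results
    static <T, R> Function<T, R> memoize(Function<T, R> f) {
        Map<T, R> cache = new HashMap<>();
        return x -> cache.computeIfAbsent(x, f);
    }

    public static void main(String[] args) {
        Function<String, String> upper = memoize(s -> {
            System.out.println("Computing upper(" + s + ")");
            return s.toUpperCase();
        });

        System.out.println(upper.apply("x")); // Computes "X"
        System.out.println(upper.apply("x")); // Cached lookup, no computation
    }
}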

As I’ve shown previously on this blog, Oracle 11g introduced a feature called scalar subquery caching, which you can activate in jOOQ to avoid costly PL/SQL context switches.

In the case of PostgreSQL’s enable_memoize, this can be particularly useful for nested loop joins in SQL, and to reference the above tweet, a lateral join is often executed via a nested loop join.

Turning the feature on and off

I’ve created a schema like this:

CREATE TABLE t AS
SELECT i, i % 5 AS j 
FROM generate_series(1, 100000) AS t(i);

CREATE TABLE u AS
SELECT i, i % 20000 as j 
FROM generate_series(1, 100000) AS t(i);

CREATE INDEX uj ON u(j);

In summary:

  • Both tables t and u have 100000 rows.
  • t.j has only 5 distinct values, each value appears 20000 times.
  • u.j has 20000 distinct values, each value appears 5 times.

When running this on PostgreSQL 14:

SELECT current_setting('enable_memoize');

I get:

|current_setting|
|---------------|
|on             |

So, the feature is active, which I can also see in an EXPLAIN of the following query:

EXPLAIN
SELECT *
FROM t JOIN u ON t.j = u.j;

The plan is:

|QUERY PLAN                                                            |
|----------------------------------------------------------------------|
|Nested Loop  (cost=0.30..8945.41 rows=496032 width=16)                |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)         |
|  ->  Memoize  (cost=0.30..0.41 rows=5 width=8)                       | 
|        Cache Key: t.j                                                |
|        ->  Index Scan using uj on u  (cost=0.29..0.40 rows=5 width=8)|
|              Index Cond: (j = t.j)                                   |

Without memoization, when joining the two tables like that, then for each of the 100000 rows in t, I have to look up the 5 matching rows in u. But if memoization kicks in, then I will have to perform the lookup only 5 times, because there are only 5 distinct values of t.j.

We can play around with execution plans by turning the feature on or off:

SET enable_memoize = ON;
SET enable_memoize = OFF;

When turned off, PostgreSQL seems to choose a hash join or merge join instead, on my machine (between multiple executions, the plan might switch):

|QUERY PLAN                                                         |
|-------------------------------------------------------------------|
|Hash Join  (cost=3084.00..11568.51 rows=499351 width=16)           |
|  Hash Cond: (t.j = u.j)                                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)      |
|  ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)            |
|        ->  Seq Scan on u  (cost=0.00..1443.00 rows=100000 width=8)|


|QUERY PLAN                                                              |
|------------------------------------------------------------------------|
|Merge Join  (cost=9748.11..763846.11 rows=50000000 width=16)            |
|  Merge Cond: (u.j = t.j)                                               |
|  ->  Index Scan using uj on u  (cost=0.29..3848.29 rows=100000 width=8)|
|  ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)                 |
|        Sort Key: t.j                                                   |
|        ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)     |

Let’s Benchmark

We’re using the usual benchmark technique described here:

  • We repeat an operation 25x in mode A and mode B and compare (or more than 25, if it’s a fast operation)
  • We repeat the above 5x to mitigate any warmup and other caching effects

You can run the following benchmark on the above schema yourself, to verify:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
  
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

On my machine, the results are consistent, decent, not too impressive, but still significant:

Run 1, Statement 1: 00:00:03.763426
Run 1, Statement 2: 00:00:03.401346

Run 2, Statement 1: 00:00:03.769419
Run 2, Statement 2: 00:00:03.375677

Run 3, Statement 1: 00:00:03.771465
Run 3, Statement 2: 00:00:03.374413

Run 4, Statement 1: 00:00:03.769136
Run 4, Statement 2: 00:00:03.398734

Run 5, Statement 1: 00:00:03.772544
Run 5, Statement 2: 00:00:03.375272

I.e. a 10% speedup. Across the whole system, that alone would already be worth it.

Optimising LATERAL

Let’s try optimising LATERAL instead. We could run a query like this:

SELECT *
FROM 
  t, 
  LATERAL (
    SELECT count(*) 
    FROM u 
    WHERE t.j = u.j
  ) AS u(j)

The EXPLAIN of the above is:

|QUERY PLAN                                                                       |
|---------------------------------------------------------------------------------|
|Nested Loop  (cost=4.40..3969.47 rows=100000 width=16)                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)                    |
|  ->  Memoize  (cost=4.40..4.42 rows=1 width=8)                                  |
|        Cache Key: t.j                                                           |
|        ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|              ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                    Index Cond: (j = t.j)                                        |

So, we can again cache the computation of the COUNT(*) value for each of the 5 distinct t.j input values, rather than re-calculating this every time. Surely, this must be even better than before?

Benchmark time!

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
  
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM 
          t, 
          LATERAL (
            SELECT count(*) 
            FROM u 
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM 
          t, 
          LATERAL (
            SELECT count(*) 
            FROM u 
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

And this time, we can see a significant speedup!

Run 1, Statement 1: 00:00:03.419728
Run 1, Statement 2: 00:00:01.083941

Run 2, Statement 1: 00:00:03.404954
Run 2, Statement 2: 00:00:01.098404

Run 3, Statement 1: 00:00:03.425725
Run 3, Statement 2: 00:00:01.093883

Run 4, Statement 1: 00:00:03.441691
Run 4, Statement 2: 00:00:01.127837

Run 5, Statement 1: 00:00:03.420172
Run 5, Statement 2: 00:00:01.097943

That’s great news! Wait, does this also work for ordinary correlated subqueries? Because the above LATERAL correlated subquery could be rewritten as:

SELECT 
  t.*, 
  (
    SELECT count(*)
    FROM u
    WHERE t.j = u.j
  ) j
FROM t;

Regrettably, the plan doesn’t show memoization:

|QUERY PLAN                                                                   |
|-----------------------------------------------------------------------------|
|Seq Scan on t  (cost=0.00..441693.00 rows=100000 width=16)                   |
|  SubPlan 1                                                                  |
|    ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|          ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                Index Cond: (j = t.j)                                        |

And the benchmark (you can paste the query into the benchmark logic yourself to reproduce) confirms there’s no memoization effect:

Run 1, Statement 1: 00:00:03.617562
Run 1, Statement 2: 00:00:03.605765

Run 2, Statement 1: 00:00:03.610084
Run 2, Statement 2: 00:00:03.682064

Run 3, Statement 1: 00:00:03.725952
Run 3, Statement 2: 00:00:03.705622

Run 4, Statement 1: 00:00:03.672669
Run 4, Statement 2: 00:00:03.644612

Run 5, Statement 1: 00:00:03.645741
Run 5, Statement 2: 00:00:03.642717

It seems that with this new feature, correlated subqueries could be rewritten to nested loop outer joins in the future. Other optimisers already do this, and then we would effectively have the same feature here as Oracle’s scalar subquery caching.
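Such a rewrite can already be done by hand today. The above scalar subquery is equivalent to this LATERAL outer join (a sketch; the ON true clause is required syntax for LEFT JOIN LATERAL):

SELECT t.*, u.j
FROM t
LEFT JOIN LATERAL (
  SELECT count(*)
  FROM u
  WHERE t.j = u.j
) AS u(j) ON true;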

Conclusion

The feature is turned on in PostgreSQL 14. Apart from some additional memory consumption, which might be a small problem if the optimiser is wrong and statistics are off, I don’t see any drawback of this new feature. SQL is a side-effect free (in theory) 4GL, meaning the optimiser can replace any computation by a cached value that depends only on the computation’s input values.

A correlated subquery is a function whose input parameters are the predicates and other references to the outer query’s columns. As such, the result of a correlated subquery can be cached, or memoized. As shown above, this has drastic effects on your present day SQL queries, from which you can profit just by upgrading to PostgreSQL 14.

Java’s Checked Exceptions Are Just Weird Union Types

This fun fact has been on my mind for a while, and a recent reddit thread about “Smuggling Checked Exceptions with Sealed Interfaces” made me write this post here. Namely, Java had union types before it was cool! (If you squint hard).

What are union types?

Ceylon is an underrated JVM language that never really took off, which is too bad, because the concepts it introduced are very elegant (see e.g. how they implemented nullable types as syntax sugar on top of union types, which IMO is much better than anything monadic using Option types or Kotlin’s ad-hoc type system extension).

So, one of those concepts is union types. One of the most popular languages that support them, currently, is TypeScript, though C++, PHP, and Python also have something similar. (Whether the union type is tagged or not isn’t relevant to this post.)

If you understand Java’s intersection types A & B (meaning that something is both a subtype of A and of B), then it’s easy to understand union types A | B (meaning that something is a subtype of either A or B). TypeScript shows a simple example of this:

function printId(id: number | string) {
  console.log("Your ID is: " + id);
}
// OK
printId(101);
// OK
printId("202");
// Error
printId({ myID: 22342 });

Structural vs nominal typing

Such a union type (or intersection type) is a structural type, as opposed to what we’ve been doing in Java via nominal types, where you have to declare a named type for this union every time you want to use it. E.g. in jOOQ, we have things like:

interface FieldOrRow {}
interface Field<T> extends FieldOrRow {}
interface Row extends FieldOrRow {}

Very soon, jOOQ’s Java 17 distribution will seal the above type hierarchy as follows (incomplete, subtypes of Field<T> and Row are omitted for brevity):

sealed interface FieldOrRow permits Field, Row {}
sealed interface Field<T> extends FieldOrRow permits ... {}
sealed interface Row extends FieldOrRow permits ... {}

There are pros and cons for doing these things structurally or nominally:

Structural typing:

  • Pro: You can create any ad-hoc union of any set of types anywhere
  • Pro: You don’t have to change the existing type hierarchy, which is essential when you don’t have access to it, e.g. when you want to do something like the above number | string type. This kinda works like JSR 308 type annotations that were introduced in Java 8.

Nominal typing:

  • Pro: You can attach documentation to the type, and reuse it formally (rather than structurally). TypeScript and many other languages offer type aliases for this kind of stuff, so you can have a bit of both worlds, though the alias is erased, meaning you keep the complex structural type, which leads to curious error messages.
  • Pro: You can seal the type hierarchy to allow for exhaustiveness checking among subtypes (e.g. above, there can only be Field<T> or Row subtypes of FieldOrRow). A structurally typed union type is implicitly “sealed” ad-hoc by the union type description (not sure if that’s what it’s called), but with nominal types, you can make sure no one else can extend the type hierarchy (except where you permit it explicitly using the non-sealed keyword).

Ultimately, as ever so often, things like structural and nominal typing are two sides of the same coin, pros and cons mostly depending on taste and on how much you control a code base.

So, how are checked exceptions union types?

When you declare a method that throws checked exceptions, the return type of the method is really such a union type. Look at this example in Java:

public String getTitle(int id) throws SQLException;

The call-site now has to “check” the result of this method call using try-catch, or declare re-throwing the checked exception(s):

try {
    String title = getTitle(1);
    doSomethingWith(title);
}
catch (SQLException e) {
    handle(e);
}

If early Java had union types rather than checked exceptions, we might have declared this as follows, instead:

public String|SQLException getTitle(int id);

Likewise, a caller of this method will have to “check” the result of this method call. There’s no simple way of re-throwing it, so if we do want to re-throw, we’d need some syntax sugar, or repeat the same code all the time, Go-style:

// Hypothetical Java syntax:
String|SQLException result = getTitle(1);

switch (result) {
    case String title -> doSomethingWith(title);
    case SQLException e -> handle(e);
}

It would be obvious how such a JEP 406 style switch pattern matching statement or expression could implement an exhaustiveness check, just like with the existing JEP 409 sealed classes approach. The only difference, again, is that everything is now structurally typed, rather than nominally typed.

In fact, if you declare multiple checked exceptions, such as the JDK’s reflection API:

public Object invoke(Object obj, Object... args)
throws 
    IllegalAccessException, 
    IllegalArgumentException,
    InvocationTargetException

With union types, this would just be this, instead:

// Hypothetical Java syntax:
public Object
    | IllegalAccessException
    | IllegalArgumentException
    | InvocationTargetException invoke(Object obj, Object... args)

And the union type syntax from the catch block, which checks for exhaustiveness (yes, we have union types in catch!)…

try {
    Object returnValue = method.invoke(obj);
    doSomethingWith(returnValue);
}
catch (IllegalAccessException | IllegalArgumentException e) {
    handle1(e);
}
catch (InvocationTargetException e) {
    handle2(e);
}

Could still check for exhaustiveness with the switch pattern matching approach:

// Hypothetical Java syntax:
Object
    | IllegalAccessException
    | IllegalArgumentException
    | InvocationTargetException result = method.invoke(obj);

switch (result) {
    case IllegalAccessException, 
         IllegalArgumentException e -> handle1(e);
    case InvocationTargetException e -> handle2(e);
    case Object returnValue -> doSomethingWith(returnValue);
}

A subtle caveat here is that exceptions are subtypes of Object, so we must put that case at the end, as it “dominates” the others (see JEP 406 for a discussion about dominance). Again, we can prove exhaustiveness, because all types that are involved in the union type have a switch case.

Can we emulate union types with checked exceptions?

You know what Jeff Goldblum would say

But this blog is known to do it anyway. Assuming that for every possible type, we had a synthetic (code generated?) checked exception that wraps it (because in Java, exceptions are not allowed to be generic):

// Use some "protective" base class, so no one can introduce 
// RuntimeExceptions to the type hierarchy
class E extends Exception {

    // Just in case you're doing this in performance sensitive code...
    @Override
    public Throwable fillInStackTrace() {
        return this;
    }
}

// Now create a wrapper exception for every type you want to represent
class EString extends E {
    String s;
    EString(String s) {
        this.s = s;
    }
}
class Eint extends E {
    int i;
    Eint(int i) {
        this.i = i;
    }
}

The benefit of this is we don’t have to wait for Valhalla to support primitive types in generics, nor to reify them. We’ve already emulated that as you can see above.

Next, we need a switch emulation for arbitrary degrees (22 will probably be enough?). Here’s one for degree 2:

// Create an arbitrary number of switch utilities for each arity up 
// to, say 22 as is best practice
class Switch2<E1 extends E, E2 extends E> {
    E1 e1;
    E2 e2;

    private Switch2(E1 e1, E2 e2) {
        this.e1 = e1;
        this.e2 = e2;
    }

    static <E1 extends E, E2 extends E> Switch2<E1, E2> of1(E1 e1) {
        return new Switch2<>(e1, null);
    }

    static <E1 extends E, E2 extends E> Switch2<E1, E2> of2(E2 e2) {
        return new Switch2<>(null, e2);
    }

    void check() throws E1, E2 {
        if (e1 != null)
            throw e1;
        else
            throw e2;
    }
}

And finally, here’s how we can emulate our exhaustiveness checking switch with catch blocks!

// "Union type" emulating String|int
Switch2<EString, Eint> s = Switch2.of1(new EString("hello"));

// Doesn't compile, Eint isn't caught (catches aren't exhaustive)
try {
    s.check();
}
catch (EString e) {}

// Compiles fine
try {
    s.check();
}
catch (EString e) {}
catch (Eint e) {}

// Also compiles fine
try {
    s.check();
}
catch (EString | Eint e) {}

// Doesn't compile because Eint "partially dominates" EString | Eint
try {
    s.check();
}
catch (Eint e) {}
catch (EString | Eint e) {}

“Neat”, huh? We could even imagine destructuring within the catch block, such that we can automatically unwrap the value from the auxiliary “E” type.

Since we already have “union types” in Java (in catch blocks), and since checked exception declarations could be retrofitted to form a union type with the method’s actual return type, my hopes are still that in some distant future, a more powerful Java will be available where these “union types” (and also intersection types) will be made first class. APIs like jOOQ would greatly profit from this!

Functional Dependencies in SQL GROUP BY

The SQL standard knows an interesting feature where you can project any column that is functionally dependent on a primary (or unique) key listed in the GROUP BY clause, without having to add that column to the GROUP BY clause explicitly.

What does this mean? Consider this simple schema:

CREATE TABLE author (
  id INT NOT NULL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE book (
  id INT NOT NULL PRIMARY KEY,
  author_id INT NOT NULL REFERENCES author,
  title TEXT NOT NULL
);

In order to count the number of books by author, we tend to write:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY 
  a.id,  -- Required, because names aren't unique
  a.name -- Required in some dialects, but not in others

We have to group by something unique in this case, because if two authors are called John Doe, we still want them to produce separate groups. So GROUP BY a.id is a given.
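For example, with data like this, grouping by a.name alone would incorrectly merge the two distinct authors into a single group:

INSERT INTO author (id, name)
VALUES (1, 'John Doe'),
       (2, 'John Doe');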

We’re used to also GROUP BY a.name, especially in those dialects that require it, since we list a.name in the SELECT clause:

  • Db2
  • Derby
  • Exasol
  • Firebird
  • HANA
  • Informix
  • Oracle
  • SQL Server

But is it really required? It isn’t, as per the SQL standard, because there is a functional dependency between author.id and author.name. In other words, for each value of author.id, there is exactly one possible value of author.name, or: author.name is a function of author.id.

This means that it does not matter if we GROUP BY both columns, or only the primary key. The result must be the same in both cases, hence this is possible:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a.id

Which SQL dialects support this?

At least the following SQL dialects support this language feature:

  • CockroachDB
  • H2
  • HSQLDB
  • MariaDB
  • MySQL
  • PostgreSQL
  • SQLite
  • Yugabyte

It’s noteworthy that MySQL used to simply ignore whether a column could be projected unambiguously or not, in the presence of GROUP BY. While the following query was rejected in most dialects, it was not, in MySQL, prior to the introduction of the ONLY_FULL_GROUP_BY mode:

SELECT author_id, title, count(*)
FROM book
GROUP BY author_id

What should we display for book.title, if an author has written more than one book? It doesn’t make sense, yet MySQL still used to allow it, and would just project an arbitrary value from the group.

Today, MySQL only allows for projecting columns with a functional dependency on the GROUP BY clause, as is permitted by the SQL standard.

Pros & Cons

While the shorter syntax that avoids the extra columns might be easier to maintain (easy to project additional columns, if required), there is some risk of queries breaking in production, namely when underlying constraints are disabled, e.g. for a migration. While it is unlikely that a primary key is disabled in a live system, it could still be the case, and without the key, a previously valid query will no longer be valid for the same reason why MySQL’s old interpretation was invalid: There’s no longer a guarantee of functional dependency.

Other syntax

Starting from jOOQ 3.16, and #11834, it will be possible to reference tables directly in the GROUP BY clause, instead of individual columns. For example:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a

The semantics will be:

  • If the table has a primary key (composite or not), use that in the GROUP BY clause, instead
  • If the table doesn’t have a primary key, list all the columns from the table instead.
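So, for the author table from before, which has a primary key on id, the above GROUP BY a query would behave as if it had been written like this (a sketch of the intended semantics, not necessarily the exact SQL that jOOQ will render):

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a.id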

Since none of the RDBMS supported by jOOQ currently supports this syntax, it is a purely synthetic jOOQ feature.

Write C-Style Local Static Variables in Java 16

Java 16 includes an improvement that makes the language a bit more regular via JEP 395. The JEP says:

Static members of inner classes

It is currently specified to be a compile-time error if an inner class declares a member that is explicitly or implicitly static, unless the member is a constant variable. This means that, for example, an inner class cannot declare a record class member, since nested record classes are implicitly static.

We relax this restriction in order to allow an inner class to declare members that are either explicitly or implicitly static. In particular, this allows an inner class to declare a static member that is a record class.

What sounds like a minor necessary evil to make a new feature (record classes) more versatile actually has a life of its own. We can use it to emulate C-style local static variables, i.e. local variables that are:

  • Initialised only once (and lazily at that)
  • Shared among multiple executions of a method

This sounds like a rather hairy feature: global variables that are visible only locally. But in fact, it’s something I’ve wanted for a long time, especially when I wanted to cache regular expression patterns without polluting the class namespace.

Consider this code:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {
        // Re-compiling the pattern every time: Bad
        // Keeping the pattern local to the method: Good
        System.out.println("check(" + string + "): " 
            + compile("a").matcher(string).find());
    }
}

It prints:

compile(a)
check(a): true
compile(a)
check(b): false

Compiling a pattern can be costly if done frequently, so we’d better cache it. We used to do that like this:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    // Compiling the pattern only once: Good
    // Placing the pattern in a class namespace: Bad
    static final Pattern P_CHECK = compile("a");

    static void check(String string) {
        System.out.println("check(" + string + "): " 
            + P_CHECK.matcher(string).find());
    }
}

This now prints a more optimal output:

compile(a)
check(a): true
check(b): false

I.e. the regular expression pattern is compiled only once. But unfortunately, we had to pollute the entire class’s namespace, which can quickly become cumbersome if we have dozens of such regular expressions. Can we scope the P_CHECK variable to the check() method only? We now can!

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {

        // Compiling the pattern only once: Good
        // Keeping the pattern local to the method: Good
        // Capturing scope: Egh...
        var patterns = new Object() { 
            static final Pattern P_CHECK = compile("a");
        };
        
        System.out.println("check(" + string + "): " 
            + patterns.P_CHECK.matcher(string).find());
    }
}

This again prints the desired, optimal output:

compile(a)
check(a): true
check(b): false

The combination of using var to use a non-denotable type (whose members we can dereference) along with the ability of putting static members in inner classes effectively emulates local static variables, just like in C!

Since the inner class is unlikely to escape its scope, the fact that it may be capturing scope isn’t that big of a risk, as illustrated previously in a criticism of the double curly brace anti pattern. You’re still creating an extra class and a useless object that escape analysis hopefully prevents from being allocated, so it’s not exactly a very clean solution, but it’s nice to know this is now possible.

The jOOQ Parser Ignore Comment Syntax

jOOQ’s parser can’t parse every possible SQL syntax. Try this random PostgreSQL syntax:

ALTER SYSTEM RESET ALL

And the jOOQ parser will complain:

DOMAIN, INDEX, SCHEMA, SEQUENCE, SESSION, TABLE, TYPE, or VIEW expected: [1:7] ALTER [*]SYSTEM RESET ALL

That’s perfectly fine. The goal of the jOOQ parser isn’t to understand all vendor specific syntax. The goal is to offer parser functionality for all of the syntax supported by jOOQ so far, and to be able to translate it between dialects. For a lot of syntax like the above, there isn’t an equivalent in other dialects anyway.

But if you’re using the jOOQ parser to simulate a database migration, e.g. to create a schema diff, or to generate jOOQ code using the DDLDatabase, then you may have some bits of vendor specific SQL in your scripts that you want jOOQ to ignore. For example, the above command may be in the middle of a migration script:

CREATE TABLE a (i int);
ALTER SYSTEM RESET ALL;
CREATE TABLE b (i int);

Luckily, with jOOQ, you can add special markers around the commands that you wish jOOQ to ignore. For this, just enable the Settings.parseIgnoreComments flag, and now you can use a special comment syntax:

CREATE TABLE a (i int);

/* [jooq ignore start] */
ALTER SYSTEM RESET ALL;
/* [jooq ignore stop] */

CREATE TABLE b (i int);

The syntax is transparent to your RDBMS, because the markers are just comments. The RDBMS will see and execute this, just like before, so your actual database migration isn’t affected:

CREATE TABLE a (i int);

/*                     */
ALTER SYSTEM RESET ALL;
/*                    */

CREATE TABLE b (i int);

But jOOQ, on the other hand, will interpret the comments a bit differently, and see and execute this, instead:

CREATE TABLE a (i int);

/*

                      */

CREATE TABLE b (i int);

Note, if you don’t like the [jooq ignore start] and [jooq ignore stop] tokens, you can customise them as well via Settings.parseIgnoreCommentStart and Settings.parseIgnoreCommentStop.
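In code, enabling the flag on a parsing context might look like this (a minimal sketch; the with-style setters follow jOOQ’s Settings API for the flags mentioned above, and the dialect is just an example):

import org.jooq.DSLContext;
import org.jooq.Queries;
import org.jooq.SQLDialect;
import org.jooq.conf.Settings;
import org.jooq.impl.DSL;

public class ParseIgnoreExample {
    public static void main(String[] args) {

        // Enable the special comment syntax in the parser
        Settings settings = new Settings()
            .withParseIgnoreComments(true);

        DSLContext ctx = DSL.using(SQLDialect.POSTGRES, settings);

        // The ALTER SYSTEM statement between the markers is ignored
        Queries queries = ctx.parser().parse(
            """
            CREATE TABLE a (i int);
            /* [jooq ignore start] */
            ALTER SYSTEM RESET ALL;
            /* [jooq ignore stop] */
            CREATE TABLE b (i int);
            """);

        System.out.println(queries);
    }
}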

Like any of these purely text-based preprocessor syntaxes, this works completely transparently, even within some command syntax. Assuming you’re using some vendor-specific syntax in a CREATE TABLE‘s DEFAULT expression, you can just exclude that DEFAULT expression from jOOQ’s parsing:

CREATE TABLE t (
  a int 
    /* [jooq ignore start] */ 
    DEFAULT some_fancy_expression() 
    /* [jooq ignore stop] */
);

Obviously, you don’t have to format things this way; that’s just for illustration purposes on this blog. Now, again, the RDBMS will see and execute this:

CREATE TABLE t (
  a int 
    /*                     */ 
    DEFAULT some_fancy_expression() 
    /*                    */
);

Whereas jOOQ’s parser will see and execute this:

CREATE TABLE t (
  a int 
    /*

                          */
);

For more information about this topic, please refer to the jOOQ manual.

Use jOO位’s Sneaky Throw to Avoid Checked Exceptions

Don’t you hate how you have to wrap checked exception throwing code in static initialisers? E.g. you cannot write this in Java:

public class Test {
    static final Class<?> klass = Class.forName("org.h2.Driver");
}

There’s an unhandled ClassNotFoundException, and you can’t catch / rethrow it simply. A static initialiser is needed:

public class Test {
    static final Class<?> klass;

    static {
        try {
            klass = Class.forName("org.h2.Driver");
        }
        catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}

Yuck.

Luckily, one of jOO位’s lesser known features is the Sneaky class, which contains a bunch of utility methods that wrap a JDK functional interface to an equivalent, “sneaky-throwing” functional interface that doesn’t declare the checked exception.
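The underlying trick is the well-known “sneaky throw” idiom. Here’s a minimal sketch of it (not jOO位’s actual implementation; the class and method names are made up):

public class SneakySketch {

    // The unchecked cast makes the compiler infer E as RuntimeException
    // at the call site, while the JVM throws the original checked
    // exception unchanged (checked-ness only exists at compile time)
    @SuppressWarnings("unchecked")
    static <E extends Throwable> E sneak(Throwable t) throws E {
        throw (E) t;
    }

    static Class<?> forName(String name) {

        // No throws clause needed on forName()
        try {
            return Class.forName(name);
        }
        catch (ClassNotFoundException e) {
            throw SneakySketch.<RuntimeException> sneak(e);
        }
    }
}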

In short, you can write:

public class Test {
    static final Class<?> klass = Sneaky.supplier(
        () -> Class.forName("org.h2.Driver")
    ).get();
}

The exception is simply re-thrown “sneakily”, as the JVM doesn’t care about an exception’s checked-ness. If you don’t have H2 on your classpath, you’ll get:

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.lang.ClassNotFoundException: org.h2.Driver
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:375)
	at org.jooq.test.util.Test.lambda$0(Test.java:44)
	at org.jooq.lambda.Unchecked.lambda$supplier$38(Unchecked.java:1695)
	at org.jooq.test.util.Test.<clinit>(Test.java:44)

You can use this approach wherever else a JDK functional interface is required, and you don’t care about an exception’s checked-ness, e.g. in streams:

// Doesn't compile:
Stream
    .generate(
        () -> Class.forName("org.h2.Driver"))
    .limit(1)
    .forEach(System.out::println);

// So ugly
Stream
    .generate(
        () -> {
            try {
                return Class.forName("org.h2.Driver");
            }
            catch (ClassNotFoundException e) {
                throw new RuntimeException(e);
            }
        })
    .limit(1)
    .forEach(System.out::println);

// Use jOO位's Sneaky supplier
Stream
    .generate(Sneaky.supplier(
        () -> Class.forName("org.h2.Driver")))
    .limit(1)
    .forEach(System.out::println);

Get jOO位 here: https://github.com/jOOQ/jOOL

Using Testcontainers to Generate jOOQ Code

Database first is at the core of jOOQ’s design. jOOQ has been made primarily for classic systems where the database is always there, always has been, and will never leave. This is because we think “data have mass”.

This not only translates to moving logic closer to the data (see our previous posts about the cost of JDBC round trips or generating vendor agnostic procedural logic), but also to avoiding moving data around between migrations (e.g. of RDBMS products).

Compared to how “heavy” data is, applications and UIs come and go. Speaking of Go, maybe you’ll replace all of your Java code tomorrow with some Go code. But you will keep the database if it isn’t trivial.

With this in mind, jOOQ assumes you have a pre-existing schema, which you manage with Flyway or Liquibase, and then you use jOOQ to reverse engineer your updated schema using the code generator.

The old days

In the old days, setting up an Oracle instance was very heavy, and also hard. I remember working at a company where we had shared development and test instances. The schema was always in flux. We couldn’t assume a stable dev version.

As such, pointing the jOOQ code generator towards a live database used to be a bit of a challenge, which is why jOOQ offers alternative, connection-free code generation modes, including:

  • The JPADatabase, if you have a pre-existing JPA entity based meta model.
  • The XMLDatabase, if you have some form of XML version of your schema, which you can XSL transform to jOOQ’s format
  • The DDLDatabase, which can interpret your DDL scripts, e.g. the ones you pass to Flyway, or the ones produced by pg_dump.
  • The LiquibaseDatabase, which simulates a Liquibase database migration and uses the simulated database output as a source for meta information of the code generator

But all of the above have the same limitation. You can’t really use many vendor-specific features, such as advanced stored procedures, data types, etc.

A modern approach using testcontainers

Ideally, unless you’re supporting several RDBMS products (most people don’t), you should work only with your production database product, say PostgreSQL.

Thanks to testcontainers.org, it’s very easy to programmatically, or configuratively, start up a PostgreSQL instance of any version in a Docker container. If you have an SQL script that contains your database schema, you can supply it to the testcontainers JDBC URL, e.g. like this:

jdbc:tc:postgresql:13:///sakila?TC_TMPFS=/testtmpfs:rw&TC_INITSCRIPT=file:${basedir}/src/main/resources/postgres-sakila-schema.sql

For more information, see their docs about JDBC support. Now, add the testcontainers dependency to your project classpath, e.g.

<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>postgresql</artifactId>
</dependency>

And use the ContainerDatabaseDriver instead of the actual PostgreSQL driver for your code generation configuration in jOOQ, e.g. when using Maven:

<plugin>
  <groupId>org.jooq</groupId>
  <artifactId>jooq-codegen-maven</artifactId>

  <executions>
    <execution>
      <id>java-generator</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
      </goals>

      <configuration>
        <jdbc>
          <driver>
            org.testcontainers.jdbc.ContainerDatabaseDriver
          </driver>
          <url>${db.url}</url>
          <user>${db.username}</user>
          <password>${db.password}</password>
        </jdbc>
        <generator>
          <database>
            <inputSchema>public</inputSchema>
          </database>
          <target>
            <packageName>org.jooq.example.tc.db</packageName>
            <directory>src/main/java</directory>
          </target>
        </generator>
      </configuration>
    </execution>
  </executions>

  <dependencies>

    <!-- JUnit seems to be a transitive dependency of testcontainers -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.1</version>
    </dependency>
  </dependencies>
</plugin>

As simple as that! Check out the jOOQ-testcontainers-example for a runnable example that uses testcontainers for code generation using the above approach.

Adding database change management

A real world example would be to use Flyway or Liquibase again, to apply a complete database migration to your PostgreSQL instance inside of testcontainers, prior to generating code and/or running your integration tests.

This just slightly complicates things, but doesn’t produce any impossible problems. Instead of creating throwaway containers with a single TC_INITSCRIPT, you will now have to make sure the following steps are executed consecutively in your build somehow:

  1. A testcontainers instance of your database is started
  2. A Flyway or Liquibase migration is run inside of that database
  3. The jOOQ code generator reverse engineers that database
  4. Optionally, your integration tests also reuse the database of that container

Of course you should integration test your code! But for the sake of this discussion, that might be an optional step, as you may have different preferences on how to run those tests, e.g. more globally than just for this module. But in our example, let’s include the tests.

You can find the full example using testcontainers/flyway/jOOQ here.

Start the testcontainers instance

Unfortunately, testcontainers doesn’t ship any Maven / Gradle plugins yet to invoke container lifecycle management during a build. I’ve created a feature request for that here, which you should upvote: https://github.com/testcontainers/testcontainers-java/issues/4397.

But we can easily help ourselves by using the ever powerful Maven escape hatch that is the groovy-maven-plugin (the ideas are the same for Gradle):

<plugin>
  <groupId>org.codehaus.gmaven</groupId>
  <artifactId>groovy-maven-plugin</artifactId>
  <version>2.1.1</version>
  <executions>
    <execution>
      <!-- Start the container in any phase before the actual code
           generation is required, i.e. at the latest in
           generate-sources -->
      <phase>generate-sources</phase>
      <goals>
        <goal>execute</goal>
      </goals>
      <configuration>
        <source>
          db = new org.testcontainers.containers.PostgreSQLContainer(
                  "postgres:latest")
            .withUsername("${db.username}")
            .withDatabaseName("postgres")
            .withPassword("${db.password}");
            
          db.start();

          // After you've started the container, collect its generated
          // JDBC URL (which contains a random port)
          project.properties.setProperty('db.url', db.getJdbcUrl());
        </source>
      </configuration>
    </execution>
  </executions>
  
  <dependencies>
    <dependency>
      <groupId>org.testcontainers</groupId>
      <artifactId>postgresql</artifactId>
      <version>1.15.2</version>
    </dependency>
  </dependencies>
</plugin>

So, that starts a container and keeps it running until the build terminates. I won’t show a graceful shutdown, because it’s not needed for the example, but you could implement that as well, of course.

Now, migrate your database

The above database is empty. Now, let’s run the migration. The example uses Flyway, but it will be the same thing with Liquibase.

<plugin>
  <groupId>org.flywaydb</groupId>
  <artifactId>flyway-maven-plugin</artifactId>
  <version>7.14.0</version>
  <executions>
    <execution>

      <!-- We run the migration in the same phase, before jOOQ's
           code generation -->
      <phase>generate-sources</phase>
      <goals>
        <goal>migrate</goal>
      </goals>
      <configuration>

        <!-- This URL has been set by groovy, above -->
        <url>${db.url}</url>
        <user>${db.username}</user>
        <password>${db.password}</password>
        <locations>
          <location>
            filesystem:src/main/resources/db/migration
          </location>
        </locations>
      </configuration>
    </execution>
  </executions>
</plugin>

Add all the additional complexity of your migration in this configuration if you like. jOOQ wouldn’t know anything about it.

Now, generate the code

Again, nothing special here:

<plugin>
  <groupId>org.jooq</groupId>
  <artifactId>jooq-codegen-maven</artifactId>

  <executions>
    <execution>
      <id>java-generator</id>

      <!-- Same phase as above, but the previous plugins have already
           executed, so we're generating the db post migration -->
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
      </goals>

      <configuration>
        <jdbc>

          <!-- Again, this URL has been set by groovy, above -->
          <url>${db.url}</url>
          <user>${db.username}</user>
          <password>${db.password}</password>
        </jdbc>
        <generator>
          <database>
            <inputSchema>public</inputSchema>
          </database>
          <target>
            <packageName>org.jooq.example.db</packageName>
          </target>
        </generator>
      </configuration>
    </execution>
  </executions>
</plugin>

And finally, optionally, integration test

If you want to re-use the above container with the migrated database in your integration tests as well, you can pass the generated JDBC URL along to the maven-surefire-plugin as follows:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <systemPropertyVariables>

      <!-- Again, this URL has been set by groovy, above -->
      <db.url>${db.url}</db.url>
      <db.username>${db.username}</db.username>
      <db.password>${db.password}</db.password>
    </systemPropertyVariables>
  </configuration>
</plugin>
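
Such an integration test can then read those system properties back to connect to the already migrated database. Here's a minimal sketch (the queried table and the POSTGRES dialect are illustrative assumptions):

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
import org.junit.jupiter.api.Test;

public class IntegrationTest {

    @Test
    public void queryMigratedSchema() throws Exception {

        // These system properties were passed through by surefire, above
        try (Connection connection = DriverManager.getConnection(
                System.getProperty("db.url"),
                System.getProperty("db.username"),
                System.getProperty("db.password"))) {

            // Run any query against the migrated schema, e.g. with jOOQ
            System.out.println(DSL
                .using(connection, SQLDialect.POSTGRES)
                .fetch("select * from some_table"));
        }
    }
}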

There are many ways to achieve the same thing; this is one that works decently out of the box. You can check out the full example from GitHub and play around with it:

https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples/jOOQ-testcontainers-flyway-example

More about testcontainers

To learn more about testcontainers, see our interview with Richard North here.

Using jOOQ to write vendor agnostic SQL with JPA’s native query or @Formula

If your legacy JPA application uses the occasional native query, Hibernate's @Formula, or Spring Data's @Query annotation with vendor specific native SQL embedded in it, you can use jOOQ's parsing connection and parsing data source to translate between dialects, without having to go all in on your jOOQ adoption (though I think that's inevitable once you see what jOOQ can do for you).

Now, let’s design a table like this:

CREATE TABLE author (
  id INT NOT NULL,
  first_name TEXT,
  last_name TEXT NOT NULL,

  CONSTRAINT pk_author PRIMARY KEY (id)
);

Now, you may want to write a native query on this table using JPA’s EntityManager.createNativeQuery(), mapping that to entities. You could be using jOOQ’s DSL API for this, but let’s say you’re not ready for a migration to jOOQ yet, or you want to use actual SQL as provided by your DBA, not jOOQ’s DSL.

So, in MariaDB, you might be writing something like this:

List<Author> result =
em.createNativeQuery("""
    select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
    from author as a
    order by a.id
    """, Author.class)
  .getResultList();

Where your entity is defined like this:

@Entity
@Table(name = "author")
public class Author {
    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    // Constructors, getters, setters, equals, hashCode, etc
}

The above runs just fine and produces all the authors in MariaDB, which implemented support for Oracle’s NVL() function. But what about Oracle itself? The query fails on Oracle with:

ORA-00933: SQL command not properly ended

This is because in Oracle, you can’t use the AS keyword to alias tables, only to alias columns. Sure, you can remove that, but what about NVL()? You want this to work on MySQL and SQL Server, as well, but they complain:

MySQL

SQL Error [1305] [42000]: FUNCTION test.nvl does not exist

SQL Server

SQL Error [195] [S0010]: ‘nvl’ is not a recognized built-in function name.

Now, you have these options:

  • Use jOOQ to generate the SQL string for you, using the DSL
  • Use JPQL instead of a native query (but then heavily rewrite it, because JPQL is much less powerful than SQL)
  • Try your luck writing actual vendor agnostic SQL, manually
  • Or…

jOOQ’s parsing connection

You could use jOOQ’s parsing connection, which acts as a proxy to your actual connection, intercepting every SQL statement on the JDBC level in order to translate it to the target dialect.

This is as simple as wrapping your existing JDBC Connection or DataSource as follows:

DataSource originalDataSource = ...
DataSource parsingDataSource = DSL
    .using(originalDataSource, dialect)
    .parsingDataSource();
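
The same works on the plain JDBC Connection level, in case you're not wrapping a DataSource (a sketch, mirroring the above):

Connection originalConnection = ...
Connection parsingConnection = DSL
    .using(originalConnection, dialect)
    .parsingConnection();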

That’s it! I mean, you could pass some additional configuration Settings after the dialect, but that’s about as easy as it gets. The new DataSource can now run your SQL query on all the aforementioned dialects. For example, you might see this in your DEBUG logs:

On MySQL:

-- org.hibernate.SQL
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name 
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating from:
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translation cache miss: 
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating to: 
   select a.id, ifnull(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id

On SQL Server:

-- org.hibernate.SQL
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name 
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating from:
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translation cache miss: 
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating to: 
   select a.id, coalesce(a.first_name, 'N/A') first_name, a.last_name 
   from author a 
   order by a.id

Hibernate was tricked by jOOQ! The NVL function was translated to MySQL’s IFNULL or SQL Server’s COALESCE, and the AS keyword was removed from the SQL Server query. These are just simple examples; your actual SQL may be much more complex. Play around with the feature set online, here.

Also, the Settings.cacheParsingConnectionLRUCacheSize flag, which defaults to 8192, makes sure the same query doesn’t get re-translated all the time, so you won’t spend too much time in jOOQ’s parser.
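
Since Settings can be passed along with the dialect, as mentioned above, tuning that cache might look like this (a sketch; the cache size is an arbitrary example):

// Assuming imports of org.jooq.conf.Settings and org.jooq.impl.DSL
DataSource parsingDataSource = DSL
    .using(originalDataSource, dialect, new Settings()

        // Cache more translated queries than the default of 8192
        .withCacheParsingConnectionLRUCacheSize(16384))
    .parsingDataSource();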

@Formula too, not just native queries

A quick win in Hibernate, when you want to project additional values (akin to SQL’s own computed columns, which are available in many SQL dialects), is the @Formula annotation, which can be added to any entity. Assuming this additional column:

ALTER TABLE author ADD year_of_birth INT;

We might have the following amended entity:

@Entity
@Table(name = "author")
public class Author {
    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    @Column(name = "year_of_birth")
    public Integer yearOfBirth;

    @Formula("year_of_birth between 1981 and 1996")
    public Boolean millenial;

    // Constructors, getters, setters, equals, hashCode, etc
}

But unfortunately, many RDBMS still don’t actually support BOOLEAN types, and the @Formula annotation is purely static, without any way to provide vendor specific overrides. Are we going to rewrite that SQL fragment manually, to make sure a vendor agnostic, SQL-92 compatible version is present that works on all dialects?

Or will we just plug in jOOQ’s parsing connection again? Let’s try the latter with:

Author author = em.find(Author.class, 1);

The MySQL log contains:

-- org.hibernate.SQL
   select 
     jpaauthorw0_.id as id1_4_0_,
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id=?
-- org.jooq.impl.ParsingConnection Translating from: [...]
-- org.jooq.impl.ParsingConnection Translation cache miss: [...]
-- org.jooq.impl.ParsingConnection Translating to: 
   select 
     jpaauthorw0_.id as id1_4_0_, 
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author as jpaauthorw0_ 
   where jpaauthorw0_.id = ?

As you can see, jOOQ re-added the AS keyword to alias the table on MySQL for good measure, because we like to be explicit about aliasing, and because that’s the default for Settings.renderOptionalAsKeywordForTableAliases.

Whereas the SQL Server log contains:

-- org.hibernate.SQL 
   select 
     jpaauthorw0_.id as id1_4_0_, 
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id=?
-- org.jooq.impl.ParsingConnection Translating from: [...]
-- org.jooq.impl.ParsingConnection Translation cache miss: [...]
-- org.jooq.impl.ParsingConnection Translating to: 
   select
     jpaauthorw0_.id id1_4_0_, 
     jpaauthorw0_.first_name first_na2_4_0_, 
     jpaauthorw0_.last_name last_nam3_4_0_, 
     case 
       when jpaauthorw0_.year_of_birth between 1981 and 1996 
         then 1 
       when not (jpaauthorw0_.year_of_birth between 1981 and 1996) 
         then 0 
     end formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id = ?

Notice the NULL-safe BOOLEAN type emulation: if YEAR_OF_BIRTH is NULL (i.e. UNKNOWN), then MILLENIAL must be NULL (i.e. UNKNOWN), too.
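
By the way, if you’d rather jOOQ didn’t re-add the optional AS keyword on MySQL, the Settings.renderOptionalAsKeywordForTableAliases setting mentioned earlier can be overridden (a sketch):

// Assuming imports of org.jooq.conf.RenderOptionalKeyword and org.jooq.conf.Settings
DataSource parsingDataSource = DSL
    .using(originalDataSource, dialect, new Settings()

        // Render table aliases without the optional AS keyword
        .withRenderOptionalAsKeywordForTableAliases(RenderOptionalKeyword.OFF))
    .parsingDataSource();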

Spring Data @Query annotation

Another case where native SQL can appear in JPA integrations is Spring Data JPA with its @Query annotation, especially when used with @Query(nativeQuery = true). Just like Hibernate’s @Formula, this annotation is compile-time static, with no way to override the value of the native query at runtime, short of maybe subtyping repositories per dialect.
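For illustration, such a native query might be hard-wired in a repository like this (a hypothetical AuthorRepository, reusing the earlier query):

import java.util.List;

import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.CrudRepository;

public interface AuthorRepository extends CrudRepository<Author, Integer> {

    // Vendor specific native SQL, fixed at compile time
    @Query(
        value = """
            select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
            from author as a
            order by a.id
            """,
        nativeQuery = true
    )
    List<Author> findAllAuthors();
}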

But why go through all that hassle? It’s always the same thing. Just patch the DataSource with jOOQ’s parsing connection or parsing data source, and you’re set.
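
In a Spring Boot application, for instance, that patching could be as simple as exposing the wrapped DataSource as a bean (a sketch, assuming Spring Boot’s DataSourceProperties and PostgreSQL as an example target dialect):

import javax.sql.DataSource;

import org.jooq.SQLDialect;
import org.jooq.impl.DSL;
import org.springframework.boot.autoconfigure.jdbc.DataSourceProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ParsingDataSourceConfiguration {

    @Bean
    public DataSource dataSource(DataSourceProperties properties) {

        // The ordinary DataSource, as configured in application.properties
        DataSource original = properties.initializeDataSourceBuilder().build();

        // Wrap it with jOOQ's parsing data source to translate all SQL
        return DSL.using(original, SQLDialect.POSTGRES).parsingDataSource();
    }
}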

Conclusion

Even if you’re not using jOOQ’s DSL API, you can profit from jOOQ in many ways in your existing JDBC, R2DBC, JPA, MyBatis, etc. based applications by hooking in the jOOQ parsing connection and translating your vendor specific input dialect to any number of configurable output dialects.

If jOOQ’s parser can’t handle a feature, chances are, you can work around this limitation using the ParseListener SPI, e.g. when you want to support a hypothetical LOGICAL_XOR predicate (which is supported natively in MySQL):

Query query = configuration
    .derive(ParseListener.onParseCondition(ctx -> {
        if (ctx.parseFunctionNameIf("LOGICAL_XOR")) {
            ctx.parse('(');
            Condition c1 = ctx.parseCondition();
            ctx.parse(',');
            Condition c2 = ctx.parseCondition();
            ctx.parse(')');

            return CustomCondition.of(c -> {
                switch (c.family()) {
                    case MARIADB:
                    case MYSQL:
                        c.visit(condition("{0} xor {1}", c1, c2));
                        break;
                    default:
                        c.visit(c1.andNot(c2).or(c2.andNot(c1)));
                        break;
                }
            });
        }

        // Let the parser take over if we don't know the token
        return null;
    }))
    .dsl()
    .parser()
    .parseQuery(
        "select * from t where logical_xor(t.a = 1, t.b = 2)"
    );
  
System.out.println(DSL.using(SQLDialect.MYSQL).render(query));
System.out.println(DSL.using(SQLDialect.ORACLE).render(query));

The above program will print:

-- MYSQL:
select * 
from t
where (t.a = 1 xor t.b = 2);

-- ORACLE:
select * 
from t 
where (t.a = 1 and not (t.b = 2)) or (t.b = 2 and not (t.a = 1));

So, profit from using jOOQ to migrate your application’s vendor specific SQL from one RDBMS to another, or to support multiple RDBMS products in a single application, with or without using jOOQ’s DSL!

Side note: Query transformation

This isn’t the topic of this blog post, but once you have jOOQ parse your every SQL statement, you can also use jOOQ to transform this SQL and tamper with the expression tree, for example by implementing client side row level security. The possibilities are endless!

Vendor Agnostic, Dynamic Procedural Logic with jOOQ

One of the strengths of modern RDBMS is the capability to mix the powerful SQL language with procedural code.

SQL is a 4th generation programming language (4GL), and as such, extremely well suited for querying and bulk data manipulation. Its functional-declarative nature allows for it to be optimised in highly efficient ways using cost-based optimisation, but also statically as we’ve blogged before.

Sometimes, however, an imperative 3GL is better suited for a given task. That’s where stored procedures shine, or more specifically, procedural languages of RDBMS.

Among the dialects that jOOQ supports, at least these support stored procedures:

  • BigQuery
  • Db2
  • Exasol
  • Firebird
  • HANA
  • HSQLDB
  • Informix
  • MariaDB
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server
  • Vertica

Others may, too, but jOOQ doesn’t support their dialects yet.

Many have implemented their own procedural languages, some following the ISO/IEC 9075-4 Persistent Stored Modules (SQL/PSM) standard, others rolling their own syntax.

jOOQ support for procedural logic

Since jOOQ 3.12, our commercial distributions have supported anonymous blocks and the procedural statements they contain, such as the IF statement, LOOP statements, etc. Starting with jOOQ 3.15, we also support 3 types of statements to manage storing procedural logic in the catalog:

  • CREATE FUNCTION
  • CREATE PROCEDURE
  • CREATE TRIGGER

Using these statements via jOOQ may not be your every day use-case. You may prefer managing that logic via the native syntax, which is still more powerful than what jOOQ 3.15 supports (especially when you’re using Oracle’s PL/SQL), in which case you’ll use jOOQ purely to call your procedure from Java in the usual type safe manner.

But maybe, you have one of these use-cases?

  • You’re a product vendor, and you profit from procedural logic being vendor agnostic in order to support multiple of your clients’ RDBMS
  • Your procedural logic is dynamic, just like your SQL logic (and what other than jOOQ to use for that?)
  • You don’t have the necessary privileges to create procedures, functions, or triggers in your schema

In all of those cases, jOOQ is here to help you.

How does it work?

The first building block is the anonymous block, which isn’t supported by all of the above dialects, regrettably. jOOQ can emulate it on MySQL as discussed here, but not currently in other dialects.

Here’s a simple, empty anonymous block:

-- Db2
BEGIN
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
END

-- MariaDB
BEGIN NOT ATOMIC
END;

-- Oracle
BEGIN
  NULL;
END;

-- PostgreSQL
DO $$
BEGIN
  NULL;
END;
$$

It doesn’t really do much, but you can try executing it as follows, with jOOQ:

ctx.begin().execute();

Now, let’s do something more interesting, such as:

// Assuming the usual static imports:
import static org.jooq.impl.DSL.*;
import static org.jooq.impl.SQLDataType.*;

// Then write
Variable<Integer> i = variable(unquotedName("i"), INTEGER);
Table<?> t = table(unquotedName("t"));
Field<Integer> c = field(unquotedName("c"), INTEGER);

ctx.begin(
    declare(i).set(1),

    while_(i.le(10)).loop(
        insertInto(t).columns(c).values(i),
        i.set(i.plus(1))
    )
).execute();

The above block executes:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- FIREBIRD
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
DECLARE
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
DO $$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

Easy as pie. Perhaps you prefer a FOR loop, instead? Try this:

ctx.begin(
    for_(i).in(1, 10).loop(
        insertInto(t).columns(c).values(i)
    )
).execute();

It produces the necessary emulations because, regrettably, not all dialects support FOR loops:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- Firebird
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  FOR i IN 1 .. 10 DO
    INSERT INTO t (c)
    VALUES (i);
  END FOR;
END;

-- Oracle
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;

-- PostgreSQL
DO $$
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  BEGIN
    SET @i = 1;
    WHILE @i <= 10 BEGIN
      INSERT INTO t (c)
      VALUES (@i);
      SET @i = (@i + 1);
    END;
  END;
END;

SQL vs procedures

Of course, this particular logic would be better implemented using a single bulk insertion statement, purely with SQL, not with procedural logic:

ctx.insertInto(t, c)
   .select(selectFrom(generateSeries(1, 10)))
   .execute();

Which translates to:

-- Db2
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH
    generate_series(generate_series) AS (
      SELECT 1
      FROM SYSIBM.DUAL
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- Firebird
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      FROM RDB$DATABASE
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- MariaDB
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) AS generate_series;

-- Oracle
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT (level + (1 - 1)) generate_series
  FROM DUAL
  CONNECT BY level <= ((10 + 1) - 1)
) generate_series;

-- PostgreSQL
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM generate_series(1, 10);

-- SQL Server
WITH
  generate_series(generate_series) AS (
    SELECT 1
    UNION ALL
    SELECT (generate_series + 1)
    FROM generate_series
    WHERE generate_series < 10
  )
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT generate_series
  FROM generate_series
) generate_series;

… but you get the point.

Storing the procedural logic

If you have the necessary privileges, and your procedural logic isn’t super dynamic, you may choose to store your logic in a procedure or function directly in your database. In some databases, this means a compiler will be able to eagerly translate the logic to something very efficient (e.g. machine code), instead of interpreting the logic on the fly.

Take the above WHILE loop, for example. You may want to store that as a procedure P:

Name p = unquotedName("p");

ctx.createProcedure(p)
   .modifiesSQLData()
   .as(
        declare(i).set(1),

        while_(i.le(10)).loop(
            insertInto(t).columns(c).values(i),
            i.set(i.plus(1))
        )
   )
   .execute();

This produces the following statements:

-- Db2
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Firebird
CREATE PROCEDURE p()
AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
CREATE PROCEDURE p
AS
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
CREATE PROCEDURE p()
LANGUAGE plpgsql
AS
$$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
CREATE PROCEDURE p
AS
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

And now, what better way to call this procedure than, again, an anonymous block?

ctx.begin(call(unquotedName("p"))).execute();

Producing:

-- Db2
BEGIN
  CALL p();
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
  EXECUTE PROCEDURE p;
END

-- MariaDB
BEGIN NOT ATOMIC
  CALL p();
END;

-- Oracle
BEGIN
  p();
END;

-- PostgreSQL
DO $$
BEGIN
  CALL p();
END;
$$

-- SQL Server
BEGIN
  EXEC p ;
END;

If you’re using jOOQ in Flyway or Liquibase to generate procedures during your database migrations, you can of course also let jOOQ’s code generator produce procedure call stubs, and call those in a more type safe manner, instead of the above dynamic procedure call.
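
For example, with the generated code in place, the type safe call might look like this (a sketch, assuming jOOQ’s generated Routines class in a target package like the org.jooq.example.db from earlier):

// Call the generated stored procedure stub in a type safe way
org.jooq.example.db.Routines.p(ctx.configuration());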

Parsing procedural logic

This jOOQ feature is not really exceptional. You can play around with our parser / translator here: https://www.jooq.org/translate. It can definitely help you translate your (simpler) stored procedures between dialects, such as PL/SQL, T-SQL, PL/pgSQL, etc.

Conclusion

As a rule of thumb, if you can do it with SQL (the 4GL), do it with SQL alone. But sometimes, you can’t. A 3GL is a better choice for an algorithm. When using jOOQ, you’ll naturally think of using Java to implement that 3GL algorithm. But wait, you could move the logic to the server for (drastically) increased performance!

Thanks to jOOQ, you can generate procedural logic that is:

  • Dynamic
  • Vendor agnostic
  • Anonymous or stored

Just like you’re used to from jOOQ, for SQL.