Fun with PostGIS: Mandelbrot Set, Game of Life, and More

The upcoming jOOQ 3.16 will finally offer support for the various RDBMS GIS extensions via issue #982. This is great news per se, and will be covered in a future blog post, when the integration is ready. This post here is about something else.

Adding support for such a feature is a great source of procrastination. You see, when you develop a backend library, all you ever do is implement algorithms, API, and other abstract stuff. But GIS is different. With GIS, a SQL developer like me suddenly has access to a “UI”, and having access to a “UI” woke the inner programmer child in me.

I was pleasantly surprised that DBeaver, my preferred SQL editor, has out-of-the-box support for GIS’s WKT format. E.g. run a query like this:

SELECT st_geomfromtext('polygon((0 0, 2 3, 0 6, -2 3, 0 0))');

The output is the diamond shaped polygon, which DBeaver renders graphically.

Let’s play around with GIS

So, naturally, what better thing to do than play around with it? The most obvious next thing to generate would be your favourite logo, which happens to be very easy to map to polygons. Let’s look at a few GIS features step by step. For this blog post, I’ll be using PostGIS, which you can get very easily from Docker Hub.
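For example, one way to run it (a sketch; the postgis/postgis image accepts the usual PostgreSQL environment variables):

docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgis/postgis

With PostGIS up and running, here’s a first query, declaring a few reusable “sprites”: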

WITH
  -- These "sprites" are just reusable 2D polygons, which I declared
  -- to avoid having to deal with the lengthy WKT format
  sprites AS (
    SELECT 

      -- We project the original 1x1 square, and a scaled 4x4 version
      s AS square,
      st_scale(s, 4, 4) AS square4

    -- Here, we're creating a square called "s"
    FROM (VALUES 
      (st_polygonfromtext('polygon ((0 0, 1 0, 1 1, 0 1, 0 0))'))
    ) t (s)
  )

-- st_union combines 2 polygons into a multipolygon
SELECT st_union(square, st_translate(square4, 2, 0))
FROM sprites

The output of the above query is

MULTIPOLYGON (((0 0, 1 0, 1 1, 0 1, 0 0)), ((2 0, 6 0, 6 4, 2 4, 2 0)))

Or, using DBeaver’s visualisation utility:

The st_union isn’t much different from any other set union. Note that I’ve translated the larger square 2 units to the right, so they don’t overlap. Otherwise, the union would have just been the bigger square.

A polygon describes a set of points in the 2D number plane. Creating a union of those two sets of points is natural. We can also create a difference of the two squares, instead:

WITH
  sprites AS (
    SELECT 
      s AS square,
      st_scale(s, 4, 4) AS square4
    FROM (VALUES 
      (st_polygonfromtext('polygon ((0 0, 1 0, 1 1, 0 1, 0 0))'))
    ) t (s)
  )
SELECT st_difference(square4, square)
FROM sprites

Resulting in:

POLYGON ((0 4, 4 4, 4 0, 1 0, 1 1, 0 1, 0 4))

Or, visually:

With these simple tools, we can now create all 4 letters of the jOOQ logo. As a reminder, the tools were:

  • st_polygonfromtext: Create a polygon from a WKT text representation
  • st_scale: Scale any geometry to a new size
  • st_translate: Translate any geometry to a new position
  • st_union: Combine two geometries (or more, if used as an aggregate function, as sketched right after this list)
  • st_difference: Remove one geometry from another
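For example, the aggregate version of st_union combines any number of geometries into one (reusing st_geomfromtext from above):

SELECT st_union(g)
FROM (VALUES
  (st_geomfromtext('polygon((0 0, 1 0, 1 1, 0 1, 0 0))')),
  (st_geomfromtext('polygon((2 0, 3 0, 3 1, 2 1, 2 0))'))
) t (g);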

The GIS API is vast, both in PostGIS as well as in the ISO/IEC 13249-3:2016 standard version, but these simple tools shall suffice for now. Let’s look at this query:

WITH
  -- Creating the two squares to work with
  sprites AS (
    SELECT 
      s AS square,
      st_scale(s, 4, 4) AS square4
    FROM (VALUES 
      (st_polygonfromtext('polygon ((0 0, 1 0, 1 1, 0 1, 0 0))'))
    ) t (s)
  ),
  
  -- All the 4 letters j, o1, o2, q of the jOOQ logo
  letters AS (
    SELECT
    
      -- The letter j
      st_difference(
        st_difference(
          st_difference(
            square4, 
            st_translate(st_scale(square, 2, 3), 1, 1)
          ),
          st_translate(st_scale(square, 1, 2), 0, 2)
        ),
        st_translate(st_scale(square, 1, 0.5), 3, 2.5)
      ) AS j,
      
      -- The first letter o
      st_difference(square4, st_translate(square, 1, 1)) AS o1,
      
      -- The second letter o
      st_difference(square4, st_translate(square, 2, 2)) AS o2,
      
      -- The letter q
      st_union(
        st_difference(
          square4, 
          st_translate(st_scale(square, 2, 2), 1, 1)
        ),
        st_translate(st_scale(square, 1, 2.5), 1.5, -1)
      ) AS q
    FROM sprites
  )
SELECT

  -- Combine all the 4 letters
  st_union(v)
FROM
  letters,
  
  -- Arrange the 4 letters next to each other
  LATERAL (VALUES
    (st_translate(j, 0, 5)),
    (st_translate(o1, 5, 5)),
    (o2),
    (st_translate(q, 5, 0))
  ) t (v);

This produces:

Neat, huh?

Next step: The Mandelbrot Set

A natural next step for any procrastinator is to generate the Mandelbrot set. To prepare ourselves for what’s behind the Mandelbrot, have a look at this neat video (more procrastination):

There are different ways to calculate it with SQL, here’s one that I came up with:

WITH RECURSIVE

  -- These are the dimensions that you can play around with
  dims (r1, r2, i1, i2, s, it, p) AS (
    VALUES (
      
      -- The dimensions of the real axis
      -2::float, 1::float, 
      
      -- The dimensions of the imaginary axis
      -1.5::float, 1.5::float, 
      
      -- The step size on each axis, per pixel or sprite
      0.01::float, 
      
      -- The maximum number of iterations
      100, 
      
      -- "Infinity", i.e. when to stop
      256.0::float
    )
  ),
  
  -- The square again, as before
  sprites (s) AS (VALUES 
    (st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))'))
  ),
  
  -- The number plane, as ints
  n1 (r, i) AS (
    SELECT r, i 
    FROM 
      dims, 
      generate_series((r1 / s)::int, (r2 / s)::int) r,
      generate_series((i1 / s)::int, (i2 / s)::int) i
  ),
  
  -- The number plane as scaled floats
  n2 (r, i) AS (
    SELECT r::float * s::float, i::float * s::float
    FROM dims, n1
  ),
  
  -- The recursive calculation of the Mandelbrot formula
  -- zn = (zn-1)^2 + c
  l (cr, ci, zr, zi, g, it, p) AS (
    SELECT r, i, 0::float, 0::float, 0, it, p FROM n2, dims
    UNION ALL
    SELECT cr, ci, zr*zr - zi*zi + cr, 2*zr*zi + ci, g + 1, it, p 
    FROM l
    
    -- The recursions stops when we reach the maximum
    WHERE g < it
    
    -- Or, when we reach "infinity"
    AND zr*zr + zi*zi < p
  ),
  
  -- Find the last calculated value per point in the
  -- complex number plane c (cr, ci), discard the others
  x (cr, ci, zr, zi, g) AS (
    SELECT DISTINCT ON (cr, ci) cr, ci, zr, zi, g
    FROM l
    ORDER BY cr, ci, g DESC
  )
  
-- Turn every calculated point into a square
SELECT 
  st_union(
    st_translate(sprites.s, round(cr / dims.s), round(ci / dims.s))
  )
FROM x, sprites, dims

-- Retain only the points *not* belonging to the Mandelbrot set
-- You can also inverse the equation to retain the points that
-- belong to the set
WHERE zr*zr + zi*zi > p;

Note, I’m using an artificial “infinity” here, because:

  1. That speeds things up with little noticeable difference at this zoom scale
  2. I couldn’t figure out how to make PostgreSQL overflow float operations to float infinities, like Java or CockroachDB or other IEEE 754 float implementations do. Any help appreciated, see this Stack Overflow question.

The output is the well known shape

You can play around with this and use different values for “dims” to zoom in, e.g.

dims (r1, r2, i1, i2, s, it, p) as (values (
  (-0.925-0.032)::float, (-0.925+0.032)::float, 
  (0.266-0.032)::float, (0.266+0.032)::float, 
  0.00005::float, 100, 256.0::float)
)

This will generate some really neat zooms, all with SQL:

Granted, it’s not the most efficient way to calculate these things, but that wasn’t the point here, was it?

Game of Life

I can be nerd-sniped:

And of course, I had to accept that challenge too! Now, the Game of Life is a simple “game” where we have the x,y natural number plane (e.g. “pixels”), and with each coordinate, we associate whether that “cell” is dead or alive. Then, we establish the following set of rules:

  1. Any live cell with two or three live neighbours survives.
  2. Any dead cell with three live neighbours becomes a live cell.
  3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.

Now, it’s hard to “animate” things in SQL using spatial extensions, so as a workaround, I’ll just display iterations of 100×100 pixel tiles of the Game of Life next to each other. The first iteration is just the starting point, e.g. a random number of “alive” cells:

WITH RECURSIVE

  -- Again, generate a "sprite" from a square polygon
  sprites (s) AS (VALUES (
    st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
  )),

  -- Generate the x, y number plane and associate a boolean value
  -- with each coordinate, generated randomly.
  -- 10% of all cells are alive
  m (x, y, b) AS (
    SELECT x, y, 
      CASE WHEN random() > 0.9 THEN 1 ELSE 0 END 
    FROM 
      generate_series(1, 100) x, 
      generate_series(1, 100) y
  )
SELECT st_union(st_translate(sprites.s, x::float, y::float))
FROM m, sprites

-- Display only "alive" cells
WHERE b = 1

This will produce something like:

Now, all we have to do is iterate the game formula. However, recursive SQL has quite a few limitations: e.g. I couldn’t get PostgreSQL to aggregate recursive data, nor to self-join the recursive table to find the nearest neighbors of any cell. But with window functions, this is definitely possible. So here goes:

WITH RECURSIVE
  sprites (s) AS (VALUES (
    st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
  )),
  m (x, y, b) AS (
    SELECT x, y, 
      CASE WHEN random() > 0.9 THEN 1 ELSE 0 END
    FROM 
      generate_series(1, 100) x, 
      generate_series(1, 100) y
  ),
  
  -- This is the recursion of the Game of Life
  e (x, y, b, g) AS (
  
    -- The recursion starts with the above initial data
    SELECT x, y, b, 1 FROM m
    UNION ALL
    SELECT
      e.x, e.y,
      CASE
      
        -- If this cell is alive, and 2 or 3
        -- neighbors are alive, too, then this
        -- cell stays alive
        WHEN e.b = 1 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 IN (2, 3)
          THEN 1
          
        -- If this cell is dead, and 3 neighbors
        -- are alive, then this cell becomes alive
        WHEN e.b = 0 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 = 3
          THEN 1
          
        -- Otherwise, this cell dies or stays dead
        ELSE 0
      END,
      e.g + 1
    FROM e
    
    -- The recursion stops after 100 iterations
    WHERE e.g < 100
    WINDOW
    
      -- We order the data set by x, y. In SQL, there isn't 
      -- a notion of a 2-dimensional number plane. We only
      -- have 1 dimension.
      o AS (ORDER BY x, y),
      
      -- w1 is all the neighbors of the previous row in the x, y
      -- plane, i.e. all the SQL rows that are between 101 and 99
      -- rows before the current row.
      w1 AS (o ROWS BETWEEN 101 PRECEDING AND  99 PRECEDING),
      
      -- w2 is all the neighbors of the current row in the x, y
      -- number plane, excluding the current cell
      w2 AS (o ROWS BETWEEN   1 PRECEDING AND   1 FOLLOWING 
               EXCLUDE CURRENT ROW),
               
      -- w3 is all the neighbors of the next row
      w3 AS (o ROWS BETWEEN  99 FOLLOWING AND 101 FOLLOWING)
  )

-- Finally, combine all the iterations  
SELECT st_union(st_translate(
  sprites.s, 
  (x + (g - 1) % 10 * 120)::float, 
  (y - (g - 1) / 10 * 120)::float
))
FROM e, sprites
WHERE b = 1
;

Think of the window specifications as follows:

----+-------------+-------------+-------------+-------------+----
... | 105: (1, 5) | 106: (1, 6) | 107: (1, 7) | 108: (1, 8) | ...
... | 205: (2, 5) | 206: (2, 6) | 207: (2, 7) | 208: (2, 8) | ...
... | 305: (3, 5) | 306: (3, 6) | 307: (3, 7) | 308: (3, 8) | ...
... | 405: (4, 5) | 406: (4, 6) | 407: (4, 7) | 408: (4, 8) | ...

So, in a 100×100 grid, x=3,y=7 can be encoded as 307, and its neighbors are

  • w1: 206, 207, 208, i.e. between 101 preceding and 99 preceding of 307
  • w2: 306, 308, i.e. between 1 preceding and 1 following of 307
  • w3: 406, 407, 408, i.e. between 99 following and 101 following of 307

The output looks like this:

Or, zooming in on iterations 1-4:

Or 21-24:

That’s really cool, huh!

Glider Gun

The nerd-snipe was to animate the glider gun pattern, which just means we’ll have to replace the random generation of the first iteration by something constant.

WITH RECURSIVE
  sprites (s) AS (VALUES (
    st_polygonfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
  )),
  m (x, y, b) AS (
    SELECT x, y, 

      -- Initial data here
      CASE WHEN (x, y) IN (
        (2, 6), (2, 7), (3, 6), (3, 7), (12, 6), (12, 7), (12, 8), 
        (13, 5), (13, 9), (14, 4), (14, 10), (15, 4), (15, 10),
        (16, 7), (17, 5), (17, 9), (18, 6), (18, 7), (18, 8), 
        (19, 7), (22, 4), (22, 5), (22, 6), (23, 4), (23, 5), 
        (23, 6), (24, 3), (24, 7), (26, 2), (26, 3), (26, 7), 
        (26, 8), (36, 4), (36, 5), (37, 4), (37, 5)
      ) THEN 1 ELSE 0 END 
    FROM 
      generate_series(1, 100) x, 
      generate_series(1, 100) y
  ),
  e (x, y, b, g) AS (
    SELECT x, y, b, 1 FROM m
    UNION ALL
    SELECT
      e.x, e.y,
      CASE
        WHEN e.b = 1 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 IN (2, 3)
          THEN 1
        WHEN e.b = 0 AND
          sum(e.b) OVER w1
        + sum(e.b) OVER w2
        + sum(e.b) OVER w3 = 3
          THEN 1
        ELSE 0
      END,
      e.g + 1
    FROM e
    WHERE e.g < 100
    WINDOW
      o AS (ORDER BY x, y),
      w1 AS (o ROWS BETWEEN 101 PRECEDING AND  99 PRECEDING),
      w2 AS (o ROWS BETWEEN   1 PRECEDING AND   1 FOLLOWING 
               EXCLUDE CURRENT ROW),
      w3 AS (o ROWS BETWEEN  99 FOLLOWING AND 101 FOLLOWING)
  )
SELECT st_union(st_translate(
  sprites.s, 
  (x + (g - 1) % 10 * 120)::float, 
  (y - (g - 1) / 10 * 120)::float
))
FROM e, sprites
WHERE b = 1
;

And, as you can see, the gun works:

It started with this:

And eventually produces the well known gliders:

The jOOQ Parser Ignore Comment Syntax

jOOQ’s parser can’t parse every possible SQL syntax. Try this random PostgreSQL syntax:

ALTER SYSTEM RESET ALL

And the jOOQ parser will complain:

DOMAIN, INDEX, SCHEMA, SEQUENCE, SESSION, TABLE, TYPE, or VIEW expected: [1:7] ALTER [*]SYSTEM RESET ALL

That’s perfectly fine. The goal of the jOOQ parser isn’t to understand all vendor specific syntax. The goal is to offer parser functionality for all of the syntax supported by jOOQ so far, and to be able to translate it between dialects. For a lot of syntax like the above, there isn’t an equivalent in other dialects anyway.

But if you’re using the jOOQ parser to simulate a database migration, e.g. to create a schema diff, or to generate jOOQ code using the DDLDatabase, then you may have some bits of vendor specific SQL in your scripts that you want jOOQ to ignore. For example, the above command may be in the middle of a migration script:

CREATE TABLE a (i int);
ALTER SYSTEM RESET ALL;
CREATE TABLE b (i int);

Luckily, with jOOQ, you can add special markers around the commands you wish jOOQ to ignore. For this, just enable the Settings.parseIgnoreComments flag, and then you can use a special comment syntax:

CREATE TABLE a (i int);

/* [jooq ignore start] */
ALTER SYSTEM RESET ALL;
/* [jooq ignore stop] */

CREATE TABLE b (i int);

The syntax is transparent to your RDBMS, because the markers are just comments. The RDBMS will see and execute this, just like before, so your actual database migration isn’t affected:

CREATE TABLE a (i int);

/*                     */
ALTER SYSTEM RESET ALL;
/*                    */

CREATE TABLE b (i int);

But jOOQ, on the other hand, will interpret the comments a bit differently, and see and execute this, instead:

CREATE TABLE a (i int);

/*

                      */

CREATE TABLE b (i int);

Note, if you don’t like the [jooq ignore start] and [jooq ignore stop] tokens, you can customise them as well via Settings.parseIgnoreCommentStart and Settings.parseIgnoreCommentStop.
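In code, that might look like this sketch (the wither methods follow jOOQ’s usual JAXB naming convention for the settings mentioned above):

import org.jooq.conf.Settings;

Settings settings = new Settings()

    // Enable the parse ignore comment feature
    .withParseIgnoreComments(true)

    // Optionally, customise the start and stop tokens
    .withParseIgnoreCommentStart("[my ignore start]")
    .withParseIgnoreCommentStop("[my ignore stop]");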

Like any of these purely text-based preprocessor syntaxes, this works completely transparently, even within a single statement. Assuming you’re using some vendor-specific syntax in a CREATE TABLE‘s DEFAULT expression, you can just exclude that DEFAULT expression from jOOQ’s parsing:

CREATE TABLE t (
  a int 
    /* [jooq ignore start] */ 
    DEFAULT some_fancy_expression() 
    /* [jooq ignore stop] */
);

Obviously, you don’t have to format things this way; that’s just for illustration purposes on this blog. Now, again, the RDBMS will see and execute this:

CREATE TABLE t (
  a int 
    /*                     */ 
    DEFAULT some_fancy_expression() 
    /*                    */
);

Whereas jOOQ’s parser will see and execute this:

CREATE TABLE t (
  a int 
    /*

                          */
);

For more information about this topic, please refer to the jOOQ manual.

Using Testcontainers to Generate jOOQ Code

Database first is at the core of jOOQ’s design. jOOQ has been made primarily for classic systems where the database is always there, always has been, and will never leave. This is because we think “data have mass”.

This not only translates to moving logic closer to the data (see our previous posts about the cost of JDBC round trips or generating vendor agnostic procedural logic), but also avoiding moving data around between migrations (e.g. of RDBMS products).

Compared to how “heavy” data is, applications and UIs come and go. Speaking of go, maybe you’ll replace all of your Java code tomorrow with some Go code. But unless it’s trivial, you will keep the database.

With this in mind, jOOQ assumes you have a pre-existing schema, which you manage with Flyway or Liquibase, and then you use jOOQ to reverse engineer your updated schema using the code generator.

The old days

In the old days, setting up an Oracle instance was very heavy, and also hard. I remember working at a company where we had shared development and test instances. The schema was always in flux. We couldn’t assume a stable dev version.

As such, pointing the jOOQ code generator towards a live database used to be a bit of a challenge, which is why jOOQ offers alternative, connection-free code generation modes, including:

  • The JPADatabase, if you have a pre-existing JPA entity based meta model.
  • The XMLDatabase, if you have some form of XML version of your schema, which you can XSL transform to jOOQ’s format
  • The DDLDatabase, which can interpret your DDL scripts, e.g. the ones you pass to Flyway, or the ones produced by pg_dump.
  • The LiquibaseDatabase, which simulates a Liquibase database migration and uses the simulated database output as a source for meta information of the code generator

But all of the above have the same limitation. You can’t really use many vendor-specific features, such as advanced stored procedures, data types, etc.

A modern approach using testcontainers

Ideally, unless you’re supporting several RDBMS products (most people don’t), you should work only with your production database product, say PostgreSQL.

Thanks to testcontainers.org, it’s very easy to start up a PostgreSQL instance of any version in a Docker container, programmatically or via configuration. If you have a SQL script that contains your database schema, you can supply it to the testcontainers JDBC URL, e.g. like this:

jdbc:tc:postgresql:13:///sakila?TC_TMPFS=/testtmpfs:rw&TC_INITSCRIPT=file:${basedir}/src/main/resources/postgres-sakila-schema.sql

For more information, see their docs about JDBC support. Now, add the testcontainers dependency on your project classpath, e.g.

<dependency>
  <groupId>org.testcontainers</groupId>
  <artifactId>postgresql</artifactId>
</dependency>

And use the ContainerDatabaseDriver instead of the actual PostgreSQL driver for your code generation configuration in jOOQ, e.g. when using Maven:

<plugin>
  <groupId>org.jooq</groupId>
  <artifactId>jooq-codegen-maven</artifactId>

  <executions>
    <execution>
      <id>java-generator</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
      </goals>

      <configuration>
        <jdbc>
          <driver>
            org.testcontainers.jdbc.ContainerDatabaseDriver
          </driver>
          <url>${db.url}</url>
          <user>${db.username}</user>
          <password>${db.password}</password>
        </jdbc>
        <generator>
          <database>
            <inputSchema>public</inputSchema>
          </database>
          <target>
            <packageName>org.jooq.example.tc.db</packageName>
            <directory>src/main/java</directory>
          </target>
        </generator>
      </configuration>
    </execution>
  </executions>

  <dependencies>

    <!-- JUnit seems to be a transitive dependency of testcontainers? -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.1</version>
    </dependency>
  </dependencies>
</plugin>

As simple as that! Check out the jOOQ-testcontainers-example for a runnable example that uses testcontainers for code generation using the above approach.

Adding database change management

A real world example would again use Flyway or Liquibase, etc. to apply a complete database migration to your PostgreSQL instance inside of testcontainers, prior to generating code and/or running your integration tests.

This just slightly complicates things, but doesn’t pose any insurmountable problems. Instead of creating throwaway containers with a single TC_INITSCRIPT, you will now have to make sure the following steps are executed consecutively in your build somehow:

  1. A testcontainers instance of your database is started
  2. A Flyway or Liquibase migration is run inside of that database
  3. The jOOQ code generator reverse engineers that database
  4. Optionally, your integration tests also reuse the database of that container

Of course you should integration test your code! But for the sake of this discussion, that might be an optional step, as you may have different preferences on how to run those tests, e.g. more globally than just for this module. But in our example, let’s include the tests.

You can find the full example using testcontainers/flyway/jOOQ here.

Start the testcontainers instance

Unfortunately, testcontainers doesn’t ship any Maven / Gradle plugins yet to invoke container lifecycle management during a build. I’ve created a feature request for that here, which you should upvote: https://github.com/testcontainers/testcontainers-java/issues/4397.

But we can easily help ourselves by using the ever powerful Maven escape hatch that is the groovy-maven-plugin (the ideas are the same for Gradle):

<plugin>
  <groupId>org.codehaus.gmaven</groupId>
  <artifactId>groovy-maven-plugin</artifactId>
  <version>2.1.1</version>
  <executions>
    <execution>
      <!-- Start the container in any phase before the actual code
           generation is required, i.e. at the latest in
           generate-sources -->
      <phase>generate-sources</phase>
      <goals>
        <goal>execute</goal>
      </goals>
      <configuration>
        <source>
          db = new org.testcontainers.containers.PostgreSQLContainer(
                  "postgres:latest")
            .withUsername("${db.username}")
            .withDatabaseName("postgres")
            .withPassword("${db.password}");
            
          db.start();

          // After you've started the container, collect its generated
          // JDBC URL (which contains a random port)
          project.properties.setProperty('db.url', db.getJdbcUrl());
        </source>
      </configuration>
    </execution>
  </executions>
  
  <dependencies>
    <dependency>
      <groupId>org.testcontainers</groupId>
      <artifactId>postgresql</artifactId>
      <version>1.15.2</version>
    </dependency>
  </dependencies>
</plugin>

So, that starts a container and keeps it running until the build terminates. I won’t show a graceful shutdown, because it’s not needed for the example, but you could implement that as well, of course.

Now, migrate your database

The above database is empty. Now, let’s run the migration. This example uses Flyway, but it would be the same thing with Liquibase.

<plugin>
  <groupId>org.flywaydb</groupId>
  <artifactId>flyway-maven-plugin</artifactId>
  <version>7.14.0</version>
  <executions>
    <execution>

      <!-- We run the migration in the same phase, before jOOQ's
           code generation -->
      <phase>generate-sources</phase>
      <goals>
        <goal>migrate</goal>
      </goals>
      <configuration>

        <!-- This URL has been set by groovy, above -->
        <url>${db.url}</url>
        <user>${db.username}</user>
        <password>${db.password}</password>
        <locations>
          <location>
            filesystem:src/main/resources/db/migration
          </location>
        </locations>
      </configuration>
    </execution>
  </executions>
</plugin>

Add all the additional complexity of your migration in this configuration if you like. jOOQ wouldn’t know anything about it.

Now, generate the code

Again, nothing special here:

<plugin>
  <groupId>org.jooq</groupId>
  <artifactId>jooq-codegen-maven</artifactId>

  <executions>
    <execution>
      <id>java-generator</id>

      <!-- Same phase as above, but the previous plugins have already
           executed, so we're generating the db post migration -->
      <phase>generate-sources</phase>
      <goals>
        <goal>generate</goal>
      </goals>

      <configuration>
        <jdbc>

          <!-- Again, this URL has been set by groovy, above -->
          <url>${db.url}</url>
          <user>${db.username}</user>
          <password>${db.password}</password>
        </jdbc>
        <generator>
          <database>
            <inputSchema>public</inputSchema>
          </database>
          <target>
            <packageName>org.jooq.example.db</packageName>
          </target>
        </generator>
      </configuration>
    </execution>
  </executions>
</plugin>

And finally, optionally, integration test

If you want to re-use the above container with the migrated database in your integration tests as well, you can just pass along the generated JDBC URL to the maven-surefire-plugin as follows:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <systemPropertyVariables>

      <!-- Again, this URL has been set by groovy, above -->
      <db.url>${db.url}</db.url>
      <db.username>${db.username}</db.username>
      <db.password>${db.password}</db.password>
    </systemPropertyVariables>
  </configuration>
</plugin>

There are many ways to achieve the same thing; this is one that works decently out of the box. You can check out a full example from GitHub here, and play around with it:

https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples/jOOQ-testcontainers-flyway-example

More about testcontainers

To learn more about testcontainers, see our interview with Richard North here.

Using jOOQ to write vendor agnostic SQL with JPA’s native query or @Formula

If your legacy JPA application uses occasional native queries, Hibernate’s @Formula annotation, or Spring Data’s @Query annotation with vendor specific native SQL embedded in it, you can use jOOQ’s parsing connection and parsing data source to translate between dialects, without having to go all in on your jOOQ adoption (though I think that’s inevitable once you see what jOOQ can do for you).

Now, let’s design a table like this:

CREATE TABLE author (
  id INT NOT NULL,
  first_name TEXT,
  last_name TEXT NOT NULL,

  CONSTRAINT pk_author PRIMARY KEY (id)
);

Now, you may want to write a native query on this table using JPA’s EntityManager.createNativeQuery(), mapping that to entities. You could be using jOOQ’s DSL API for this, but let’s say you’re not ready for a migration to jOOQ yet, or you want to use actual SQL as provided by your DBA, not jOOQ’s DSL.
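(For reference, the jOOQ DSL version of the upcoming query might look like this sketch, assuming generated code such as the AUTHOR table reference, and a DSLContext called ctx:)

List<Author> result =
ctx.select(AUTHOR.ID, nvl(AUTHOR.FIRST_NAME, "N/A").as("first_name"), AUTHOR.LAST_NAME)
   .from(AUTHOR)
   .orderBy(AUTHOR.ID)
   .fetchInto(Author.class);

But let’s stick with native SQL here.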

So, in MariaDB, you might be writing something like this:

List<Author> result =
em.createNativeQuery("""
    select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
    from author as a
    order by a.id
    """, Author.class)
  .getResultList();

Where your entity is defined like this:

@Entity
@Table(name = "author")
public class Author {
    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    // Constructors, getters, setters, equals, hashCode, etc
}

The above runs just fine and produces all the authors in MariaDB, which implemented support for Oracle’s NVL() function. But what about Oracle itself? The query fails on Oracle with:

ORA-00933: SQL command not properly ended

This is because in Oracle, you can’t use the AS keyword to alias tables, only to alias columns. Sure, you can remove that, but what about NVL()? You want this to work on MySQL and SQL Server, as well, but they complain:

MySQL

SQL Error [1305] [42000]: FUNCTION test.nvl does not exist

SQL Server

SQL Error [195] [S0010]: ‘nvl’ is not a recognized built-in function name.

Now, you have these options:

  • Use jOOQ to generate the SQL string for you, using the DSL
  • Use JPQL instead of a native query (but then heavily rewrite it, because JPQL is much less powerful than SQL)
  • Try your luck writing actual vendor agnostic SQL, manually
  • Or…

jOOQ’s parsing connection

You could use jOOQ’s parsing connection, which acts as a proxy to your actual connection, intercepting every SQL statement on the JDBC level in order to translate it to the target dialect.

This is as simple as wrapping your existing JDBC Connection or DataSource as follows:

DataSource originalDataSource = ...
DataSource parsingDataSource = DSL
    .using(originalDataSource, dialect)
    .parsingDataSource();

That’s it! I mean, you could pass some additional configuration Settings after the dialect, but that’s about as easy as it gets. The new DataSource can now run your SQL query on all the aforementioned dialects, e.g. you might see this in your DEBUG logs:

On MySQL:

-- org.hibernate.SQL
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name 
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating from:
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translation cache miss: 
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating to: 
   select a.id, ifnull(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id

On SQL Server:

-- org.hibernate.SQL
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name 
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating from:
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translation cache miss: 
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating to: 
   select a.id, coalesce(a.first_name, 'N/A') first_name, a.last_name 
   from author a 
   order by a.id

Hibernate was tricked by jOOQ! The NVL function was translated to MySQL’s IFNULL, or to SQL Server’s COALESCE, and the AS keyword was removed from the SQL Server query. These are just simple examples; your actual SQL may be much more complex. Play around with the feature set online, here.

Also, the Settings.cacheParsingConnectionLRUCacheSize flag, which defaults to 8192, makes sure the same query doesn’t get re-translated all the time, so you won’t spend too much time in jOOQ’s parser.
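Such Settings can be passed along when creating the parsing data source, e.g. (a sketch; the wither follows the usual JAXB naming convention for the setting mentioned above):

DataSource parsingDataSource = DSL
    .using(originalDataSource, dialect, new Settings()
        .withCacheParsingConnectionLRUCacheSize(16384))
    .parsingDataSource();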

@Formula too, not just native queries

A quick win in Hibernate when you want to project additional values, akin to SQL’s own computed columns, which are available in many SQL dialects, is the @Formula annotation, which can be added to any entity, like so. Assuming this additional column:

ALTER TABLE author ADD year_of_birth INT;

We might have the following amended entity:

@Entity
@Table(name = "author")
public class Author {
    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    @Column(name = "year_of_birth")
    public Integer yearOfBirth;

    @Formula("year_of_birth between 1981 and 1996")
    public Boolean millenial;

    // Constructors, getters, setters, equals, hashCode, etc
}

But unfortunately, there are still so many RDBMS that don’t actually support boolean types, and the @Formula annotation is purely static, and doesn’t allow for vendor-specific overrides. Are we going to rewrite that SQL query manually to make sure a SQL-92, vendor agnostic native SQL fragment is present that works on all dialects?

Or will we just plug in jOOQ’s parsing connection again? Let’s try the latter with:

Author author = em.find(Author.class, 1);

The MySQL log contains:

-- org.hibernate.SQL
   select 
     jpaauthorw0_.id as id1_4_0_,
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id=?
-- org.jooq.impl.ParsingConnection Translating from: [...]
-- org.jooq.impl.ParsingConnection Translation cache miss: [...]
-- org.jooq.impl.ParsingConnection Translating to: 
   select 
     jpaauthorw0_.id as id1_4_0_, 
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author as jpaauthorw0_ 
   where jpaauthorw0_.id = ?

As you can see, jOOQ re-added the AS keyword to alias the table on MySQL for good measure, because we like to be explicit about aliasing, and because that’s the default for Settings.renderOptionalAsKeywordForTableAliases.

Whereas the SQL Server log contains:

-- org.hibernate.SQL 
   select 
     jpaauthorw0_.id as id1_4_0_, 
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id=?
-- org.jooq.impl.ParsingConnection Translating from: [...]
-- org.jooq.impl.ParsingConnection Translation cache miss: [...]
-- org.jooq.impl.ParsingConnection Translating to: 
   select
     jpaauthorw0_.id id1_4_0_, 
     jpaauthorw0_.first_name first_na2_4_0_, 
     jpaauthorw0_.last_name last_nam3_4_0_, 
     case 
       when jpaauthorw0_.year_of_birth between 1981 and 1996 
         then 1 
       when not (jpaauthorw0_.year_of_birth between 1981 and 1996) 
         then 0 
     end formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id = ?

Notice the NULL-safe BOOLEAN type emulation: if YEAR_OF_BIRTH is NULL (i.e. UNKNOWN), then MILLENIAL must be NULL, too (i.e. UNKNOWN).

Spring Data @Query annotation

Another case where native SQL can appear in JPA integrations is Spring Data JPA with its @Query annotation, especially when used with @Query(nativeQuery = true). Just like Hibernate’s @Formula, this annotation is compile-time static, with no way to override the value of the native query at runtime, short of maybe subtyping repositories per dialect.

But why go through all that hassle? It’s always the same thing. Just patch the DataSource with jOOQ’s parsing connection or parsing data source, and you’re set.
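Here’s a minimal sketch of that patching, in a Spring style configuration (the bean wiring and target dialect are illustrative):

// A sketch: expose jOOQ's parsing data source as the application's
// DataSource, so Spring Data JPA's native @Query SQL is translated, too
@Bean
public DataSource dataSource() {
    DataSource pool = ...; // your actual connection pool
    return DSL.using(pool, SQLDialect.SQLSERVER).parsingDataSource();
}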

Conclusion

Even if you’re not using jOOQ’s DSL API, you can profit from jOOQ in many ways in your existing JDBC, R2DBC, JPA, MyBatis, etc. based applications by hooking in the jOOQ parsing connection and translating your vendor specific input dialect to any number of configurable output dialects.

If jOOQ’s parser can’t handle a feature, chances are, you can work around this limitation using the ParseListener SPI, e.g. when you want to support a hypothetical LOGICAL_XOR predicate (which is supported natively in MySQL):

Query query = configuration
    .derive(ParseListener.onParseCondition(ctx -> {
        if (ctx.parseFunctionNameIf("LOGICAL_XOR")) {
            ctx.parse('(');
            Condition c1 = ctx.parseCondition();
            ctx.parse(',');
            Condition c2 = ctx.parseCondition();
            ctx.parse(')');

            return CustomCondition.of(c -> {
                switch (c.family()) {
                    case MARIADB:
                    case MYSQL:
                        c.visit(condition("{0} xor {1}", c1, c2));
                        break;
                    default:
                        c.visit(c1.andNot(c2).or(c2.andNot(c1)));
                        break;
                }
            });
        }

        // Let the parser take over if we don't know the token
        return null;
    }))
    .dsl()
    .parser()
    .parseQuery(
        "select * from t where logical_xor(t.a = 1, t.b = 2)"
    );
  
System.out.println(DSL.using(SQLDialect.MYSQL).render(query));
System.out.println(DSL.using(SQLDialect.ORACLE).render(query));

The above program will print:

-- MYSQL:
select * 
from t
where (t.a = 1 xor t.b = 2);

-- ORACLE:
select * 
from t 
where (t.a = 1 and not (t.b = 2)) or (t.b = 2 and not (t.a = 1));

So, profit from using jOOQ to migrate your application’s vendor specific SQL off one RDBMS onto another, or to support multiple RDBMS products in a single application, with or without using jOOQ’s DSL!

Side note: Query transformation

This isn’t the topic of this blog post, but once you have jOOQ parse your every SQL statement, you can also use jOOQ to transform this SQL and tamper with the expression tree, for example by implementing client side row level security. The possibilities are endless!

Vendor Agnostic, Dynamic Procedural Logic with jOOQ

One of the strengths of modern RDBMS is the capability to mix the powerful SQL language with procedural code.

SQL is a 4th generation programming language (4GL), and as such, extremely well suited for querying and bulk data manipulation. Its functional-declarative nature allows for it to be optimised in highly efficient ways using cost-based optimisation, but also statically as we’ve blogged before.

Sometimes, however, an imperative 3GL is better suited for a given task. That’s where stored procedures shine, or more specifically, procedural languages of RDBMS.

Among the dialects that jOOQ supports, at least these support stored procedures:

  • BigQuery
  • Db2
  • Exasol
  • Firebird
  • HANA
  • HSQLDB
  • Informix
  • MariaDB
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server
  • Vertica

Others may support them, too, but jOOQ doesn’t support those dialects yet.

Many have implemented their own procedural languages, some according to the ISO/IEC 9075-4 Persistent stored modules (SQL/PSM) standard, others have their own.

jOOQ support for procedural logic

Since jOOQ 3.12, our commercial distributions have supported anonymous blocks and the procedural statements they contain, such as the IF statement, LOOP statements, etc. Starting with jOOQ 3.15, we also support 3 types of statements to manage storing procedural logic in the catalog:

  • The CREATE FUNCTION statement
  • The CREATE PROCEDURE statement
  • The CREATE TRIGGER statement

Using these statements via jOOQ may not be your every day use-case. You may prefer managing that logic via the native syntax, which is still more powerful than what jOOQ 3.15 supports (especially when you’re using Oracle’s PL/SQL). In that case, you’ll use jOOQ purely to call your procedure from Java in the usual type safe manner.

But maybe, you have one of these use-cases?

  • You’re a product vendor, and you profit from procedural logic being vendor agnostic in order to support multiple of your clients’ RDBMS
  • Your procedural logic is dynamic, just like your SQL logic (and what other than jOOQ to use for that?)
  • You don’t have the necessary privileges to create procedures, functions, or triggers in your schema

In all of those cases, jOOQ is here to help you.

How does it work?

The first building block is the anonymous block, which isn’t supported by all of the above dialects, regrettably. jOOQ can emulate it on MySQL as discussed here, but not currently in other dialects.

Here’s a simple, empty anonymous block:

-- Db2
BEGIN
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
END

-- MariaDB
BEGIN NOT ATOMIC
END;

-- Oracle
BEGIN
  NULL;
END;

-- PostgreSQL
DO $$
BEGIN
  NULL;
END;
$$

It doesn’t really do much, but you can try executing it as follows, with jOOQ:

ctx.begin().execute();

Now, let’s do something more interesting, such as:

// Assuming the usual static imports:
import static org.jooq.impl.DSL.*;
import static org.jooq.impl.SQLDataType.*;

// Then write
Variable<Integer> i = variable(unquotedName("i"), INTEGER);
Table<?> t = table(unquotedName("t"));
Field<Integer> c = field(unquotedName("c"), INTEGER);

ctx.begin(
    declare(i).set(1),

    while_(i.le(10)).loop(
        insertInto(t).columns(c).values(i),
        i.set(i.plus(1))
    )
).execute();

The above block executes:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- FIREBIRD
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
DECLARE
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
DO $$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

Easy as pie. Perhaps you prefer a FOR loop, instead? Try this:

ctx.begin(
    for_(i).in(1, 10).loop(
        insertInto(t).columns(c).values(i)
    )
).execute();

It produces the necessary emulations, if required, because regrettably, not all dialects support FOR:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- Firebird
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  FOR i IN 1 .. 10 DO
    INSERT INTO t (c)
    VALUES (i);
  END FOR;
END;

-- Oracle
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;

-- PostgreSQL
DO $$
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  BEGIN
    SET @i = 1;
    WHILE @i <= 10 BEGIN
      INSERT INTO t (c)
      VALUES (@i);
      SET @i = (@i + 1);
    END;
  END;
END;

SQL vs procedures

Of course, this particular SQL statement would be better implemented using a single bulk insertion statement, purely with SQL, not with procedural logic:

ctx.insertInto(t, c)
   .select(selectFrom(generateSeries(1, 10)))
   .execute();

Which translates to:

-- Db2
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH
    generate_series(generate_series) AS (
      SELECT 1
      FROM SYSIBM.DUAL
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- Firebird
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      FROM RDB$DATABASE
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- MariaDB
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) AS generate_series;

-- Oracle
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT (level + (1 - 1)) generate_series
  FROM DUAL
  CONNECT BY level <= ((10 + 1) - 1)
) generate_series;

-- PostgreSQL
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM generate_series(1, 10);

-- SQL Server
WITH
  generate_series(generate_series) AS (
    SELECT 1
    UNION ALL
    SELECT (generate_series + 1)
    FROM generate_series
    WHERE generate_series < 10
  )
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT generate_series
  FROM generate_series
) generate_series

… but you get the point.

Storing the procedural logic

If you have the necessary privileges, and your procedural logic isn’t super dynamic, you may choose to store your logic in a procedure or function directly in your database. In some databases, this means a compiler will be able to eagerly translate the logic to something very efficient (e.g. machine code), instead of interpreting the logic on the fly.

Take the above WHILE loop, for example. You may want to store that as a procedure P:

Name p = unquotedName("p");

ctx.createProcedure(p)
   .modifiesSQLData()
   .as(
        declare(i).set(1),

        while_(i.le(10)).loop(
            insertInto(t).columns(c).values(i),
            i.set(i.plus(1))
        )
   )
   .execute();

This produces the following statements:

-- Db2
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Firebird
CREATE PROCEDURE p()
AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
CREATE PROCEDURE p
AS
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
CREATE PROCEDURE p()
LANGUAGE plpgsql
AS
$$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
CREATE PROCEDURE p
AS
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

And now, what better way to call this procedure than, again, an anonymous block?

ctx.begin(call(unquotedName("p"))).execute();

Producing:

-- Db2
BEGIN
  CALL p();
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
  EXECUTE PROCEDURE p;
END

-- MariaDB
BEGIN NOT ATOMIC
  CALL p();
END;

-- Oracle
BEGIN
  p();
END;

-- PostgreSQL
DO $$
BEGIN
  CALL p();
END;
$$

-- SQL Server
BEGIN
  EXEC p ;
END;

If you’re using jOOQ in Flyway or Liquibase to generate procedures during your database migrations, you can obviously generate jOOQ procedure stubs to call in a more type safe manner, instead of the above dynamic procedure call.
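E.g. (a sketch, assuming the code generator has produced the usual Routines class for our procedure):

// Call the generated, type safe stub instead of the dynamic CALL:
Routines.p(configuration);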

Parsing procedural logic

This jOOQ feature is not really exceptional. You can play around with our parser / translator here: https://www.jooq.org/translate. It can definitely help you translate your (simpler) stored procedures between dialects, such as PL/SQL, T-SQL, PL/pgSQL, etc.

Conclusion

As a rule of thumb, if you can do it with SQL (the 4GL), do it with SQL alone. But sometimes, you can’t. A 3GL is a better choice for an algorithm. When using jOOQ, you’ll naturally think of using Java to implement that 3GL algorithm. But wait, you could move the logic to the server for (drastically) increased performance!

Thanks to jOOQ, you can generate procedural logic that is:

  • Dynamic
  • Vendor agnostic
  • Anonymous or stored

Just like you’re used to from jOOQ, for SQL.

MySQL’s allowMultiQueries flag with JDBC and jOOQ

MySQL’s JDBC connector has a security feature called allowMultiQueries, which defaults to false. When turned off, it prevents using a useful, but potentially dangerous feature in MySQL via JDBC:

try (Statement s = connection.createStatement()) {
    try {
        s.execute("create table t (i int);");

        // This doesn't work, by default:
        s.executeUpdate("""
            insert into t values (1);
            insert into t values (2);
        """);
    }
    finally {
        s.execute("drop table t");
    }
}

By default, the above produces a syntax error:

Exception in thread "main" java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'insert into t values (2)' at line 2
	at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120)
	at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
	at com.mysql.cj.jdbc.StatementImpl.executeUpdateInternal(StatementImpl.java:1333)
	at com.mysql.cj.jdbc.StatementImpl.executeLargeUpdate(StatementImpl.java:2106)
	at com.mysql.cj.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1243)
	at org.jooq.testscripts.JDBC.main(JDBC.java:34)

We can’t chain statements like that, unless we turn on allowMultiQueries=true in the JDBC connection URL:

jdbc:mysql://localhost/test?allowMultiQueries=true

And now, suddenly, the statement batch completes normally, and two records are inserted into the table.

Why this feature?

The security feature helps prevent some SQL injection vulnerabilities. It is now much harder to append additional statements in case you have the bad idea of concatenating values to your string, such as:

// Terrible idea:
s.executeUpdate("insert into t values (" + value + ")");

Because, what if value contained the string "1); drop table t;"? It would be syntactically correct, so it would execute “as expected.” That wouldn’t be very nice.

Don’t get a false sense of security now, though. Turning off this capability will not prevent all SQL injection vulnerabilities; it just makes this particular one harder to exploit. There are still various ways this particular lack of using a bind variable can lead to an attacker reading your data, e.g. through a time-based attack.

The risk of SQL injection needs to be taken seriously. The best thing is to always write static SQL with bind variables (e.g. using PreparedStatement, stored procedures, or jOOQ), or to use a SQL builder like jOOQ for dynamic SQL.
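For example, with plain JDBC, the terrible idea from before should be rewritten with a bind variable (a sketch, assuming value is an int):

// The value is now passed as a bind variable, so it can no longer
// alter the SQL syntax:
try (PreparedStatement p = connection.prepareStatement(
    "insert into t values (?)"
)) {
    p.setInt(1, value);
    p.executeUpdate();
}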

Using allowMultiQueries in jOOQ

When using jOOQ, the above situation is very unlikely to happen. The default usage of jOOQ is using:

  • The code generator to generate database meta data
  • The DSL to generate SQL

Only in rare cases would you use plain SQL templating to work around a specific lack of functionality in jOOQ, and in that case, the templating language will help you avoid concatenating strings and running into SQL injection vulnerabilities.

If you’re of the careful sort, you can add an annotation processor to your build that prevents using the plain SQL API in jOOQ (any usage won’t compile by default, unless you explicitly opt in).

So, the MySQL flag isn’t really useful for your jOOQ usage. In fact, it’s even a problem, because jOOQ internally relies on generating statement batches like the above. Here are some features that don’t work correctly when you keep allowMultiQueries=false (most of these also apply to MariaDB, btw):

GROUP_CONCAT

Whenever you use GROUP_CONCAT in jOOQ on MySQL, jOOQ assumes you haven’t already changed MySQL’s default value for @@group_concat_max_len. The default value is extremely low, namely 1024. Not only does that value prevent the string aggregation of larger data sets, the aggregation also just truncates silently beyond that length, which produces wrong values!

When emulating JSON_ARRAYAGG() in MySQL using GROUP_CONCAT, there’s usually a detectable syntax error in the resulting JSON array, but this isn’t the case when you just want to produce some string values, e.g. a comma separated list. (See this previous blog why we don’t use the native JSON_ARRAYAGG() support yet).

So, every time you use GROUP_CONCAT explicitly (or jOOQ uses it internally for some emulation), jOOQ will prepend and append the following statements:

-- These are prepended
SET @t = @@group_concat_max_len;
SET @@group_concat_max_len = 4294967295;

-- Actual statement here:
SELECT group_concat(A SEPARATOR ',') FROM T;

-- These are appended
SET @@group_concat_max_len = @t;

If you already fixed the system or session variable yourself, you can turn off this feature by changing the Settings.renderGroupConcatMaxLenSessionVariable flag.
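E.g. (a sketch; the wither follows the usual JAXB naming convention for the flag mentioned above):

Settings settings = new Settings()
    .withRenderGroupConcatMaxLenSessionVariable(false);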

CREATE OR REPLACE FUNCTION

Many SQL dialects have a CREATE OR REPLACE syntax for stored procedures, functions, triggers, and other stored objects that do not contain data. It’s very useful syntax sugar for writing this, instead:

-- Much simpler
CREATE OR REPLACE FUNCTION f ...

-- Than this
DROP FUNCTION IF EXISTS f;
CREATE FUNCTION f ...

But again, if you keep allowMultiQueries=false, then this emulation in jOOQ won’t work and you get a syntax error again. There’s nothing jOOQ can do for you here. You’d have to manually run the two statements, instead of using the convenience syntax.

FOR UPDATE WAIT n

Many dialects have a FOR UPDATE WAIT n syntax that allows for specifying a WAIT timeout for pessimistic locks, e.g.

SELECT *
FROM t
FOR UPDATE WAIT n;

MySQL 8.0.26 doesn’t support this feature yet, but since jOOQ 3.15 and #11543, we’re emulating the above syntax using this:

SET @t = @@innodb_lock_wait_timeout;
SET @@innodb_lock_wait_timeout = 2;
SELECT *
FROM t
FOR UPDATE;
SET @@innodb_lock_wait_timeout = @t;

This is another thing that wouldn’t work if you had allowMultiQueries=false.

Anonymous blocks

Many procedural languages support anonymous blocks of procedural code, i.e. procedural code that is not stored in a procedure. It makes perfect sense. After all, we don’t have to store all of our SQL in views either, so why do we have to store our PL/SQL, T-SQL, PL/pgSQL, etc. in a stored procedure? This can be very useful especially when you want to generate these blocks dynamically, using jOOQ to run some logic on the server rather than the client, reducing round trips.

In Oracle, you can write:

BEGIN
  INSERT INTO t VALUES (1);

  IF TRUE THEN
    INSERT INTO t VALUES (2);
  END IF;
END;

jOOQ has supported such anonymous blocks since jOOQ 3.12. Look at the manual page about the IF statement. You can write:

// Assuming the usual static imports:
import static org.jooq.impl.DSL.*;
import static org.jooq.impl.SQLDataType.*;

// Then write:
Variable<Integer> i = var("i", INTEGER);

ctx.begin(
  declare(i).set(1),

  if_(i.eq(0)).then(
    insertInto(A).columns(A.COL).values(1)
  ).elsif(i.eq(1)).then(
    insertInto(B).columns(B.COL).values(2)
  ).else_(
    insertInto(C).columns(C.COL).values(3)
  )
).execute();

This translates to and executes the correct procedural anonymous blocks in those dialects that support them, but MySQL 8.0.26 unfortunately doesn’t, yet, so what do we do? We generate an “anonymous” procedure, call it, and drop it again:

CREATE PROCEDURE block_1629705082441_2328258() 
BEGIN
DECLARE i INT; 
  SET i = 1; 

  IF i = 0 THEN
    INSERT INTO a (col) VALUES (1);
  ELSEIF i = 1 THEN
    INSERT INTO b (col) VALUES (2);
  ELSE
    INSERT INTO c (col) VALUES (3);
  END IF; 
END; 
CALL block_1629705082441_2328258(); 
DROP PROCEDURE block_1629705082441_2328258;

I mean, why not? But again, this relies on allowMultiQueries=true; otherwise, the JDBC driver will reject this statement.

For more info about the procedural language API in jOOQ, please refer to: https://blog.jooq.org/vendor-agnostic-dynamic-procedural-logic-with-jooq/

Conclusion

MySQL’s JDBC driver has a nice security feature that is intended to prevent some cases of SQL injection, especially when users use the JDBC connection for manual SQL execution. There’s always that one poor soul on the team that doesn’t know about SQL injection yet, and thus gets it wrong, opening Pandora’s box. For those usages, allowMultiQueries=false is a reasonable default.

When using jOOQ as jOOQ was intended to be used, SQL injection is much less likely. This excludes plain SQL templating usages, to which this paragraph doesn’t apply, though. On the other hand, jOOQ internally relies on allowMultiQueries=true to enable a few emulations that require multiple statements to be executed in a single round trip.

A future jOOQ version will allow for configuring the execution model for multi queries, such that the above can be executed as multiple round trips. See #9645 for more details.

Until then, if you want to get the most out of jOOQ with MySQL, be sure to turn on allowMultiQueries=true on your jOOQ connection, possibly keeping it turned off elsewhere.

10 Things You Didn’t Know About jOOQ

jOOQ has been around for a while – since around 2009 as a publicly available library, and since 2013 as a commercially licensed product.

A lot of things have happened in 12 years. Here are 10 things that you maybe didn’t know about jOOQ.

1. eq, ne, gt, ge, lt, le are inspired by XSLT

What’s harder than naming a local variable?

Naming public API! The original jOOQ had methods like these to construct predicates:
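
For illustration, the long forms, which still exist in the API today (BOOK.TITLE being a hypothetical generated column):

BOOK.TITLE.equal("SQL")
BOOK.TITLE.notEqual("SQL")
BOOK.TITLE.lessThan("SQL")
BOOK.TITLE.lessOrEqual("SQL")
BOOK.TITLE.greaterThan("SQL")
BOOK.TITLE.greaterOrEqual("SQL")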

But words such as greaterOrEqual() and others are kinda “heavy” in the middle of what is supposed to be a very readable, SQL-style DSL, so a short version needed to be added. But what to name the short version? There are different opinions.

Since I love XSLT almost as much as I love SQL, it was natural to choose eq, ne, lt, le, gt, ge for these operators, just like in XSLT. Well, XPath, actually, to be precise. These are also available as HTML entities.

Fun fact: starting with a recent jOOQ version, we also support parsing these abbreviations in the SQL parser, to support the alternative Teradata syntax, see https://github.com/jOOQ/jOOQ/issues/11844 (yes, the almighty Teradata!)

-- Valid in Teradata, I mean, why not?
SELECT *
FROM t
WHERE id EQ 1

Other comparable libraries use eq, ne, lt, loe, gt, goe. But I could never sleep at night knowing the inconsistent length of names of these very similar operators.

2. API naming regrets

Some names, I wish I hadn’t chosen. The three most prominent ones are:

  • Field (it’s really more of a column or column expression. Field sounds so MS Excel-ish)
  • Condition (The SQL standard calls it predicate)
  • Record (the name is sound, but it now conflicts with java.lang.Record, which is a pain. If only I had called it Row)

Alas, there’s no way these terms are ever going to be changed without breaking every jOOQ application out there for no good reason. Such is the life of an API developer.

3. Overloads in jOOQ’s API are really untagged union types

Oh, if only Java were more like TypeScript. They have these beautiful first-class untagged union types, which we poor Java folks only know from exception catch blocks, where they don’t exist as a first-class language feature, but just as syntactic sugar:

type F<T> = String | Name | Field<T> | Select<? extends Record1<T>>

If we “just” had those in Java (along with the above type aliases), then all the troubles of maintaining a vast API like jOOQ would be gone, namely the ever-present sets of overloaded methods.

Just look at the overloads for DSL::substring. Just look:
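
A representative subset of those overloads (the actual list in org.jooq.impl.DSL is longer):

static Field<String> substring(Field<String> field, int startingPosition) { ... }
static Field<String> substring(Field<String> field, Field<? extends Number> startingPosition) { ... }
static Field<String> substring(Field<String> field, int startingPosition, int length) { ... }
static Field<String> substring(Field<String> field, Field<? extends Number> startingPosition, Field<? extends Number> length) { ... }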

And that isn’t covering all possible permutations, by far. A pragmatic decision was made that it is not too likely for the first argument to be a string bind variable. We hardly ever support Name unless the argument is really about a column reference, not an arbitrary column expression, and the scalar subquery case (Select<? extends Record1<T>>), well, that’s just convenience.

If users still need the bind variable, they can wrap it with DSL.val("value"). If Java did have untagged union types, however, the API would look more like this:

static Field<String> substring(
    F<String> string,
    F<? extends Number> startingPosition,
    F<? extends Number> length = null
) { ... }

Where F is the above union type. This would add a lot of convenience to a DSL like jOOQ because now, every permutation of argument types is supported. Alas, here we are, hand rolling the union type emulation through heavily overloaded methods, hoping IDE auto-completion doesn’t fail or become too slow (hello IntelliJ / Kotlin) πŸ˜…

4. Some SQL things are incredibly hard to abstract

Even after all those 12 years, there are still hard-to-solve bugs when trying to translate some SQL feature combinations to all 30 currently supported RDBMS.

I mean, try this on the almighty PostgreSQL, which isn’t even impressed by my lame attempts at creating not-everyday-SQL:

with t (a) as (
  with u (b) as (
    values (1)
  )
  select c
  from (
    with v (c) as (select b from u)
    select c from v
    union (
      select c from v
      union all
      select c from v
      order by c
    )
    order by c
  ) v
)
select a from t

It’s just an obfuscated way to write

values (1)

A few things here:

  • We’re nesting CTE in CTE declarations
  • We’re nesting CTE in derived tables
  • We’re nesting unions in derived tables
  • We’re nesting unions in union subqueries
  • We’re ordering derived tables
  • We’re ordering union subqueries

I mean, why not? But what about other dialects? Try converting this to alternative dialects on https://www.jooq.org/translate/, if you dare. Each one of those bullets (or several at a time) doesn’t work on some dialect. And not all of our translations work yet, for so many reasons.

These aren’t the most important bugs. They’re usually edge cases (e.g. the ORDER BY clauses are meaningless), but even then, you want as much SQL as possible to work on all of your dialects, so we’re being kept busy, that’s for sure.

5. The pronunciation

Now it’s official (it always has been)

It’s pronounced dΚ’uːk (as in juke)

jOOQ is a recursive acronym that stands for jOOQ Object Oriented Querying. The “Object Oriented” stands for the API design, not for how you’re supposed to use it. You’re supposed to use it in a functional programming style, duh. The jOOQ expression tree follows a composite pattern, if you will, and the SQL generation is implemented using a visitor-pattern-style approach, everything is encapsulated, etc. Just like you shouldn’t be tempted to say ess queue ell, you shouldn’t be tempted to say jay o o queue. It’s just dΚ’uːk for sequel. Hah!

6. RDBMS Bugs

jOOQ has helped discover a ton of RDBMS bugs, maybe even more than the awesome https://github.com/sqlancer/sqlancer. When jOOQ integrates with a new dialect (e.g. EXASOL, recently), we discover a ton of bugs. See a list here: https://github.com/jOOQ/jOOQ/issues/1985, or for the recently supported Apache Ignite: https://github.com/jOOQ/jOOQ/issues/10551.

That’s because our integration tests are vast and cover all sorts of weird syntax combinations that hardly anyone ever worries about, such as the previous item 4. I always document each bug I find, either on the RDBMS issue tracker, if it’s public, or on Stack Overflow.

So, if you’re an RDBMS vendor and want us to test your SQL implementation, let us know! We’re for hire.

7. Mutability was a mistake

One of the biggest API design mistakes in jOOQ was the mutability of the DSL, which can now hardly be removed. Changing behaviour incompatibly is even harder than changing API incompatibly: when API changes incompatibly, there are compilation errors. They’re a pain, but at least there aren’t any surprises.

Changing behaviour is a big no-go for a library. Here’s what I’m talking about:

SelectWhereStep<?> s =
ctx.select(T.A, T.B)
   .from(T);

// Dynamic SQL how you shouldn't do it:
if (something)
    s.where(T.C.eq(1));

if (somethingElse)
    s.where(T.D.eq(2));

Result<?> result = s.fetch();

Yes, the DSL API is mutable, which is why the above works. It shouldn’t work, and you shouldn’t use this, but here we are. Our own little sun.misc.Unsafe disaster. It’s too late. Everyone is using it already.

Not all DSL elements are mutable, for example, expressions are not:

Condition c = noCondition();

// Has no effect
if (something)
    c.and(T.C.eq(1));

if (somethingElse)
    c.and(T.D.eq(2));

The above doesn’t work. You’ll notice soon enough, and fix it accordingly:

Condition c = noCondition();

if (something)
    c = c.and(T.C.eq(1));

if (somethingElse)
    c = c.and(T.D.eq(2));

So, there’s still mutation, but only of your local variable, not any jOOQ objects. That’s how the entire DSL ought to work. Or even better, you could use a functional programming style to implement dynamic SQL.
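
For instance (a minimal sketch, using the same hypothetical T table as above; java.util imports and the usual jOOQ static imports implied), collect the optional conditions into a list and pass them to where(), which combines them with AND:

List<Condition> conditions = new ArrayList<>();

if (something)
    conditions.add(T.C.eq(1));

if (somethingElse)
    conditions.add(T.D.eq(2));

// Uses the Collection<? extends Condition> overload of where()
Result<?> result =
ctx.select(T.A, T.B)
   .from(T)
   .where(conditions)
   .fetch();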

In any case, it will be very hard to change this behaviour without breaking everything, in very subtle ways, because you can’t easily detect mutable API usage. In jOOQ 3.15, we’ve started annotating the DSL API with a @CheckReturnValue annotation, which is getting picked up by some tools and IDEs, see e.g. https://youtrack.jetbrains.com/issue/IDEA-265263.

Luckily, this annotation will also cause a warning when you use the DSL API in a mutable fashion, because you’re supposed to consume the return value of that where(T.C.eq(1)) call. Perhaps there is a way to change this, after all, though chances are slim. Probably not worth the damage caused.

Yeah, such is the fate of every “lightweight library”, once it reaches maturity. It’s almost impossible to change its fundamental flaws anymore.

8. Source and behavioural compatibility is very important in jOOQ

Behavioural incompatibilities

Behavioural incompatibilities are an absolute no-go in almost all libraries / products.

Source incompatibilities

Source incompatibilities are sometimes inevitable, at least in major releases, if there is a very compelling reason. Since we haven’t released a major release in almost a decade, we treat our minor releases as major.

An example of such an incompatibility in jOOQ was when we removed the convenience overloads that accepted Condition|Field<Boolean>|Boolean. The three things are the same in jOOQ:

  • Condition is a SQL predicate (see item 2)
  • Field<Boolean> is a Condition wrapped as Field<Boolean>, which is quite nice in dialects with native support for the BOOLEAN type
  • Boolean is just a boolean bind variable, wrapped in DSL.val(boolean)

So, what’s wrong? The problem was the Boolean overload. It was intended for these kinds of usages:

// Turn off the entire query or subquery, dynamically
boolean featureFlag = ...;
.where(featureFlag)

So, rarely useful, only in edge cases. Again, what’s the problem? The problem was that users accidentally wrote this. All. The. Time.

.where(USER.NAME.equals(userName))

Can you spot the bug? It would not have happened if they had written USER.NAME.eq(userName), i.e. if they had used the XSLT style abbreviations.

Yes, they wrote equals() (as in Object::equals), not equal() (as in Field::equal). Simple typo. Still compiled. Chose the wrong overload. And looked almost correct. These would have been correct:

.where(USER.NAME.equal(userName))
.where(USER.NAME.eq(userName))

So, we deprecated, and then removed the overload, such that there is now a compilation error when using Object::equals. Seems a reasonable case for breaking source code, because that API’s usage was almost exclusively accidental.

How to test these things?

In order to make sure we don’t break behavioural or source compatibility, we have a ton (and I mean a ton) of tests. Behavioural compatibility is checked via unit and integration tests, making sure complex, and weird usages of jOOQ API continue to produce the same results on all of our supported RDBMS.

Source compatibility is checked via a ton of hand-written jOOQ API usage, which may be a bit more verbose than your average usage. For example, we don’t use var much in our own tests, though we heavily recommend you use it in your client code!

var result = ctx.select(T.A, multiset(..).as("fancy stuff")).fetch();

Instead, even in the most fancy statements using the new MULTISET operator, which can be quite heavy on the structural typing abuse of jOOQ, we always assign everything to explicitly typed variables that may look as threatening as this:

Result<Record4<
    String,                   // FILM.TITLE
    Result<Record2<
        String,               // ACTOR.FIRST_NAME
        String                // ACTOR.LAST_NAME
    >>,                       // "actors"
    Result<Record1<String>>,  // CATEGORY.NAME
    Result<Record4<
        String,               // CUSTOMER.FIRST_NAME
        String,               // CUSTOMER.LAST_NAME
        Result<Record2<
            LocalDateTime,    // PAYMENT.PAYMENT_DATE
            BigDecimal        // PAYMENT.AMOUNT
        >>, 
        BigDecimal            // "total"
    >>                        // "customers"
>> result = 
dsl.select(...)

Even if that type is not actually consumed or used in any way, e.g. when calling result.formatXML() on it. More tests = better. Any source incompatibility would immediately cause our tests to stop compiling, so we can be confident not to run into any surprises.

There are always weird things, nonetheless. It’s hard to achieve perfection. An example is this issue where rawtype compatibility was overlooked. I wonder if anyone actually makes sure their generics can be used safely and compatibly with raw types? Seems very hard in jOOQ’s case…

9. Binary compatibility is almost impossible to provide in jOOQ

But binary compatibility? In a DSL like jOOQ’s, it’s probably completely impossible to maintain. An example is when we introduced support for the Teradata QUALIFY clause in jOOQ 3.12. Here’s a blog post explaining how our DSL works. Before supporting QUALIFY, the jOOQ WINDOW clause, supported via SelectWindowStep, had several window() method overloads like this:

// "extends SelectOrderByStep", because is window() methods are
// obviously optional. It's a hardly used feature
interface SelectWindowStep<R extends Record> 
extends SelectOrderByStep<R> {
    SelectOrderByStep<R> window(WindowDefinition... w);
}

Example usage:

select(T.A, count().over(w))
.from(T)
.window(w)
.orderBy(T.A)
.fetch();

Now, the QUALIFY clause comes after WINDOW (Teradata doesn’t support WINDOW, but if they did, they’d have to prepend it to QUALIFY, because WINDOW declares named window definitions, and QUALIFY consumes them, though you never know with SQL)

// "extends SelectQualifyStep" is a compatible change
interface SelectWindowStep<R extends Record> 
extends SelectQualifyStep<R> {

    // But these changes are not
    SelectQualifyStep<R> window(WindowDefinition... w);
}

While the JVM allows overloading by return type (and that feature is used to implement generics and covariant overloads since Java 5), the Java language does not allow this. There’s no way a new version of this API can maintain the old byte code, at least not in Java. Kotlin supports such things, i.e. the ability to emit (synthetic?) methods in byte code that cannot be called from source code directly, purely for backwards compatibility reasons.

So, if you’re upgrading from jOOQ 3.11 to 3.12, and you were using the WINDOW clause, tough luck. You have to recompile your client code, otherwise your friend NoSuchMethodError will have a word at runtime only (because at compile time, everything still compiles).

I guess that binary compatibility is not such a huge problem anymore in most cases. We run CI/CD jobs all the time, and recompile everything all the time for various reasons. It’s unlikely that you keep your own internal libraries binary compatible for other teams to consume. Nevertheless, it would be nice if it weren’t a problem. Ultimately, pragmatism dictates that we can’t offer this guarantee in jOOQ (the way the JDK does). We try really hard not to break binary compatibility at least in patch releases, but minor releases do not have any such guarantee.

10. The Logo

The current logo was designed when jOOQ went commercial.

jOOQ is now jOOQβ„’

Fun fact: I designed both of these logos myself. My wife never liked the old one. To her, it looked like some bloody samurai, with all the bloody eyes πŸ˜…

The new one went through several iterations, all weird and colourful, until a friend who works in design told me a few very simple tricks:

  • Black and white can be applied to all media (screens, print, etc.) very easily, compared to coloured logos.
  • It can even be inverted to white and black.
  • Squares are easier to manage in icons, headers, thumbnails, etc. than rectangles.

I took this a step further and made the entire logo quadratic, including the font, such that it doesn’t even require an SVG format to scale. It’s essentially a 20×20 bitmap. Can’t go more low budget yet effective than this! 😁

Formatting ASCII Charts With jOOQ

A little-known feature in jOOQ is the Formattable.formatChart() capability, which allows for formatting any jOOQ result as an ASCII chart. This can be useful for quickly plotting results in your console application.

Assuming you have a result set of this form (which is what you get when you call result.format() or just result.toString()):

+---+---+---+---+
|c  | v1| v2| v3|
+---+---+---+---+
|a  |  1|  2|  3|
|b  |  2|  1|  1|
|c  |  4|  0|  1|
+---+---+---+---+

This result can be produced by any query, or you can construct it without executing a query like this:

Field<String>  c  = field("c", VARCHAR);
Field<Integer> v1 = field("v1", INTEGER);
Field<Integer> v2 = field("v2", INTEGER);
Field<Integer> v3 = field("v3", INTEGER);

Result<Record4<String, Integer, Integer, Integer>> result = 
    create.newResult(c, v1, v2, v3); 
                      
result.add(create.newRecord(c, v1, v2, v3).values("a", 1, 2, 3));
result.add(create.newRecord(c, v1, v2, v3).values("b", 2, 1, 1));
result.add(create.newRecord(c, v1, v2, v3).values("c", 4, 0, 1));

Then, calling result.formatChart() will produce a nice ASCII chart like this, by default:

4.00|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.86|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.73|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.59|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.45|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.32|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.18|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.05|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.91|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.77|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.64|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.50|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.36|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.23|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.09|                                                  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.95|                         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.82|                         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.68|                         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.55|                         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.41|                         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.27|                         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.14|                         β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.00|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
----+---------------------------------------------------------------------------
    |            a                        b                        c            

It includes the first column as the category label on the x-axis, and the second column as the value to be plotted on the y-axis. You can tweak the configuration in all sorts of ways, including height and width:

result.formatChart(new ChartFormat().dimensions(6, 20));

To get this much smaller chart:

4.00|          β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.00|          β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.00|     β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.00|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
----+---------------
    |  a    b    c  

To include the values of the other value columns in a stacked chart (stacking being the default display type), write:

result.formatChart(new ChartFormat()
    .dimensions(40, 8)
    .values(1, 2, 3)
);

Producing:

6.00|β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’                       
5.00|β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’            β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’
4.00|β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–’β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
3.00|β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
2.00|β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–“β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
1.00|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
----+-----------------------------------
    |     a           b          c      

If those default shade characters from the sweet old 90s MS-DOS and BBS times don’t align well with your font (e.g. as on this blog), you can switch them like this:

result.formatChart(new ChartFormat()
    .dimensions(40, 8)
    .values(1, 2, 3)
    .shades('@', 'o', '.')
);

And now you’re getting:

6.00|............                       
5.00|............            ...........
4.00|........................@@@@@@@@@@@
3.00|oooooooooooooooooooooooo@@@@@@@@@@@
2.00|oooooooooooo@@@@@@@@@@@@@@@@@@@@@@@
1.00|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
----+-----------------------------------
    |     a           b          c      

Prefer displaying 100% charts? No problem:

result.formatChart(new ChartFormat()
    .dimensions(40, 8)
    .values(1, 2, 3)
    .shades('@', 'o', '.')
    .display(Display.HUNDRED_PERCENT_STACKED)
);

And now, the result is:

100.00%|................................
 80.00%|......................@@@@@@@@@@
 60.00%|...........ooooooooooo@@@@@@@@@@
 40.00%|ooooooooooo@@@@@@@@@@@@@@@@@@@@@
 20.00%|ooooooooooo@@@@@@@@@@@@@@@@@@@@@
  0.00%|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-------+--------------------------------
       |     a          b         c     

Who needs MS Excel? No one, if you have jOOQ!

Standard SQL/JSON – The Sobering Parts

It’s been almost a year now since jOOQ 3.14 was released on October 19, 2020 with SQL/JSON (and SQL/XML) support. Half a year later, we’ve released jOOQ 3.15 with MULTISET support, which builds on top of these features to offer type-safe nested collections, the way every ORDBMS should implement them.

Building (dogfooding) on top of our own SQL/JSON API has revealed a lot of caveats of the various SQL/JSON implementations across vendors, and to be frank, it’s been a bit of a sobering experience. Despite there now being the ISO/IEC TR 19075-6 standard (mostly driven by Oracle this time), many vendors have already implemented some kind of JSON support, and it looks different in every dialect – to the extent that writing vendor-agnostic SQL/JSON is almost impossible with hand-written native SQL. You’ll need an API like jOOQ or any other abstraction to standardise the different dialects.

In this article, I’d like to share a few of the biggest caveats encountered over the past year. There are many more; just try translating some standard SQL/JSON to various dialects with jOOQ.

JSON Type or String?

JSON documents can be seen as simple strings with formatted content. Sure, I mean you can parse a string into a JSON document, so why complicate the SQL type system with new types?

Unfortunately, most vendors made this poor decision. PostgreSQL got this right by offering the JSON and JSONB types, the latter being quite performant even when storing and indexing JSON documents.

First off, and this comes as no surprise to anyone working with almost any language other than SQL: types are semantic. Stringly typed environments are bad. Or as the quote goes:

The Bitterness of Poor Quality Remains Long After the Sweetness of Low Price is Forgotten — Benjamin Franklin

With “Low Price” being the low price of the quick and dirty stringly-typed feature addition without a formal type. We’ve been through this time and again with BOOLEAN types, with dialects like MySQL pretending that supporting this is a good idea:

SELECT *
FROM t
WHERE 1 OR 0 -- TRUE OR FALSE

Let’s look at examples of what I mean.

MariaDB and MySQL

Let’s look at MariaDB’s and MySQL’s syntax first. Here’s how to create a JSON array:

select json_array(1, 2)

Producing

[1, 2]

That’s great! And it even conforms to standard SQL syntax. A JSON array can also be nested easily:

select json_array(json_array(1, 2), 3)

To produce:

[[1, 2], 3]

Now, what if that nested array originates from a derived table?

select json_array(nested, 3)
from (select json_array(1, 2) nested) as t

It results in:

-- MySQL, MariaDB 10.6
[[1, 2], 3]

-- MariaDB 10.5
["[1, 2]", 3]

Bummer. In MariaDB 10.5, the nested array lost its “JSON type annotation” and went back to the stringly typed version of itself. As a string, it needs to be quoted. This seems to be a bug (https://jira.mariadb.org/browse/MDEV-26134), which has apparently already been fixed under a different issue in MariaDB 10.6. It’s not the only one, though. The bug tracker is full of similar issues: https://jira.mariadb.org/browse/MDEV-13701.

MySQL seems to fare a bit better by now, though there are caveats when aggregating (see further down).

The only workaround I’ve found so far for the above bugs is extremely laborious:

select json_array(json_merge_preserve('[]', nested), 3)
from (select json_array(1, 2) nested) as t

Imagine doing that every time you nest JSON. It works, but it really needs to be automated (e.g. through jOOQ).
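
With jOOQ, for example, you’d write the nesting once, and the dialect-specific workaround would be rendered for you (a sketch using jOOQ’s standard JSON API, with the usual static imports implied):

// Nest JSON arrays once; jOOQ emits e.g. the JSON_MERGE_PRESERVE
// workaround shown above where the dialect requires it
Field<JSON> nested = jsonArray(val(1), val(2));
ctx.select(jsonArray(nested, val(3))).fetch();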

Oracle

Oracle defined most of the SQL/JSON standard, and I really like how SQL-idiomatic its syntax feels, just like SQL/XML. Unfortunately, they waited with the introduction of the new JSON type until Oracle 21c (can’t wait to play with it). As such, we have to choose whether to represent JSON as VARCHAR2, by default (limited to 4000 bytes on some systems, or 32kb at most!), or as CLOB. If you’re doing serious JSON, you can probably always add the RETURNING CLOB clause to every single JSON function call of yours. Effectively:

select json_array(1, 2 returning clob)
from dual

The nested version looks like this:

select json_array(nested, 3 returning clob)
from (select json_array(1, 2 returning clob) nested from dual) t;

This works much better in Oracle, but there are still many edge cases that it doesn’t get right. Try this on Oracle 18c:

select json_arrayagg(coalesce(json_object(), json_object()))
from dual

And it produces another instance of erroneously stringly-typed JSON:

["{}"]

The workaround is to add FORMAT JSON just about everywhere, just to be sure, e.g.

select json_arrayagg(
  coalesce(json_object(), json_object()) format json
)
from dual

Now, the result is as expected:

[{}]

To play it safe, you’ll probably have to write FORMAT JSON just about everywhere, just like RETURNING CLOB.

Truncation

Truncation is probably the most sobering part of using JSON. Why do we still have size-limited data types in SQL? It hardly ever makes sense from an application perspective. Yet, here we are.

Oracle

Run this query in Oracle:

select json_arrayagg(owner) from all_objects;

And you’ll get:

SQL Error [40478] [99999]: ORA-40478: output value too large (maximum: 4000)

Increasing the maximum VARCHAR2 size to 32kb will only postpone the problem. There is no “reasonable” size limit to such documents, so again, you’ll have to add RETURNING CLOB all the time.

select json_arrayagg(owner returning clob) from all_objects;

The price is the same as always: CLOBs are just a bit more annoying to work with than VARCHAR2, both within Oracle as well as in the client application (e.g. JDBC based), because you have to work with yet another resource when all you wanted was a string. jOOQ will just add the clause everywhere for you; there’s hardly a reason not to do that with jOOQ, as the fetching of CLOB values is completely transparent to jOOQ users.
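
For example (a sketch, assuming generated code for the ALL_OBJECTS view and the usual static imports):

// On Oracle, jOOQ can render the RETURNING CLOB clause
// for the generated JSON_ARRAYAGG call, as described above
ctx.select(jsonArrayAgg(ALL_OBJECTS.OWNER))
   .from(ALL_OBJECTS)
   .fetch();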

MySQL

Up until recently, you couldn’t call JSON_ARRAYAGG() in MySQL, and the MariaDB version crashed the server (https://jira.mariadb.org/browse/MDEV-21912). When this blog post was written, neither implementation supported the ORDER BY clause in JSON_ARRAYAGG(), which I find quite essential, so the workaround is to use GROUP_CONCAT:

select concat('[', group_concat(id), ']')
from t_book

Of course, that is very wrong if the concatenated values are not numbers, so we need to also use JSON_QUOTE, e.g.

select concat('[', group_concat(json_quote(title)), ']')
from t_book

And if you’re embedding that stuff in other JSON structures, you have to turn what is now really a string back into JSON, using JSON_MERGE (now deprecated) or JSON_MERGE_PRESERVE, e.g.

select json_object(
  'titles', 
  json_merge_preserve(
    '[]', 
    concat('[', group_concat(json_quote(title)), ']')
  )
)
from t_book

To produce a document like this:

{"titles": ["1984", "Animal Farm", "O Alquimista", "Brida"]}

Without that JSON_MERGE_PRESERVE, you’d be getting:

{"titles": "[\"1984\",\"Animal Farm\",\"O Alquimista\",\"Brida\"]"}

Definitely not something you can ever remember.
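
This, too, is the kind of thing to automate. With jOOQ’s standard API, the above could be written once (a sketch, assuming a generated T_BOOK table), leaving the GROUP_CONCAT / JSON_MERGE_PRESERVE gymnastics to the rendered SQL:

ctx.select(jsonObject(key("titles").value(jsonArrayAgg(T_BOOK.TITLE))))
   .from(T_BOOK)
   .fetch();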

Anyway. This section was about truncation! What happens with large, aggregated JSON documents in MySQL? Try this:

select 
  concat('[', group_concat(json_quote(table_name)), ']')
from information_schema.tables

It produces (on my machine):

["innodb_table_stats","innodb_index_stats","CHARACTER_SETS","CHECK_CONSTRAINTS","COLLATIONS","COLLATION_CHARACTER_SET_APPLICABILITY","COLUMNS","COLUMNS_EXTENSIONS","COLUMN_STATISTICS","EVENTS","FILES","INNODB_DATAFILES","INNODB_FOREIGN","INNODB_FOREIGN_COLS","INNODB_FIELDS","INNODB_TABLESPACES_BRIEF","KEY_COLUMN_USAGE","KEYWORDS","PARAMETERS","PARTITIONS","REFERENTIAL_CONSTRAINTS","RESOURCE_GROUPS","ROUTINES","SCHEMATA","SCHEMATA_EXTENSIONS","ST_SPATIAL_REFERENCE_SYSTEMS","ST_UNITS_OF_MEASURE","ST_GEOMETRY_COLUMNS","STATISTICS","TABLE_CONSTRAINTS","TABLE_CONSTRAINTS_EXTENSIONS","TABLES","TABLES_EXTENSIONS","TABLESPACES_EXTENSIONS","TRIGGERS","VIEW_ROUTINE_USAGE","VIEW_TABLE_USAGE","VIEWS","COLUMN_PRIVILEGES","ENGINES","OPTIMIZER_TRACE","PLUGINS","PROCESSLIST","PROFILING","SCHEMA_PRIVILEGES","TABLESPACES","TABLE_PRIVILEGES","USER_PRIVILEGES","cond_instances","error_log","events_waits_current","events_waits_history","events_waits_history_long","events_waits_summary_by_host_by_event_name","events_waits_summary_by]

Wait, what is that at the end?

,"events_waits_summary_by]

Invalid JSON, because the GROUP_CONCAT string got truncated. We could set the following:

set @@group_concat_max_len = 4294967295;

And now the output is correct, and much longer:

["innodb_table_stats",...,"t_identity_pk","t_triggers"]

An API like jOOQ will be able to set this session variable automatically for you, but with native SQL, you probably don’t want to have to think about this all the time.

Data Type Support

JSON knows a few data types. Namely:

  • string
  • number
  • boolean
  • null
  • object
  • array

That’s less than SQL, but often good enough (anything can be encoded as a string, after all). But when you don’t have a BOOLEAN type in SQL (e.g. MySQL, Oracle), then you have to manually translate your already manual BOOLEAN encoding to JSON booleans.

MySQL

MySQL makes you believe that this works:

select json_array(true, false)

Because the above produces

[true, false]

But it doesn’t really work. That behaviour seems to be hard-coded in the parser. As soon as your true and false values are expressions instead of literals, e.g. originating from a derived table:

select json_array(t, f)
from (select true as t, false as f) as t

You’ll get:

[1, 0]

There are different ways to emulate this. One being:

select json_array(
  json_extract(case when t = 1 then 'true' when t = 0 then 'false' end, '$'),
  json_extract(case when f = 1 then 'true' when f = 0 then 'false' end, '$')
)
from (select true as t, false as f) as t

Now, we’re getting again:

[true, false]

Oracle

Unlike MySQL / MariaDB, Oracle SQL doesn’t pretend it has a BOOLEAN type. Instead, people encode it as a NUMBER(1) or CHAR(1) or some other thing. Irrespective of the encoding, this is the solution:

select json_array(
  case when t = 1 then 'true' when t = 0 then 'false' end format json,
  case when f = 1 then 'true' when f = 0 then 'false' end format json
)
from (
  select 1 as t, 0 as f, null as n
  from dual
) t

Producing

[true,false]

NULL handling

When using NULL with SQL/JSON, there are various caveats. First off, SQL NULL is not the same thing as JSON NULL. Using PostgreSQL:

select 
  a, 
  b, 
  a = b as equal, 
  a is null as a_is_null, 
  b is null as b_is_null
from (
  select null::jsonb as a, 'null'::jsonb as b
) as t

Produces:

|a  |b   |equal|a_is_null|b_is_null|
|---|----|-----|---------|---------|
|   |null|     |true     |false    |

Where the empty cells are SQL NULL values, and the null value is a JSON null value that is not “SQL NULL”. It’s the only reasonable thing to do, really. Do dialects agree?

MySQL

select 
  a, 
  b,
  a = b as equal, 
  a is null as a_is_null, 
  b is null as b_is_null
from (
  select null as a, json_extract('null', '$') as b
) as t

Producing also:

|a  |b   |equal|a_is_null|b_is_null|
|---|----|-----|---------|---------|
|   |null|     |1        |0        |

So, yes!

Oracle

Let’s see; this could be how to create a JSON NULL value:

select 
  a, 
  b, 
  case when a = b then 1 else 0 end as equal, 
  case when a is null then 1 else 0 end as a_is_null, 
  case when b is null then 1 else 0 end as b_is_null
from (
  select null as a, json_value('[null]', '$[0]') as b
  from dual
) t

Yet, Oracle is known to have a funky relationship with NULL strings. This is the result:

|A  |B  |EQUAL|A_IS_NULL|B_IS_NULL|
|---|---|-----|---------|---------|
|   |   |0    |1        |1        |

There does not seem to be an actual JSON NULL representation! I couldn’t find a workaround for this yet. Probably I will. But this is very unfortunate, leading to many conversion edge cases.

On the other hand, Oracle is the only dialect among the ones presented in this article that introduced a very useful NULL handling clause for aggregate functions. Check this out:

select
  json_arrayagg(a),
  json_arrayagg(a absent on null),
  json_arrayagg(a null on null)
from (
  select 1 as a from dual union all
  select null from dual
) t

Producing

|A  |B  |C       |
|---|---|--------|
|[1]|[1]|[1,null]|

Note the above query doesn’t produce the correct result on Oracle 18c yet, because of a parser / optimiser bug. Use this to work around the bug for this example:

select
  json_arrayagg(a) a,
  json_arrayagg(a + 0 absent on null) b,
  json_arrayagg(a - 0 null on null) c
from (
  select 1 as a from dual union all
  select null from dual
) t

Other dialects have different opinions on how to aggregate NULL values into JSON documents. In SQL, aggregate functions tend to ignore NULL values, as Oracle does above, but with JSON, it is often essential to include the value, especially when creating a JSON_OBJECT, where an absent key is not strictly the same thing as an absent value.

Db2

Db2 has a very limited implementation of SQL/JSON. It is very standards compliant in terms of syntax, but suffers from severe bugs that make it quite unusable for now. Surely, this will improve in the near future.

SQL Server

I’ve also mostly left SQL Server out of this article. SQL Server has had JSON and XML support for a while, but implements them completely differently. You can’t just form arbitrary JSON objects or arrays easily, but you can transform result sets into JSON in an intuitive way.

This offers a quick win when streaming results, but doesn’t compose very well. E.g. you can’t easily create a JSON array with scalar, non-object content, though a JSON object can be created like this:

select 1 as a, 2 as b
for json path, without_array_wrapper

Producing

{"a":1,"b":2}

A limited number of JSON features can also be emulated in SQL Server, and as with all dialects that don’t support an actual JSON type, constant escaping / unescaping may be required.

Conclusion

SQL/JSON was standardised relatively late, mostly by Oracle. The standard is very sound. But regrettably, a lot of dialects disagree on syntax and behaviour, and some are still quite buggy.

The soundest of all implementations is that of PostgreSQL, which introduced proper data types and offers a rich set of vendor-specific functions to manipulate JSON directly in SQL. In the near future, PostgreSQL will embrace standard SQL/JSON, just like Oracle, and enhance its own implementation. I don’t think there will be anything fundamentally new, just more portable standard syntax.

See this talk for a preview of “what’s next”.

There is nothing insurmountable for a library like jOOQ, which abstracts over all the dialects and offers the SQL/JSON (and SQL/XML) functionality using a uniform, standards-inspired API. As dialects improve, jOOQ will adopt new, better syntax by individual vendors, so you can implement your SQL/JSON queries forwards compatibly.

So, as always, use jOOQ to work around the subtle and boring differences of the various vendor syntaxes and start using SQL/JSON already today. Despite the caveats presented here, SQL/JSON is one of the most exciting things about modern SQL!


Ad-hoc Data Type Conversion with jOOQ 3.15

jOOQ 3.15 shipped with a ton of new features.

A very useful, lesser known new feature is “ad-hoc data type conversion”. Data type converters and bindings have been around in jOOQ for a long time. Their goal is to allow for using custom data types for common JDBC types like String or Integer. So, if you have a table like this:

CREATE TABLE furniture (
  id INT NOT NULL PRIMARY KEY,
  name TEXT NOT NULL,
  length NUMERIC,
  width NUMERIC,
  height NUMERIC
);

Instead of using BigDecimal for those dimensions, you may have preferred a custom, more semantic wrapper type for numbers, such as:

record Dimension(BigDecimal value) {}

And your Java representation of Furniture would be:

record Furniture(
  Integer id,
  String name,
  Dimension length,
  Dimension width,
  Dimension height
) {}

You’d go and attach a converter to your code generator, e.g.

<configuration>
  <generator>                            
    <database>
      <forcedTypes>
        <forcedType>
          <userType>com.example.Dimension</userType>
          <converter><![CDATA[
          org.jooq.Converter.ofNullable(
            BigDecimal.class,
            Dimension.class,
            Dimension::new,
            Dimension::value
          )
          ]]></converter>
          <includeExpression>LENGTH|WIDTH|HEIGHT</includeExpression>
        </forcedType>
      </forcedTypes>
    </database>
  </generator>
</configuration>

That would allow you to query your database like this:

Result<Record3<Dimension, Dimension, Dimension>> result =
ctx.select(FURNITURE.LENGTH, FURNITURE.WIDTH, FURNITURE.HEIGHT)
   .from(FURNITURE)
   .fetch();

But sometimes, you can’t leverage code generation:

  • You can’t access the code generator configuration for some reason
  • You don’t want to attach a converter to your columns for every query
  • You’re not using the code generator because you have a dynamic schema known only at runtime

Enter Ad-hoc Converters

Starting from jOOQ 3.15, we support various ways of registering a convenient ad-hoc converter to your Field<T> expression. This feature was mainly introduced to allow for mapping MULTISET nested collections to lists of a custom data type (a feature we urge you to try out; you won’t look back!).

But you can use the feature for any other Field expression, too. Assuming you can’t use code generation for the above query (the main reason, again, being that your schema is dynamic), you would probably write something like this:

Table<?> furniture = table(name("furniture"));
Field<BigDecimal> length = field(name("furniture", "length"), NUMERIC);
Field<BigDecimal> width  = field(name("furniture", "width"), NUMERIC);
Field<BigDecimal> height = field(name("furniture", "height"), NUMERIC);

Result<Record3<BigDecimal, BigDecimal, BigDecimal>> result =
ctx.select(length, width, height)
   .from(furniture)
   .fetch();

As always, the usual static imports are implied:

import static org.jooq.impl.DSL.*;
import static org.jooq.impl.SQLDataType.*;

But code generation is ultimately just convenience. Everything you can do with jOOQ’s code generator, you can also do without it (though I do recommend you use code generation if possible!). So, in order to reuse our Dimension data type, historically, you could do this:

DataType<Dimension> type = NUMERIC.asConvertedDataType(
    Converter.ofNullable(
        BigDecimal.class,
        Dimension.class,
        Dimension::new,
        Dimension::value
    )    
);

Table<?> furniture = table(name("furniture"));
Field<Dimension> length = field(name("furniture", "length"), type);
Field<Dimension> width  = field(name("furniture", "width"), type);
Field<Dimension> height = field(name("furniture", "height"), type);

Result<Record3<Dimension, Dimension, Dimension>> result =
ctx.select(length, width, height)
   .from(furniture)
   .fetch();

That’s already very neat. But again, you’re going to create a Field reference that always uses this converter. Maybe you wanted the conversion to apply to just this one query? No problem with ad-hoc converters! Write this:

Table<?> furniture = table(name("furniture"));
Field<BigDecimal> length = field(name("furniture", "length"), NUMERIC);
Field<BigDecimal> width  = field(name("furniture", "width"), NUMERIC);
Field<BigDecimal> height = field(name("furniture", "height"), NUMERIC);

Result<Record3<BigDecimal, BigDecimal, Dimension>> result =
ctx.select(length, width, height.convertFrom(Dimension::new))
   // ad-hoc conversion here:    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   .from(furniture)
   .fetch();

There are various overloads of Field.convert(), the most powerful ones being those that accept a complete Binding or Converter reference. The above one is very convenient, as it allows you to provide only the “from” Function<T, U> of a converter, omitting the Class<T>, the Class<U>, and the “to” Function<U, T>.

What is a Converter?

What is a Converter, after all? It is an implementation of this interface:

public interface Converter<T, U> {
    U from(T databaseObject);
    T to(U userObject);
    Class<T> fromType();
    Class<U> toType();
}

Where:

  • T is the “JDBC type”, i.e. a technical type understood by the JDBC API, such as String or BigDecimal
  • U is the “user type”, i.e. a semantic type that you choose to represent data in your client application
  • Class<T> is a class literal for T, required for reflection purposes, e.g. to create an array T[] at runtime
  • Class<U> is a class literal for U, required for reflection purposes, e.g. to create an array U[] at runtime

When attaching a Converter to the code generator, it is always good to provide all of the above: the two conversion functions converting between T and U, as well as the class literals. You never know when jOOQ needs them for some specific operation.

But in the case of ad-hoc conversion, you usually only need one of the from (read) or to (write) functions. Why repeat all of the rest? Hence, these options:

// A "read-only" field converting from BigDecimal to Dimension
height.convertFrom(Dimension::new);

// Like above, but with an explicit class literal, if needed
height.convertFrom(Dimension.class, Dimension::new);

// A "write-only" field converting from Dimension to BigDecimal
height.convertTo(Dimension::value);

// Like above, but with an explicit class literal, if needed
height.convertTo(Dimension.class, Dimension::value);

// Full read/write converter support
height.convert(Dimension.class, Dimension::new, Dimension::value);
height.convert(Converter.ofNullable(
    BigDecimal.class,
    Dimension.class, 
    Dimension::new, 
    Dimension::value
));

What’s the difference between “read-only” and “write-only” conversions? Simple. Look at these queries:

Result<Record1<Dimension>> result =
ctx.select(height.convertFrom(Dimension::new))
   .from(furniture)
   .fetch();

ctx.insertInto(furniture)
   .columns(height.convertTo(Dimension::value))
   .values(new Dimension(BigDecimal.ONE))
   .execute();

So, in summary:

  • The read-only ad-hoc converter is useful in projections (SELECT)
  • The write-only ad-hoc converter is useful in predicates (WHERE), or in DML