A Rarely Seen, but Useful SQL Feature: CORRESPONDING

I recently stumbled upon a standard SQL feature that was implemented, to my surprise, in HSQLDB. The keyword is CORRESPONDING, and it can be used with all set operations, including UNION, INTERSECT, and EXCEPT.

Let’s look at the sakila database. It has 3 tables with people in it:

CREATE TABLE actor (
    actor_id integer NOT NULL PRIMARY KEY,
    first_name varchar(45) NOT NULL,
    last_name varchar(45) NOT NULL,
    last_update timestamp
);

CREATE TABLE customer (
    customer_id integer NOT NULL PRIMARY KEY,
    store_id smallint NOT NULL,
    first_name varchar(45) NOT NULL,
    last_name varchar(45) NOT NULL,
    email varchar(50),
    address_id smallint NOT NULL,
    create_date date NOT NULL,
    last_update timestamp,
    active boolean
);

CREATE TABLE staff (
    staff_id integer NOT NULL PRIMARY KEY,
    first_name varchar(45) NOT NULL,
    last_name varchar(45) NOT NULL,
    address_id smallint NOT NULL,
    email varchar(50),
    store_id smallint NOT NULL,
    active boolean NOT NULL,
    username varchar(16) NOT NULL,
    password varchar(40),
    last_update timestamp,
    picture blob
);

Similar, but not the same. What if we wanted to get all the “people” from our database? One way to do that in any ordinary database product is:

SELECT first_name, last_name
FROM actor
UNION ALL
SELECT first_name, last_name
FROM customer
UNION ALL
SELECT first_name, last_name
FROM staff
ORDER BY first_name, last_name

The result might look like this:

|first_name|last_name|
|----------|---------|
|AARON     |SELBY    |
|ADAM      |GOOCH    |
|ADAM      |GRANT    |
|ADAM      |HOPPER   |
|ADRIAN    |CLARY    |
|AGNES     |BISHOP   |
|AL        |GARLAND  |
|ALAN      |DREYFUSS |
|...       |...      |

Using CORRESPONDING

Now, in HSQLDB, and in standard SQL, you can use CORRESPONDING for this kind of task. For example:

SELECT *
FROM actor
UNION ALL CORRESPONDING
SELECT *
FROM customer
UNION ALL CORRESPONDING
SELECT *
FROM staff
ORDER BY first_name, last_name

The result is this:

|first_name|last_name|last_update            |
|----------|---------|-----------------------|
|AARON     |SELBY    |2006-02-15 04:57:20.000|
|ADAM      |GOOCH    |2006-02-15 04:57:20.000|
|ADAM      |GRANT    |2006-02-15 04:34:33.000|
|ADAM      |HOPPER   |2006-02-15 04:34:33.000|
|ADRIAN    |CLARY    |2006-02-15 04:57:20.000|
|AGNES     |BISHOP   |2006-02-15 04:57:20.000|
|AL        |GARLAND  |2006-02-15 04:34:33.000|
|ALAN      |DREYFUSS |2006-02-15 04:34:33.000|
|...       |...      |...                    |

So, what has happened? The columns FIRST_NAME, LAST_NAME, and LAST_UPDATE are common to all three tables. In other words, if you run this query against the INFORMATION_SCHEMA in HSQLDB:

SELECT column_name
FROM information_schema.columns
WHERE table_name = 'ACTOR'
INTERSECT
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'CUSTOMER'
INTERSECT
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'STAFF'

You get exactly these 3 columns:

|COLUMN_NAME|
|-----------|
|FIRST_NAME |
|LAST_NAME  |
|LAST_UPDATE|

In other words, CORRESPONDING creates the intersection of columns among the subqueries of a set operation (i.e. the “shared columns”), projects those, and applies the set operation that projection. In a way, this is similar to a NATURAL JOIN, which also tries to find that intersection of columns to produce a join predicate. However, NATURAL JOIN then projects all of the columns (or the union of the columns), not just the shared ones.

Using CORRESPONDING BY

Just like NATURAL JOIN, this is a risky operation. As soon as one subquery changes its projection (e.g. because of a table column rename), the result of all such queries will change as well, and it might not even produce a syntax error, just a different result.

In fact, in the above example, we probably didn’t even care about that LAST_UPDATE column. It was included in the UNION ALL set operation by accident, just like NATURAL JOIN would join using LAST_UPDATE by accident.

With joins, we can use JOIN .. USING (first_name, last_name) to at least specify by which shared column names we want to join the two tables. With CORRESPONDING, we can supply the optional BY clause for the same purpose:

SELECT *
FROM actor
UNION ALL CORRESPONDING BY (first_name, last_name)
SELECT *
FROM customer
UNION ALL CORRESPONDING BY (first_name, last_name)
SELECT *
FROM staff
ORDER BY first_name, last_name;

This now produces only the two desired columns:

|first_name|last_name|
|----------|---------|
|AARON     |SELBY    |
|ADAM      |GOOCH    |
|ADAM      |GRANT    |
|ADAM      |HOPPER   |
|ADRIAN    |CLARY    |
|AGNES     |BISHOP   |
|AL        |GARLAND  |
|ALAN      |DREYFUSS |
|...       |...      |

In fact, this way, we could even use the syntax meaningfully for INTERSECT and EXCEPT, e.g. to find customers who share their names with an actor:

SELECT *
FROM actor
INTERSECT CORRESPONDING BY (first_name, last_name)
SELECT *
FROM customer
ORDER BY first_name, last_name;

Producing:

|first_name|last_name|
|----------|---------|
|JENNIFER  |DAVIS    |

Other dialects

I haven’t encountered this syntax many times in other dialects before. Perhaps, it will ship to PostgreSQL in the future. A branch has been worked on by Vik Fearing:

jOOQ might soon support it in the API / parser / translator:

https://github.com/jOOQ/jOOQ/issues/5285

Using jOOQ’s DiagnosticsConnection to detect N+1 Queries

N+1 queries are a popular problem in many applications that run SQL queries. The problem can be described easily as follows:

  • 1 query fetching a parent value is run
  • N queries fetching each individual child values are run

This problem isn’t limited to SQL, it can happen with any poorly designed API that does not allow for batch and/or bulk processing (even stored procedures). But with SQL, it’s particularly painful, because in many cases, running tons of logic in a single query would be totally possible, especially with jOOQ’s MULTISET and SQL/XML or SQL/JSON support.

In the worst case, the N+1 problem is caused by a third party ORM – or rather, its poor implementation / configuration, but some ORMs make it really easy to shoot oneself in the foot with N+1 problems…

An example

Let’s stick with JDBC for now to illustrate how N+1 queries happen.

try (Statement stmt = conn.createStatement()) {

    // The parent query, fetching actors
    try (ResultSet r1 = stmt.executeQuery(
        """
        SELECT actor_id, first_name, last_name
        FROM actor
        LIMIT 5
        """
    )) {
        while (r1.next()) {
            System.out.println();
            System.out.println(
                "Actor: " + r1.getString(2) + " " + r1.getString(2));

            // The child query, fetching films per actor
            try (PreparedStatement pstmt = conn.prepareStatement(
                """
                SELECT count(*) FROM film_actor WHERE actor_id = ?
                """
            )) {
                pstmt.setInt(1, r1.getInt(1));

                try (ResultSet r2 = pstmt.executeQuery()) {
                    while (r2.next()) {
                        System.out.println("Films: " + r2.getInt(1));
                    }
                }
            }
        }
    }
}

When run against the sakila database, the above prints:

Actor: PENELOPE PENELOPE
Films: 19

Actor: NICK NICK
Films: 25

Actor: ED ED
Films: 22

Actor: JENNIFER JENNIFER
Films: 22

Actor: JOHNNY JOHNNY
Films: 29

Obviously correct, but we could have easily run this in a single query:

SELECT
  a.first_name,
  a.last_name,
  count(fa.film_id)
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
GROUP BY a.actor_id

Given that we have 200 actors in total, which do you prefer? Running 1+200 queries or just 1 query? If you’re in control of your SQL, this mistake is much less likely to happen, but what if you’re not in (complete) control, because the SQL is generated based on eager/lazy loading configurations and complex entity graph annotations, then you’ll be happy you can just plug in jOOQ’s DiagnosticsConnection’s repeated statements diagnostic into your integration test environment (not necessarily in production, as there’s some overhead to parse and normalise all the SQL strings).

Applied to the above JDBC example:

DSLContext ctx = DSL.using(connection);
ctx.configuration().set(new DefaultDiagnosticsListener() {
    @Override
    public void repeatedStatements(DiagnosticsContext c) {

        // Custom callback, may also throw exceptions, etc.
        System.out.println(
            "Repeated statement: " + c.normalisedStatement());
    }
});

Connection conn = ctx.diagnosticsConnection();

You’re now getting the following output:

Actor: PENELOPE PENELOPE
Films: 19

Actor: NICK NICK
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 25

Actor: ED ED
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 22

Actor: JENNIFER JENNIFER
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 22

Actor: JOHNNY JOHNNY
Repeated statement: select count(*) from film_actor where actor_id = ?;
Films: 29

As you can see, the diagnostics connection starts logging after the first repetition of the statement, the assumption being that within a transaction, it is generally unnecessary to ever repeat a statement more than once, because there’s almost always a better way.

Using this with JPA / Hibernate

You probably don’t write JDBC statements manually like this, but it doesn’t matter who calls JDBC (you, jOOQ, JdbcTemplate, Hibernate, etc.). If you proxy your connection (or DataSource) with jOOQ’s DiagnosticsConnection or DiagnosticsDataSource, then you can intercept such events easily, no matter the cause.

Future versions of jOOQ will add a lot more diagnostics via https://github.com/jOOQ/jOOQ/issues/7527.

To see what’s available in jOOQ already, refer to the manual.

The Useful BigQuery * EXCEPT Syntax

One of the coolest things about using and making jOOQ is that we get to discover the best extensions to the standard SQL language by vendors, and add support for those clauses in jOOQ via emulations.

One of these syntaxes is BigQuery’s * EXCEPT syntax. Everyone who ever wrote ad-hoc SQL queries would have liked this syntax to be available, occasionally. Why is it needed? Look at this query:

SELECT * FROM actor

The result being:

|actor_id|first_name|last_name   |last_update            |
|--------|----------|------------|-----------------------|
|1       |PENELOPE  |GUINESS     |2006-02-15 04:34:33.000|
|2       |NICK      |WAHLBERG    |2006-02-15 04:34:33.000|
|3       |ED        |CHASE       |2006-02-15 04:34:33.000|
|4       |JENNIFER  |DAVIS       |2006-02-15 04:34:33.000|
|5       |JOHNNY    |LOLLOBRIGIDA|2006-02-15 04:34:33.000|
|6       |BETTE     |NICHOLSON   |2006-02-15 04:34:33.000|
|...

But that LAST_UPDATE column is mighty annoying, especially if we want to NATURAL JOIN things. E.g. this doesn’t work:

SELECT actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor AS a
NATURAL LEFT JOIN film_actor AS fa
GROUP BY actor_id

The result is just actors without films, because accidentally, the LAST_UPDATE column was included in the NATURAL JOIN:

|actor_id|first_name|last_name|count|
|--------|----------|---------|-----|
|58      |CHRISTIAN |AKROYD   |0    |
|8       |MATTHEW   |JOHANSSON|0    |
|116     |DAN       |STREEP   |0    |
|184     |HUMPHREY  |GARLAND  |0    |
|87      |SPENCER   |PECK     |0    |

This is the biggest flaw of NATURAL JOIN, making it almost useless for schemas that aren’t perfectly designed for NATURAL JOIN usage, but this is ad-hoc SQL, and it would have been nice to do that.

We could, if we had * EXCEPT like this:

SELECT 
  a.actor_id, 
  a.first_name, 
  a.last_name, 
  count(fa.film_id)
FROM (
  SELECT * EXCEPT (last_update) FROM actor
) AS a
NATURAL LEFT JOIN (
  SELECT * EXCEPT (last_update) FROM film_actor
) AS fa
GROUP BY 
  a.actor_id, 
  a.first_name, 
  a.last_name

Unfortunately, this doesn’t work in PostgreSQL and other dialects, but jOOQ can emulate it. If you provide the online SQL translator with the sakila database table definitions:

CREATE TABLE actor (
    actor_id integer NOT NULL,
    first_name character varying(45) NOT NULL,
    last_name character varying(45) NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);
CREATE TABLE film_actor (
    actor_id smallint NOT NULL,
    film_id smallint NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);

Then, it can resolve the EXCEPT syntax of the query and produce this:

SELECT
  a.actor_id,
  a.first_name,
  a.last_name,
  count(fa.film_id)
FROM (
  SELECT actor.actor_id, actor.first_name, actor.last_name
  FROM actor
) a
  NATURAL LEFT OUTER JOIN (
    SELECT film_actor.actor_id, film_actor.film_id
    FROM film_actor
  ) fa
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Obviously, we could have just omitted NATURAL JOIN to achieve the same result, but sometimes, it’s just nice to have yet another tool in the tool chain to write a nice query. With jOOQ, the query would read:

Actor a = ACTOR.as("a");
FilmActor fa = FILM_ACTOR.as("fa");

ctx.select(
        a.ACTOR_ID,
        a.FIRST_NAME,
        a.LAST_NAME,
        count(fa.FILM_ID))
   .from(
        select(asterisk().except(a.LAST_UPDATE)).from(a).asTable(a))
   .naturalLeftOuterJoin(
        select(asterisk().except(fa.LAST_UPDATE)).from(fa).asTable(fa))
   .groupBy(a.ACTOR_ID, a.FIRST_NAME, a.LAST_NAME)
   .fetch();

3.16.0 Release with a new Public Query Object Model API, Spatial Support, YugabyteDB Support and Much More

This release tackles two long standing and complex feature requests that users
have asked us to offer for a long time: a public API for manipulating jOOQ’s
query object model (QOM), and spatial support.

New Query Object Model (QOM)

Every jOOQ query is modeled as an expression tree constructed via our intuitive
DSL. For some use-cases there exist historic model API versions (e.g.
SelectQuery), but those models aren’t readable or transformable. We’re now
starting to offer most of our expression tree model as a public API for users to
consume and manipulate. Every expression tree element has a corresponding type
in org.jooq.impl.QOM. Every type provides access to its components using "$"
prefixed method names, e.g.:

// Create an expression using the DSL API:
Field field = substring(BOOK.TITLE, 2, 4);

// Access the expression's internals using the model API
if (field instanceof QOM.Substring substring) {
    Field string = substring.$string();
    Field startingPosition = substring.$startingPosition();
    Field length = substring.$length();
}

The new API is experimental and might change in the next minor release.

Licensed power users will get auxiliary APIs to traverse and transform the
expression tree, e.g. traversal:

// Contains 7 query parts 
long count2 = BOOK.ID.eq(1).or(BOOK.ID.eq(2))
    .$traverse(Traversers.collecting(Collectors.counting());

Or replacement:

// Contains redundant operators
Condition c = not(not(BOOK.ID.eq(1)));
System.out.println(c.$replace(q ->
    q instanceof QOM.Not n1 && n1.$arg1() instanceof QOM.Not n2
        ? n2.$arg1()
        : q
));

The above prints the following, having removed the redundant NOT operators:

"BOOK"."ID" = 1

This new API is very powerful for even more sophisticated dynamic SQL use-cases,
such as:

  • Optimising SQL expressions as the above NOT NOT example
  • Row level security
  • Soft deletion
  • Shared schema multi tenancy
  • Audit column support
  • And much more (stay tuned for future blogs and out-of-the-box transformations)

For more info, see:

Spatial support

A long awaited feature that starts shipping to our commercially licensed
customers is spatial support. Many dialects support the ISO/IEC 13249-3:2016 SQL
standard extension, and finally, so do we.

jOOQ introduces new auxiliary data types for GEOMETRY and GEOGRAPHY data to use
as standardised bind variables, containing WKB or WKT data, as well as a
variety of out of the box functions and predicates.

Future versions will add more support for other functions and dialects.

New dialect and versions

Another new officially supported SQL dialect has been added to all jOOQ editions
including the jOOQ Open Source Edition: YugabyteDB. This was a sponsored
integration, thank you very much Yugabyte!

Other dialects have upgraded versions, meaning:

  • We support this version in all editions now
  • The jOOQ Open Source Edition now requires this version if applicable

The versions are:

  • Firebird 4.0
  • H2 2.0.202
  • MariaDB 10.6
  • PostgreSQL 14
  • Oracle 21c

Computed and readonly columns including ROWID

A lot of dialects support computed columns (“generated” columns), and we now
support those as well in jOOQ. For most use-cases, this does not affect jOOQ
usage, but especially when writing CRUD code, the new readonly column feature
can be very helpful to avoid having to exclude computed columns manually from
the CRUD operation.

This also includes a new, improved ROWID support, where users can choose to work
with a synthetic ROWID column instead of the primary key, when performing CRUD
operations.

All of these features are available in the commercial jOOQ editions.

Jakarta EE

We’ve moved on from Java EE to Jakarta EE dependencies. This change is
currently backwards incompatible because:

  • It greatly facilitates maintaining the related code
  • It prevents tons of user problems resulting from having both dependencies
  • We’re not actually integrating tightly with Java EE / Jakarta EE

The following Jakarta EE modules are affected:

  • JAXB, which we’re using to load our configuration objects.
  • Validation, whose annotations can be generated by the code generator.
  • JPA, which is used by the DefaultRecordMapper and the JPADatabase

If this breaking change causes issues, please get in touch at
https://github.com/jOOQ/jOOQ/issues/9641

Various improvements

As with every minor release, a lot of smaller improvements have been
implemented. Some highlights include:

  • PostgreSQL procedures are now supported in code generation and at runtime.
  • SQLite JSON support has been added, which includes the MULTISET emulation!
  • A lot of MULTISET / ROW improvements have been implemented
  • R2DBC 0.9 has been released, and we’ve upgraded our dependency
  • The Java 17 distribution now requires Java 17 instead of Java 16
  • Pre jOOQ 3.6 deprecations have been removed

Full release notes here