Ad-hoc Data Type Conversion with jOOQ 3.15

jOOQ 3.15 shipped with a ton of new features, the most important ones being:

A very useful, lesser known new feature is “ad-hoc data type conversion”. Data type converters and bindings have been around in jOOQ for a long time. Their goal is to allow for using custom data types for common JDBC types like String or Integer. So, if you have a table like this:

CREATE TABLE furniture (
  id INT NOT NULL PRIMARY KEY,
  name TEXT NOT NULL,
  length NUMERIC,
  width NUMERIC,
  height NUMERIC
);

Instead of using BigDecimal for those dimensions, you may have preferred a custom, more semantic wrapper type for numbers, such as:

record Dimension(BigDecimal value) {}

And your Java representation of Furniture would be:

record Furniture(
  Integer id,
  String name,
  Dimension length,
  Dimension width,
  Dimension height
) {}

You’d go and attach a converter to your code generator, e.g.

<configuration>
  <generator>                            
    <database>
      <forcedTypes>
        <forcedType>
          <userType>com.example.Dimension</userType>
          <converter><![CDATA[
          org.jooq.Converter.ofNullable(
            BigDecimal.class,
            Dimension.class,
            Dimension::new,
            Dimension::value
          )
          ]]></converter>
          <includeExpression>LENGTH|WIDTH|HEIGHT</includeExpression>
        </forcedType>
      </forcedTypes>
    </database>
  </generator>
</configuration>

That would allow you to query your database like this:

Result<Record3<Dimension, Dimension, Dimension>> result =
ctx.select(FURNITURE.LENGTH, FURNITURE.WIDTH, FURNITURE.HEIGHT)
   .from(FURNITURE)
   .fetch();

But sometimes, you can’t leverage code generation:

  • You can’t access the code generator configuration for some reason
  • You don’t want to attach a converter to your columns for every query
  • You’re not using the code generator because you have a dynamic schema known only at runtime

Enter Ad-hoc Converters

Starting from jOOQ 3.15, we support various ways of registering a convenient ad-hoc converter to your Field<T> expression. This feature was mainly introduced to allow for mapping MULTISET nested collections to lists of a custom data type (a feature we urge you to try out, you won’t look back!)

But you can use the feature also for any other Field expression. Assuming you can’t use code generation for the above query (the main reason, again, being your schema being dynamic). You would probably write something like this:

Table<?> furniture = table(name("furniture"));
Field<BigDecimal> length = field(name("furniture", "length"), NUMERIC);
Field<BigDecimal> width  = field(name("furniture", "width"), NUMERIC);
Field<BigDecimal> height = field(name("furniture", "height"), NUMERIC);

Result<Record3<BigDecimal, BigDecimal, BigDecimal>> result =
ctx.select(length, width, height)
   .from(furniture)
   .fetch();

As always, the usual static imports are implied:

import static org.jooq.impl.DSL.*;
import static org.jooq.impl.SQLDataType.*;

But code generation is ultimately just convenience. You can always achieve everything you can with jOOQ’s code generator also without it (though I do recommend you use code generation if possible!). So, in order to re-use our Dimension data type, historically, you could do this:

DataType<Dimension> type = NUMERIC.asConvertedDataType(
    Converter.ofNullable(
        BigDecimal.class,
        Dimension.class,
        Dimension::new,
        Dimension::value
    )    
);

Table<?> furniture = table(name("furniture"));
Field<Dimension> length = field(name("furniture", "length"), type);
Field<Dimension> width  = field(name("furniture", "width"), type);
Field<Dimension> height = field(name("furniture", "height"), type);

Result<Record3<Dimension, Dimension, Dimension>> result =
ctx.select(length, width, height)
   .from(furniture)
   .fetch();

That’s already very neat. But again, you’re going to create a Field reference that always uses this converter. Maybe, you wanted conversion to apply just for this one query? No problem with ad-hoc converters! Write this:

Table<?> furniture = table(name("furniture"));
Field<BigDecimal> length = field(name("furniture", "length"), NUMERIC);
Field<BigDecimal> width  = field(name("furniture", "width"), NUMERIC);
Field<BigDecimal> height = field(name("furniture", "height"), NUMERIC);

Result<Record3<BigDecimal, BigDecimal, Dimension>> result =
ctx.select(length, width, height.convertFrom(Dimension::new))
   // ad-hoc conversion here:    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
   .from(furniture)
   .fetch();

There are various overloads of Field.convert(), the most powerful one being the ones that accept a complete Binding or Converter reference. The above one is very convenient, as it allows you to provide only the “from” Function<T, U> of a converter, omitting the Class<T>, Class<U>, and “to” Function<U, T>.

What is a Converter?

What is a Converter after all? It is an implementation for this:

public interface Converter<T, U> {
    U from(T databaseObject);
    T to(U userObject);
    Class<T> fromType();
    Class<U> toType();
}

Where:

  • T is the “JDBC type”, i.e. a technical type understood by the JDBC API, such as String or BigDecimal
  • U is the “user type”, i.e. a semantic type that you choose to represent data in your client application
  • Class<T> is a class literal for T, required for reflection purposes, e.g. to create an array T[] at runtime
  • Class<U> is a class literal for U, required for reflection purposes, e.g. to create an array U[] at runtime

When attaching a Converter to the code generator, it is always good to provide all of the above. The two conversion functions converting between T and U, as well as the class literals. You never know if jOOQ needs them for some specific operation.

But in the case of ad-hoc conversion, you usually only need one of the from (read) or to (write) functions. Why repeat all of the rest? Hence, these options:

// A "read-only" field converting from BigDecimal to Dimension
height.convertFrom(Dimension::new);

// Like above, but with an explicit class literal, if needed
height.convertFrom(Dimension.class, Dimension::new);

// A "write-only" field converting from Dimension to BigDecimal
height.convertTo(Dimension::value);

// Like above, but with an explicit class literal, if needed
height.convertTo(Dimension.class, Dimension::value);

// Full read/write converter support
height.convert(Dimension.class, Dimension::new, Dimension::value);
height.convert(Converter.ofNullable(
    BigDecimal.class,
    Dimension.class, 
    Dimension::new, 
    Dimension::value
));

What’s the difference between “read-only” and “write-only” conversions? Simple. Look at these queries:

Result<Record1<Dimension>> result =
ctx.select(height.convertFrom(Dimension::new))
   .from(furniture)
   .fetch();

ctx.insertInto(furniture)
   .columns(height.convertTo(Dimension::value))
   .values(new Dimension(BigDecimal.ONE))
   .execute();

So, in summary:

  • The read-only ad-hoc converter is useful in projections (SELECT)
  • The write-only ad-hoc converter is useful in predicates (WHERE), or DML

Reactive SQL with jOOQ 3.15 and R2DBC

One of the biggest new features of the recently released jOOQ 3.15 is its new support for reactive querying via R2DBC. This has been a highly popular feature request, and we finally delivered on it.

You can continue using jOOQ the way you were used to, providing you with type safe, embedded SQL in Java, kotlin, or scala, but your query executions are no longer blocking. Instead, your jOOQ ResultQuery or Query can be used as a Publisher<R> or Publisher<Integer> in the reactive-streams implementation of your choice.

Instead of (or in addition to) configuring your jOOQ DSLContext with a JDBC java.sql.Connection or javax.sql.DataSource, just configure it with an R2DBC io.r2dbc.spi.Connection or io.r2dbc.spi.ConnectionFactory:

ConnectionFactory connectionFactory = ConnectionFactories.get(
    ConnectionFactoryOptions
        .parse("r2dbc:h2:file://localhost/~/r2dbc-test")
        .mutate()
        .option(ConnectionFactoryOptions.USER, "sa")
        .option(ConnectionFactoryOptions.PASSWORD, "")
        .build()
);

DSLContext ctx = DSL.using(connectionFactory);

Alternatively, use Spring Boot to auto-configure jOOQ like this:

Starting from this DSLContext, you can build your queries as always, but instead of calling the usual blocking execute() or fetch() methods, you’ll just wrap the query in a Flux, for example. Assuming you ran the jOOQ code generator on your H2 INFORMATION_SCHEMA, you can now write:

record Table(String schema, String table) {}

Flux.from(ctx
        .select(
            INFORMATION_SCHEMA.TABLES.TABLE_SCHEMA,
            INFORMATION_SCHEMA.TABLES.TABLE_NAME)
        .from(INFORMATION_SCHEMA.TABLES))

    // Type safe mapping from Record2<String, String> to Table::new
    .map(Records.mapping(Table::new))
    .doOnNext(System.out::println)
    .subscribe();

jOOQ will acquire an R2DBC Connection from your ConnectionFactory, and release it again after the query has been executed, allowing for optimised resource management, something that is otherwise a bit tricky with R2DBC and reactor. In other words, the above execution corresponds to this manually written query:

Flux.usingWhen(
        connectionFactory.create(),
        c -> c.createStatement(
                """
                SELECT table_schema, table_name
                FROM information_schema.tables
                """
             ).execute(),
        c -> c.close()
    )
    .flatMap(it -> it.map((r, m) -> 
         new Table(r.get(0, String.class), r.get(1, String.class))
    ))
    .doOnNext(System.out::println)
    .subscribe();

Both will print something like the following:

Table[schema=INFORMATION_SCHEMA, table=TABLE_PRIVILEGES]
Table[schema=INFORMATION_SCHEMA, table=REFERENTIAL_CONSTRAINTS]
Table[schema=INFORMATION_SCHEMA, table=TABLE_TYPES]
Table[schema=INFORMATION_SCHEMA, table=QUERY_STATISTICS]
Table[schema=INFORMATION_SCHEMA, table=TABLES]
Table[schema=INFORMATION_SCHEMA, table=SESSION_STATE]
Table[schema=INFORMATION_SCHEMA, table=HELP]
Table[schema=INFORMATION_SCHEMA, table=COLUMN_PRIVILEGES]
Table[schema=INFORMATION_SCHEMA, table=SYNONYMS]
Table[schema=INFORMATION_SCHEMA, table=SESSIONS]
Table[schema=INFORMATION_SCHEMA, table=IN_DOUBT]
Table[schema=INFORMATION_SCHEMA, table=USERS]
Table[schema=INFORMATION_SCHEMA, table=COLLATIONS]
Table[schema=INFORMATION_SCHEMA, table=SCHEMATA]
Table[schema=INFORMATION_SCHEMA, table=TABLE_CONSTRAINTS]
Table[schema=INFORMATION_SCHEMA, table=INDEXES]
Table[schema=INFORMATION_SCHEMA, table=ROLES]
Table[schema=INFORMATION_SCHEMA, table=FUNCTION_COLUMNS]
Table[schema=INFORMATION_SCHEMA, table=CONSTANTS]
Table[schema=INFORMATION_SCHEMA, table=SEQUENCES]
Table[schema=INFORMATION_SCHEMA, table=RIGHTS]
Table[schema=INFORMATION_SCHEMA, table=FUNCTION_ALIASES]
Table[schema=INFORMATION_SCHEMA, table=CATALOGS]
Table[schema=INFORMATION_SCHEMA, table=CROSS_REFERENCES]
Table[schema=INFORMATION_SCHEMA, table=SETTINGS]
Table[schema=INFORMATION_SCHEMA, table=DOMAINS]
Table[schema=INFORMATION_SCHEMA, table=KEY_COLUMN_USAGE]
Table[schema=INFORMATION_SCHEMA, table=LOCKS]
Table[schema=INFORMATION_SCHEMA, table=COLUMNS]
Table[schema=INFORMATION_SCHEMA, table=TRIGGERS]
Table[schema=INFORMATION_SCHEMA, table=VIEWS]
Table[schema=INFORMATION_SCHEMA, table=TYPE_INFO]
Table[schema=INFORMATION_SCHEMA, table=CONSTRAINTS]

Note that if you’re using JDBC and not R2DBC, you can continue to use the jOOQ API with your reactive streams libraries in a blocking manner in exactly the same way as above, e.g. if your favourite RDBMS does not yet support a reactive R2DBC driver. Currently supported drivers according to r2dbc.io include:

All of which we integration test with jOOQ 3.15+.

A runnable example

Go play with the example here: https://github.com/jOOQ/jOOQ/tree/main/jOOQ-examples/jOOQ-r2dbc-example

It uses the following schema:

CREATE TABLE r2dbc_example.author (
  id INT NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(100) NOT NULL,
  last_name VARCHAR(100) NOT NULL,
  
  CONSTRAINT pk_author PRIMARY KEY (id)
);

CREATE TABLE r2dbc_example.book (
  id INT NOT NULL AUTO_INCREMENT,
  author_id INT NOT NULL,
  title VARCHAR(100) NOT NULL,
  
  CONSTRAINT pk_book PRIMARY KEY (id),
  CONSTRAINT fk_book_author FOREIGN KEY (id) 
    REFERENCES r2dbc_example.author
);

And runs this code

Flux.from(ctx
        .insertInto(AUTHOR)
        .columns(AUTHOR.FIRST_NAME, AUTHOR.LAST_NAME)
        .values("John", "Doe")
        .returningResult(AUTHOR.ID))
    .flatMap(id -> ctx
        .insertInto(BOOK)
        .columns(BOOK.AUTHOR_ID, BOOK.TITLE)
        .values(id.value1(), "Fancy Book"))
    .thenMany(ctx
        .select(
             BOOK.author().FIRST_NAME, 
             BOOK.author().LAST_NAME, 
             BOOK.TITLE)
        .from(BOOK))
    .doOnNext(System.out::println)
    .subscribe();

To insert two records and fetch the joined result as follows:

+----------+---------+----------+
|FIRST_NAME|LAST_NAME|TITLE     |
+----------+---------+----------+
|John      |Doe      |Fancy Book|
+----------+---------+----------+

3.15.0 Release with Support for R2DBC, Nested ROW, ARRAY, and MULTISET types, 5 new SQL dialects, CREATE PROCEDURE, FUNCTION, and TRIGGER support and Much More

R2DBC

What a lot of users have been waiting for: jOOQ 3.15 is reactive, thanks to the new native R2DBC integration. Recent versions already implemented the reactive streams Publisher SPI, but now we’re not cheating anymore. We’re not longer blocking. Just wrap your R2DBC ConnectionFactory configured jOOQ query in a Flux (or any reactive streams API of your choice), and see what happens.

Flux.from(ctx.select(BOOK.TITLE).from(BOOK));

Both blocking (via JDBC) and non-blocking (via R2DBC) can work side-by-side, allowing users to quickly a query between the two execution models without anychanges to the query building logic.

Projecting ROW types, ARRAY of ROW Types, and MULTISETS

After having implemented standard SQL/XML and SQL/JSON support in jOOQ 3.14, another major milestone in taking SQL to the next level is now available as anexperimental feature: Nesting collections using the standard SQL MULTISET operator.

The operator is currently emulated using SQL/XML or SQL/JSON. The resulting documents are parsed again when fetching them from JDBC. Future versions will also provide native support (Informix, Oracle), and emulations using ARRAY (various dialects, including PostgreSQL).

Imagine this query against the Sakila database (https://www.jooq.org/sakila):

var result =
ctx.select(
      FILM.TITLE,
      multiset(
        select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
        .from(ACTOR)
        .join(FILM_ACTOR).using(ACTOR.ACTOR_ID)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(CATEGORY.NAME)
        .from(CATEGORY)
        .join(FILM_CATEGORY).using(CATEGORY.CATEGORY_ID)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

You’re really going to love Java 10’s var keyword for these purposes. What’s the type of result? Exactly:

Result<Record3<
    String, 
    Result<Record2<String, String>>, 
    Result<Record1<String>>
>>

It contains:

+---------------------------+--------------------------------------------------+---------------+
|title                      |actors                                            |films          |
+---------------------------+--------------------------------------------------+---------------+
|ACADEMY DINOSAUR           |[(PENELOPE, GUINESS), (CHRISTIAN, GABLE), (LUCI...|[(Documentary)]|
|ACE GOLDFINGER             |[(BOB, FAWCETT), (MINNIE, ZELLWEGER), (SEAN, GU...|[(Horror)]     |
|ADAPTATION HOLES           |[(NICK, WAHLBERG), (BOB, FAWCETT), (CAMERON, ST...|[(Documentary)]|
 ...

Two collections were nested in a single query without producing any unwanted cartesian products and duplication of data. And stay tuned, we’ve added more goodies! See this article on how to map the above structural type to your nominal types (e.g. Java 16 records) in a type safe way, without reflection!

More info here:

New Dialects

We’ve added support for 5 (!) new SQLDialect’s. That’s unprecedented for any previous minor release. The new dialects are:

  • BIGQUERY
  • EXASOL
  • IGNITE
  • JAVA
  • SNOWFLAKE

Yes, there’s an experimental “JAVA” dialect. It’s mostly useful if you want to translate your native SQL queries to jOOQ using https://www.jooq.org/translate, and it cannot be executed. In the near future, we might add SCALA and KOTLIN as well, depending on demand.

BigQuery and Snowflake were long overdue by popular vote. The expedited EXASOL support has been sponsored by a customer, which is a great reminder that this is always an option. You need something more quickly? We can make it happen, even if the feature isn’t very popular on the roadmap.

Many other dialects have been brought up to date, including REDSHIFT, HANA, VERTICA, and two have been deprecated: INGRES and ORACLE10G, as these grow less and less popular.

Drop Java 6/7 support for Enterprise Edition, require Java 11 in OSS Edition

We’re cleaning up our support for old dependencies and features. Starting with jOOQ 3.12, we offered Java 6 and 7 support only to jOOQ Enterprise Edition customers. With jOOQ 3.15, this support has now been removed, and Java 8 is the new baseline for commercial editions, Java 11 for the jOOQ Open Source Edition, meaning the OSS Edition is now finally modular, and we get access to little things like the Flow API (see R2DBC) and @Deprecate(forRemoval, since).

Upgrading to Java 8 allows for exciting new improvements to our internals, as we can finally use default methods, lambdas, generalised target type inference, effectively final, diamond operators, try-with-resources, string switches, and what not. Improving our code base leads to dog fooding, and that again leads to new features for you. For example, we’ve put a lot of emphasis on ResultQuery.collect(), refactoring internals: https://blog.jooq.org/2021/05/17/use-resultquery-collect-to-implement-powerful-mappings/

There are new auxiliary types, like org.jooq.Rows and org.jooq.Records for more functional transformation convenience. More functions mean less loops, and also less ArrayList allocations.

At the same time, we’ve started building a Java 17 ready distribution for the commercial editions, which unlocks better record type support.

Refactoring of ResultQuery to work with DML

With all the above goodies related to Java 8, and a more functional usage of jOOQ, we’ve also finally refactored our DML statement type hierarchy (INSERT,UPDATE, DELETE), to let their respective RETURNING clauses return an actual ResultQuery. That means you can now stream(), collect(), fetchMap() and subscribe() (via R2DBC) to your DML statements, and even put them in the WITH clause (in PostgreSQL).

Massive improvements to the parser / translator use case

jOOQ’s secondary value proposition is to use its parser and translator, instead of the DSL API, which is also available for free on our website: https://www.jooq.org/translate

With increasing demand for this product, we’ve greatly improved the experience:

  • The ParsingConnection no longer experimental
  • Batch is now possible
  • We’ve added a cache for input/output SQL string pairs to heavily speed up the integration
  • We’re now delaying bind variable type inference to use actual PreparedStatement information. This produces more accurate results, especially when data types are not known to the parser.
  • A new ParseListener SPI allows for hooking into the parser and extend it with custom syntax support for column, table, and predicate expressions.

CREATE PROCEDURE, FUNCTION, TRIGGER and more procedural instructions

Over the recent releases, we’ve started working on procedural language extensions for the commercial distributions. In addition to creating anonymous blocks, we now also support all lifecycle management DDL for procedures, functions, and triggers, which can contain procedural language logic.

This is great news if you’re supporting multiple RDBMS and want to move some more data processing logic to the server side in a vendor agnostic way.

Explicit JDBC driver dependencies to avoid reflection

To get AOP ready, we’re slowly removing internal reflection usage, meaning we’re experimenting with an explicit JDBC driver build-time dependency. This currently affects:

  • Oracle
  • PostgreSQL
  • SQL Server

Only drivers available from Maven Central have been added as dependency so far.

Full release notes here.

jOOQ 3.15’s New Multiset Operator Will Change How You Think About SQL

This is how SQL should have been used all along.

They called it The Third Manifesto, ORDBMS, or other things. Regrettably, it never really took off. Because most vendors didn’t adopt it. And those who did, didn’t agree on syntax.

But this is about to change. Thanks to the now ubiquitous SQL/JSON support (which jOOQ 3.14 has already covered), we can now emulate the most powerful ORDBMS feature that you will want to use everywhere: Nested collections!

How We Used to do Things: With Joins

We’re going to be using the Sakila database for this example. It’s a DVD rental store with things like ACTOR, FILM, CATEGORY (of films) and other nice relational things. Let’s write a query for this requirement

Get me all the films with their actors and their categories

Classically, we’d go ahead and use jOOQ to write:

ctx.select(
      FILM.TITLE,
      ACTOR.FIRST_NAME,
      ACTOR.LAST_NAME,
      CATEGORY.NAME
   )
   .from(ACTOR)
   .join(FILM_ACTOR)
     .on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .join(FILM)
     .on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .join(FILM_CATEGORY)
     .on(FILM.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
   .join(CATEGORY)
     .on(FILM_CATEGORY.CATEGORY_ID.eq(CATEGORY.CATEGORY_ID))
   .orderBy(1, 2, 3, 4)
   .fetch();

And the result? Not so nice. A denormalised, flat table containing tons of repetition:

+----------------+----------+---------+-----------+
|title           |first_name|last_name|name       |
+----------------+----------+---------+-----------+
|ACADEMY DINOSAUR|CHRISTIAN |GABLE    |Documentary|
|ACADEMY DINOSAUR|JOHNNY    |CAGE     |Documentary|
|ACADEMY DINOSAUR|LUCILLE   |TRACY    |Documentary|
|ACADEMY DINOSAUR|MARY      |KEITEL   |Documentary|
|ACADEMY DINOSAUR|MENA      |TEMPLE   |Documentary|
|ACADEMY DINOSAUR|OPRAH     |KILMER   |Documentary|
|ACADEMY DINOSAUR|PENELOPE  |GUINESS  |Documentary|
|ACADEMY DINOSAUR|ROCK      |DUKAKIS  |Documentary|
|ACADEMY DINOSAUR|SANDRA    |PECK     |Documentary|
|ACADEMY DINOSAUR|WARREN    |NOLTE    |Documentary|
|ACE GOLDFINGER  |BOB       |FAWCETT  |Horror     |
|ACE GOLDFINGER  |CHRIS     |DEPP     |Horror     |
|ACE GOLDFINGER  |MINNIE    |ZELLWEGER|Horror     |
 ...

If we don’t consume this result unmodified as it is (e.g. when displaying tabular data to a user), we’d then go and de-duplicate things, shoehorning them back again into some nested data structures (e.g. for consumption by some JSON based UI), and spending hours trying to untangle the cartesian products between the 2 nested collections FILM -> ACTOR and FILM -> CATEGORY (because ACTOR and CATEGORY now created a cartesian product, which we didn’t want!)

In the worst case, we don’t even notice! This example database only has 1 category per film, but it is designed to support multiple categories.

jOOQ can help with that deduplication, but just look at the number of questions for jOOQ many to many on Stack Overflow! You’ll probably still have to write at least 2 queries to separate the nested collections.

ENTER the Stage: Multiset

The standard SQL <multiset value constructor> operator allows for collecting the data from a correlated subquery into a nested data structure, a MULTISET. Everything in SQL is a MULTISET, so the operator isn’t too surprising. But the nesting is where it shines. The previous query can now be re-written in jOOQ as follows:

var result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Note, it’s not relevant to this task, but I’m using jOOQ’s type safe implicit to-one join feature, which further helps taming the joins, syntactically. A matter of taste.

How to read this query? Easy:

  • Get all the films
  • For each FILM, get all the actors as a nested collection
  • For each FILM, get all the categories as a nested collection

You’re going to like Java 10’s var keyword even more after this :) Because what type is result? It’s of this type:

Result<Record3<
  String,                          // FILM.TITLE
  Result<Record2<String, String>>, // ACTOR.FIRST_NAME, ACTOR.LAST_NAME
  Result<Record1<String>>          // CATEGORY.NAME
>>

That’s quite something. Not too complex if you think of it. There’s a result with 3 columns

  • TITLE
  • A first nested result that has 2 string columns: ACTOR.FIRST_NAME and ACTOR.LAST_NAME
  • A second nested result that has 1 string column: CATEGORY.NAME

Using var or other type inference mechanisms, you don’t have to denote this type. Or even better (stay tuned): We’ll type-safely map the structural type to our nominal DTO type hierarchy, with just a few additional lines of code. I’ll explain that later.

What Does the Result Look Like?

Calling toString() on the above Result type yields something like this:

+---------------------------+--------------------------------------------------+---------------+
|title                      |actors                                            |films          |
+---------------------------+--------------------------------------------------+---------------+
|ACADEMY DINOSAUR           |[(PENELOPE, GUINESS), (CHRISTIAN, GABLE), (LUCI...|[(Documentary)]|
|ACE GOLDFINGER             |[(BOB, FAWCETT), (MINNIE, ZELLWEGER), (SEAN, GU...|[(Horror)]     |
|ADAPTATION HOLES           |[(NICK, WAHLBERG), (BOB, FAWCETT), (CAMERON, ST...|[(Documentary)]|
|AFFAIR PREJUDICE           |[(JODIE, DEGENERES), (SCARLETT, DAMON), (KENNET...|[(Horror)]     |
|AFRICAN EGG                |[(GARY, PHOENIX), (DUSTIN, TAUTOU), (MATTHEW, L...|[(Family)]     |
|AGENT TRUMAN               |[(KIRSTEN, PALTROW), (SANDRA, KILMER), (JAYNE, ...|[(Foreign)]    |
 ...

Notice how we’re back to getting each FILM.TITLE entry only once (no duplication), and nested in each row are the subquery results. There’s no denormalisation happening!

When calling result.formatJSON() with the appropriate formatting options, we’ll get this representation:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      {
        "first_name": "SANDRA",
        "last_name": "PECK"
      },

      ...
    ],
    "films": [
      { "name": "Documentary" }
    ]
  },
  {
    "title": "ACE GOLDFINGER",
    "actors": [
      {
        "first_name": "BOB",
        "last_name": "FAWCETT"
      },
  ...

Calling result.formatXML() would produce this:

<result>
  <record>
    <title>ACADEMY DINOSAUR</title>
    <actors>
      <result>
        <record>
          <first_name>PENELOPE</first_name>
          <last_name>GUINESS</last_name>
        </record>
        <record>
          <first_name>CHRISTIAN</first_name>
          <last_name>GABLE</last_name>
        </record>
        <record>
          <first_name>LUCILLE</first_name>
          <last_name>TRACY</last_name>
        </record>
        <record>
          <first_name>SANDRA</first_name>
          <last_name>PECK</last_name>
        </record>
        ...
      </result>
    </actors>
    <films>
      <result>
        <record>
          <name>Documentary</name>
        </record>
      </result>
    </films>
  </record>
  <record>
    <title>ACE GOLDFINGER</title>
    <actors>
      <result>
        <record>
          <first_name>BOB</first_name>
          <last_name>FAWCETT</last_name>
        </record>
        ...

You get the idea!

What’s the Generated SQL?

Just turn on jOOQ’s DEBUG logging and observe a query like this one (in PostgreSQL):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select 
        alias_78509018.first_name, 
        alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as films
from film
order by film.title

The Db2, MySQL, Oracle, SQL Server versions would look similar. Just try it on your Sakila database installation. It runs fast, too.

Mapping the Results to DTOs

Now, I promised to get rid of that lengthy structural type with all the generics. Check this out!

We used to call them POJOs (Plain Old Java Objects). Then DTOs (Data Transfer Objects). Now records. Yes, let’s try some Java 16 records here. (Note, records aren’t required for these examples. Any POJOs with appropriate constructors would do).

Wouldn’t it be nice, if result was of this type

record Actor(String firstName, String lastName) {}
record Film(
  String title,
  List<Actor> actors,
  List<String> categories
) {}

List<Film> result = ...

Structural typing is essential to jOOQ and its type safe query system, but in your code, you probably don’t want to have those merge-conflict lookalike generics all the time, and even var won’t help you if your data needs to be returned from a method.

So, let’s transform our jOOQ query to one that produces List<Film>, step by step. We’re starting with the original query, untouched:

Result<Record3<
    String, 
    Result<Record2<String, String>>, 
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Now, we’re going to map the Actor in the first MULTISET expression. This can be done as follows, using the new Field.convertFrom() convenience method, which allows to turn a Field<T> into any read-only Field<U> for ad-hoc usage. A simple example would be this:

record Title(String title) {}

// Use this field in any query
Field<Title> title = FILM.TITLE.convertFrom(Title::new);

It’s just an easy, new way to attach a read-only Converter to a Field for single usage, instead of doing that with the code generator.

Applied to the original query:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

What are we doing here?

  • The method convertFrom() takes a lambda Result<Record2<String, String>> -> Actor.
  • The Result type is the usual jOOQ Result, which has a Result.map(RecordMapper<R, E>) method.
  • The mapping() method is the new Records.mapping(), which isn’t doing much, just turning a constructor reference of type Function2<String, String, Actor> into a RecordMapper, which can then be used to turn a Result<Record2<String, String>> into a List<Actor>.

And it type checks! Try it yourself. If you add a column to the multiset, you’ll get a compilation error. If you add/remove an attribute from the Actor record, you’ll get a compilation error. No reflection here, just declarative mapping of jOOQ results/records to custom List<UserType>. If you prefer the “old” reflection approach using jOOQ’s ubiquitous into() methods, you can still do that, too:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.into(Actor.class))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

The result still type checks, but the conversion from Result<Record2<String, String>> to List<Actor> no longer does, it uses reflection.

Let’s continue. Let’s remove the clumsy category Result<Record1<String>>. We could’ve added another record, but in this case, a List<String> will suffice.

Result<Record3<
    String, 
    List<Actor>,
    List<String>  // Begone, jOOQ structural type!
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))

      // Magic.     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("films").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

And finally, the outer-most Result<Record3<...>> to List<Film> conversion

List<Film> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)

   //     vvvvvvvvvvvvvvvvvv grande finale
   .fetch(mapping(Film::new));

This time, we don’t need the Field.convertFrom() method. Just using the Records.mapping() auxiliary will be sufficient.

A More Complex Example

The previous example showed nesting of two independent collections, which is quite hard with classic JOIN based SQL or ORMs. How about a much more complex example, where we nest things 2 levels, one of which being an aggregation, even? The requirement is:

Give me all the films, the actors that played in the film, the categories that categorise the film, the customers that have rented the film, and all the payments per customer for that film

I won’t even show a JOIN based approach. Let’s dive directly into MULTISET and the also new, synthetic MULTISET_AGG aggregate function. Here’s how to do this with jOOQ. Now, check out that beautiful result type:

Result<Record4<
    String,                   // FILM.TITLE
    Result<Record2<
        String,               // ACTOR.FIRST_NAME
        String                // ACTOR.LAST_NAME
    >>,                       // "actors"
    Result<Record1<String>>,  // CATEGORY.NAME
    Result<Record4<
        String,               // CUSTOMER.FIRST_NAME
        String,               // CUSTOMER.LAST_NAME
        Result<Record2<
            LocalDateTime,    // PAYMENT.PAYMENT_DATE
            BigDecimal        // PAYMENT.AMOUNT
        >>, 
        BigDecimal            // "total"
    >>                        // "customers"
>> result = 
dsl.select(

        // Get the films
        FILM.TITLE,

        // ... and all actors that played in the film
        multiset(
            select(
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME
            )
            .from(FILM_ACTOR)
            .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
        ).as("actors"),

        // ... and all categories that categorise the film
        multiset(
            select(FILM_CATEGORY.category().NAME)
            .from(FILM_CATEGORY)
            .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
        ).as("categories"),

        // ... and all customers who rented the film, as well
        // as their payments
        multiset(
            select(
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME,
                multisetAgg(
                    PAYMENT.PAYMENT_DATE,
                    PAYMENT.AMOUNT
                ).as("payments"),
                sum(PAYMENT.AMOUNT).as("total"))
            .from(PAYMENT)
            .where(PAYMENT
                .rental().inventory().FILM_ID.eq(FILM.FILM_ID))
            .groupBy(
                PAYMENT.rental().customer().CUSTOMER_ID,
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME)
        ).as("customers")
    )
    .from(FILM)
    .where(FILM.TITLE.like("A%"))
    .orderBy(FILM.TITLE)
    .limit(5)
    .fetch();

You’ll be using var, of course, rather than denoting this insane type but I wanted to denote the type explicitly for the sake of the example.

Note again how implicit joins were really helpful here since we wanted to aggregate all the payments per customer, we can just select from PAYMENT and group by the payment’s PAYMENT.rental().customer(), as well as correlate the subquery by PAYMENT.rental().inventory().FILM_ID without any extra effort.

The executed SQL looks like this, where you can see the generated implicit joins (run it on your PostgreSQL Sakila database!):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select alias_78509018.first_name, alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = 
               alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as categories,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name,
        'payments', t.payments,
        'total', t.total
      )),
      jsonb_build_array()
    )
    from (
      select
        alias_63965917.first_name,
        alias_63965917.last_name,
        jsonb_agg(jsonb_build_object(
          'payment_date', payment.payment_date,
          'amount', payment.amount
        )) as payments,
        sum(payment.amount) as total
      from (
        payment
          join (
            rental as alias_102068213
              join customer as alias_63965917
                on alias_102068213.customer_id = 
                   alias_63965917.customer_id
              join inventory as alias_116526225
                on alias_102068213.inventory_id = 
                   alias_116526225.inventory_id
          )
            on payment.rental_id = alias_102068213.rental_id
        )
      where alias_116526225.film_id = film.film_id
      group by 
        alias_63965917.customer_id, 
        alias_63965917.first_name, 
        alias_63965917.last_name
    ) as t
  ) as customers
from film
where film.title like 'A%'
order by film.title
fetch next 5 rows only

The result, in JSON, now looks something like this:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      ...
    ],
    "categories": [{ "name": "Documentary" }],
    "customers": [
      {
        "first_name": "SUSAN",
        "last_name": "WILSON",
        "payments": [
          {
            "payment_date": "2005-07-31T22:08:29",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      {
        "first_name": "REBECCA",
        "last_name": "SCOTT",
        "payments": [
          {
            "payment_date": "2005-08-18T18:36:16",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      ...

That’s it. Nesting collections in arbitrary ways is completely effortless and intuitive. No N+1, no deduplication. Just declare the results in exactly the form you require in your client.

There is no other way to pull off this complexity with such ease, than letting your RDBMS do the heavy lifting of planning and running such a query, and letting jOOQ do the mapping.

Conclusion

We had this kind of functionality all along. We just never used it, or not enough. Why? Because client APIs did not make it accessible enough. Because RDBMS didn’t agree on syntax enough.

That’s now over. jOOQ uses standard SQL MULTISET syntax in its DSL API, enhances it with the synthetic MULTISET_AGG aggregate function, the way all RDBMS should have implemented it (go Informix, Oracle). We can wait for another 40 years for the other RDBMS to implement this, or we just use jOOQ today.

And, I cannot stress this enough:

  • This is all type safe
  • There is no reflection
  • The nesting is done in the database using SQL (via SQL/XML or SQL/JSON for now)
  • … So, execution planners can optimise your entire query
  • … No extra columns or extra queries or other extra work is performed in the database

This works on all dialects that have either SQL/XML or SQL/JSON support (or both), including the major popular dialects:

  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server

And it is offered under the usual license terms of jOOQ. So, happy nesting of collections.

Addendum: Using SQL/XML or SQL/JSON directly

You may be tempted to use this everywhere. And you rightfully do so. But beware of this, if your SQL client is consuming XML or JSON directly, there’s no need to use MULTISET. Use jOOQ’s native SQL/XML or SQL/JSON support that was introduced in jOOQ 3.14. That way, you won’t convert from JSON to jOOQ results to JSON, but stream the JSON (or XML) to your frontend directly.

Quickly Trying out jOOQ with Jbang!

jbang is a relatively new utility that …

… lets students, educators and professional developers create, edit and run self-contained source-only Java programs with unprecedented ease.

Sounds exciting. How does it work with jOOQ? Very easy! Set it up like this (other installation options here):

curl -Ls https://sh.jbang.dev | bash -s - app setup

And then, play around with this jOOQ example:

git clone https://github.com/jOOQ/jbang-example
cd jbang-example
jbang Example.java

And you should see something like this:

[jbang] Resolving dependencies...
[jbang]     Resolving org.jooq:jooq:RELEASE...Done
[jbang]     Resolving com.h2database:h2:RELEASE...Done
[jbang] Dependencies resolved
[jbang] Building jar...
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
Juni 24, 2021 5:06:35 PM org.jooq.tools.JooqLogger info
INFO:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@  @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@        @@@@@@@@@@
@@@@@@@@@@@@@@@@  @@  @@    @@@@@@@@@@
@@@@@@@@@@  @@@@  @@  @@    @@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@    @@  @@  @@@@  @@@@@@@@@@
@@@@@@@@@@    @@  @@  @@@@  @@@@@@@@@@
@@@@@@@@@@        @@  @  @  @@@@@@@@@@
@@@@@@@@@@        @@        @@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@  @@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  Thank you for using jOOQ 3.14.4

Query:
select
  "alias_99346637"."FIRST_NAME",
  "alias_99346637"."LAST_NAME",
  "alias_12744250"."TITLE",
  count(distinct "BOOK_TO_BOOK_STORE"."BOOK_ID") over (partition by "alias_12744250"."AUTHOR_ID") "books written by author",
  count(distinct "BOOK_TO_BOOK_STORE"."NAME") over (partition by "alias_12744250"."AUTHOR_ID") "bookstores listing author",
  "alias_86071072"."CD",
  "alias_134125752"."NAME"
from (
  "BOOK_TO_BOOK_STORE"
    join (
      "BOOK" "alias_12744250"
        join "AUTHOR" "alias_99346637"
          on "alias_12744250"."AUTHOR_ID" = "alias_99346637"."ID"
        join "LANGUAGE" "alias_86071072"
          on "alias_12744250"."LANGUAGE_ID" = "alias_86071072"."ID"
    )
      on "BOOK_TO_BOOK_STORE"."BOOK_ID" = "alias_12744250"."ID"
    join "BOOK_STORE" "alias_134125752"
      on "BOOK_TO_BOOK_STORE"."NAME" = "alias_134125752"."NAME"
  )
order by
  1,
  2,
  3

Result:
+----------+---------+------------+-----------------------+-------------------------+----+-------------------------+
|FIRST_NAME|LAST_NAME|TITLE       |books written by author|bookstores listing author|CD  |NAME                     |
+----------+---------+------------+-----------------------+-------------------------+----+-------------------------+
|George    |Orwell   |1984        |                      2|                        2|en  |Orell Füssli             |
|George    |Orwell   |1984        |                      2|                        2|en  |Ex Libris                |
|George    |Orwell   |Animal Farm |                      2|                        2|en  |Orell Füssli             |
|Paulo     |Coelho   |O Alquimista|                      1|                        3|pt  |Orell Füssli             |
|Paulo     |Coelho   |O Alquimista|                      1|                        3|pt  |Ex Libris                |
|Paulo     |Coelho   |O Alquimista|                      1|                        3|pt  |Buchhandlung im Volkshaus|
+----------+---------+------------+-----------------------+-------------------------+----+-------------------------+

No Maven, Gradle, or any other hassles involved. All dependencies are resolved automatically, compilation is done as well.

To re-generate the jOOQ generated code, if you modify the code generation configuration, or the database schema changes, just run:

jbang codegen@jooq db.xml

You can also use the other 2 CLI APIs very simply, from jOOQ, e.g. the DiffCLI:

jbang diff@jooq -T MYSQL -1 "create table t (i int);" -2 "create table t (i int, j int);"

To get this output:

alter table t add j int null;

Or, the ParserCLI

jbang parser@jooq -T MYSQL -s "create table t (i int generated always as identity);"

And you’ll get:

create table t (i int not null auto_increment);

Want to work with the commercial distributions? Assuming you’ve already downloaded the appropriate trial or pro distribution from here: https://www.jooq.org/download/versions

Then, just add the usual suffix to your script from the jbang-catalog (the full and up-to-date catalog can be seen here)

jbang parser-trial@jooq -T SQLSERVER -s "create table t (i int generated always as identity);"

And now, you can also access the commerical dialects to get:

create table t (i int identity(1, 1) not null);

Available suffixes include:

  • -trial
  • -trial-java-8
  • -trial-java-11 (starting from jOOQ 3.15)
  • -pro
  • -pro-java-8
  • -pro-java-11 (starting from jOOQ 3.15)

For more installation, please refer to the jbang documentation:

How to Prevent Execution Plan Troubles when Querying Skewed Data, with jOOQ

One of the biggest advantages of using jOOQ is that you can change all of your complex application’s generated SQL with just a few lines of code. In this article, we’ll look into how to solve some common bind peeking issues just like that, without touching your application code, without the need to explain this advanced SQL performance issue to every team member, and best of all: for good.

What are Bind Values Good For?

First of all, bind variables are a good thing. They:

The latter bullet doesn’t affect all dialects, just like this article doesn’t. Commercial DBMS like Oracle, SQL Server, etc. ship with a powerful execution plan cache. They are designed to run thousands of distinct queries with very complex plans. Planning these queries takes time (I’ve seen Oracle SQL being planned for 5 seconds!) and you don’t want the DBMS to re-execute this planning work every time the query is run, which may be thousands of times per second.

Usually, this execution plan cache takes the SQL string (or a hash of it), and associates meta data like alternative possible execution plans to it. When the SQL string changes, the cache lookup fails and the “new” query has to be planned again.

I say “new”, because it might be the “same” query to the user, even if the string is different. For example:

SELECT * FROM book WHERE id = 1;
SELECT * FROM book WHERE id = 2;

Now we have two times the “same” query, but each of them are “new”. Oracle will plan them both afresh. So, we use bind variables, instead:

SELECT * FROM book WHERE id = ?;
SELECT * FROM book WHERE id = ?;

What is the Bind Peeking Issue?

In some cases, the cached plan is not optimal. This is the case when the actual bind value is significant to the planning, e.g. the value 1 would produce a vastly different plan than the value 2, or much more likely the value DELETED produces a different plan from PROCESSED or NEW.

This problem has been discussed in our previous blog post Why You Should Design Your Database to Optimise for Statistics.

“Bind Peeking” is a technique by the Oracle database (it is done by others, too, but maybe not called “Bind Peeking”) to have a “peek” at the bind variable to get a more accurate plan than the average plan if we don’t know the bind value. This can go both ways, good or bad, so there were numerous fixes / patches / workarounds in previous Oracle versions. Some interesting articles on the topic:

Databases slowly get to truly adaptive query execution models where execution plans can be fixed in-flight when the estimates were clearly wrong. Db2 is quite strong at this, and Oracle is getting better.

But even then, some times planners get it wrong, simply because they can’t reasonably estimate the cardinalities produced by a simple predicate like

WHERE x = ?

… just because the overall query is very complex and some SQL transformation did not apply.

Preventing the Problem by Avoiding Bind Values

Again. Please use bind values by default. They’re a good thing by default. Not all data is as skewed as the one I presented in my other blog post. But some data is almost always skewed: Enumeration types.

When you have an enum like:

enum ProcessingState {
  NEW,
  PROCESSING,
  EXECUTED,
  DELETED
}

Or in PostgreSQL:

CREATE TYPE processing_state AS ENUM (
  'new',
  'processing',
  'executed',
  'deleted'
);

Or even just encoded as a CHECK constraint:

CREATE TABLE transaction (
  -- ...
  processing_state VARCHAR(10) CHECK (processing_state IN (
    'new',
    'processing',
    'executed',
    'deleted'
  ))
  -- ...
);

In that case, you will very likely have highly skewed data. For example, a quick query might yield:

SELECT processing_state, count(*)
FROM transaction
GROUP BY processing_state

Resulting in:

+------------------+----------+
| processing_state |    count |
+------------------+----------+
| new              |    10234 |
| processing       |       15 |
| executed         | 17581684 |
| deleted          |    83193 |
+------------------+----------+

Now, do you think you’ll profit from indexing PROCESSING_STATE equally, when looking for NEW or PROCESSING versus when looking for EXECUTED values? Do you want the same plans? Do you want an average plan, which might not use the index, when in fact you should use it (looking for PROCESSING)?

Not only that, but it’s also quite unlikely that your queries are so generic as that the individual PROCESSING_STATE values can be used interchangeably. For example, a query looking for DELETED states might be run by a housekeeping batch job that wants to remove the logically deleted transactions for good. It will never query anything other than DELETED states. So, might as well inline, right?

Now, if you write a query like this:

SELECT *
FROM transaction
WHERE processing_state = 'processing';

With jOOQ, you can create an “inline” on a per-query basis using DSL.inline("processing") (as opposed to DSL.val("processing"), which is used by default, implicitly:

// These generate a ? bind value
ctx.selectFrom(TRANSACTION)
   .where(TRANSACTION.PROCESSING_STATE.eq("processing"))
   .fetch();

ctx.selectFrom(TRANSACTION)
   .where(TRANSACTION.PROCESSING_STATE.eq(val("processing")))
   .fetch();

// This creates an inline 'processing' literal
ctx.selectFrom(TRANSACTION)
   .where(TRANSACTION.PROCESSING_STATE.eq(inline("processing")))
   .fetch();

As always, assuming this static import:

import static org.jooq.impl.DSL.*;

But now, you have to think of doing that every single time you query PROCESSING_STATE, and all the other similar columns.

Prevent it for Good

Much better, prevent it for good. You can create a very simple Binding like this:

class AlwaysInlineStringBinding implements Binding<String, String> {

    final Binding<?, String> delegate = VARCHAR.getBinding();

    @Override
    public Converter<String, String> converter() {
        return Converters.identity(String.class);
    }

    @Override
    public void sql(BindingSQLContext<String> ctx) 
    throws SQLException {
        ctx.render().visit(inline(ctx.value()));
    }

    @Override
    public void register(BindingRegisterContext<String> ctx) 
    throws SQLException {
        delegate.register(ctx);
    }

    // No need to set anything
    @Override
    public void set(BindingSetStatementContext<String> ctx) 
    throws SQLException {}

    @Override
    public void set(BindingSetSQLOutputContext<String> ctx) 
    throws SQLException {
        delegate.set(ctx);
    }

    @Override
    public void get(BindingGetResultSetContext<String> ctx) 
    throws SQLException {
        delegate.get(ctx);
    }

    @Override
    public void get(BindingGetStatementContext<String> ctx) 
    throws SQLException {
        delegate.get(ctx);
    }

    @Override
    public void get(BindingGetSQLInputContext<String> ctx) 
    throws SQLException {
        delegate.get(ctx);
    }
}

Or, starting from jOOQ 3.15, even simpler, and generic:

class AlwaysInlineStringBinding 
extends DefaultBinding<String, String> {
    public AlwaysInlineStringBinding() {
        super(DefaultBinding.binding(VARCHAR));
    }

    @Override
    public void sql(BindingSQLContext<String> ctx) 
    throws SQLException {
        ctx.render().visit(inline(ctx.value()));
    }

    // No need to set anything
    @Override
    public void set(BindingSetStatementContext<T> ctx) 
    throws SQLException {}
}

Or even generic:

class AlwaysInlineBinding<T> extends DefaultBinding<T, T> {
    public AlwaysInlineBinding(DataType<T> type) {
        super(DefaultBinding.binding(type));
    }

    @Override
    public void sql(BindingSQLContext<T> ctx) 
    throws SQLException {
        ctx.render().visit(inline(ctx.value()));
    }

    // No need to set anything
    @Override
    public void set(BindingSetStatementContext<T> ctx) 
    throws SQLException {}
}

All this does is generate inline values instead of the ? bind parameter marker, and skip setting any value to the JDBC PreparedStatement (or reactive R2DBC Statement, starting from jOOQ 3.15. This will work just the same!)

Try it very easily, yourself (using the jOOQ 3.15 version):

@Test
public void testAlwaysInlineBinding() {
    DSLContext ctx = DSL.using(DEFAULT);
    DataType<Integer> t = INTEGER.asConvertedDataType(
        new AlwaysInlineBinding<>(INTEGER));

    Field<Integer> i = field("i", INTEGER);
    Field<Integer> j = field("j", t);
    Param<Integer> a = val(1);
    Param<Integer> b = val(1, INTEGER.asConvertedDataType(
        new AlwaysInlineBinding<>(INTEGER)));

    // Bind value by default
    assertEquals("?", ctx.render(a));
    assertEquals("1", ctx.renderInlined(a));
    assertEquals("1", ctx.render(b));
    assertEquals("1", ctx.renderInlined(b));

    // Bind value by default in predicates
    assertEquals("i = ?", ctx.render(i.eq(a)));
    assertEquals("i = 1", ctx.renderInlined(i.eq(a)));
    assertEquals("i = 1", ctx.render(i.eq(b)));
    assertEquals("i = 1", ctx.renderInlined(i.eq(b)));
    assertEquals("i = ?", ctx.render(i.eq(1)));
    assertEquals("i = 1", ctx.renderInlined(i.eq(1)));

    // No more bind values in predicates!
    assertEquals("j = 1", ctx.render(j.eq(a)));
    assertEquals("j = 1", ctx.renderInlined(j.eq(a)));
    assertEquals("j = 1", ctx.render(j.eq(b)));
    assertEquals("j = 1", ctx.renderInlined(j.eq(b)));
    assertEquals("j = 1", ctx.render(j.eq(1)));
    assertEquals("j = 1", ctx.renderInlined(j.eq(1)));
}

Of course, instead of doing the above programmatically, you’ll attach this Binding to all relevant columns using the code generator’s forced type configuration

Conclusion

Please use bind values by default. Both in jOOQ and elsewhere. It’s a very good default.

But sometimes, your data is skewed, and you as a developer, you probably know that. In those cases, sometimes, “inline values” as we call them (or constants, literals, etc.) may be the better choice to help the optimiser get better at estimating. Even if the optimiser’s estimates are good the first time, the plan may switch in production for some weird reason, including some plans being purged because the cache is full, or the DBA clicked a button, or whatever.

And that’s when your query might be unnecessarily slow all of a sudden. No more need. When you have enum types, or similar, just use the above simple trick, apply to all of your schema where it makes sense, and forget this problem for good.

Side note:

Of course, the other way round is just as easy. When you have inline literals that you want to switch over to bind values, you can do it just the same way, e.g. when you use jOOQ’s parsing connection to translate between dialects, or to patch your wrong ORM-generated SQL!

Use ResultQuery.collect() to Implement Powerful Mappings

In our opinion, any Iterable<T> should offer a <R> collect(Collector<T, ?, R>) method to allow for transforming the the content to something else using standard JDK collectors, jOOλ collectors from org.jooq.lambda.Agg or your own.

When using jOOQ, you don’t have to wait for the JDK to finally add these useful utilities to the Iterable API. jOOQ’s ResultQuery<R> already implements Iterable<R>, and offers additional convenience like collect() on top of it.

For example, using a Java 16 record type:

record Book (int id, String title) {}

List<Book> books =
ctx.select(BOOK.ID, BOOK.TITLE)
   .from(BOOK)
   .collect(Collectors.mapping(
        r -> r.into(Book.class),
        Collectors.toList()
   ));

There are other ways to map things, but why not use a Collector. The best thing about Collector types is, they compose, type safely, and arbitrarily, almost like Stream pipelines.

I found a very interesting use-case recently on Stack Overflow. The problem there was that fetchGroups() is quite simple and not left-join aware, meaning that when an AUTHOR (parent) has no BOOK (child), instead of an empty list, there will be a list with a single NULL item:

Map<AuthorRecord, List<BookRecord>> result =
ctx.select()
   .from(AUTHOR)
   .leftJoin(BOOK).onKey()
   .fetchGroups(AUTHOR, BOOK);

The above works well for inner joins, but it doesn’t really make sense for left joins. We should fix this in jOOQ, of course (https://github.com/jOOQ/jOOQ/issues/11888), but using Collectors, you can already work around this problem today.

Simply write

Map<AuthorRecord, List<BookRecord>> result =
ctx.select()
   .from(AUTHOR)
   .leftJoin(BOOK).onKey()
   .collect(groupingBy(
        r -> r.into(AUTHOR), 
        filtering(
            r -> r.get(BOOK.ID) != null, 
            mapping(
                r -> r.into(BOOK), 
                toList()
            )
        )
    ));

// All assuming the usual static imports:
import static org.jooq.impl.DSL.*;
import static com.example.generated.Tables.*;
import static java.util.stream.Collectors.*;

Step by step:

  1. Group results by AUTHOR, mapping keys to AuthorRecord just like jOOQ’s fetchGroups()
  2. For each AUTHOR filter out those BOOK records whose BOOK.ID is null. Given that BOOK.ID is the primary key, the only reason why it could be null is because of the left join
  3. Map values to BookRecord, just like jOOQ’s fetchGroups()
  4. Collect the child records into a list.

And you’re done. Just like when you use the ResultQuery as an Iterable in a foreach loop, the collect() call executes your query automatically, managing all resources, bypassing the intermediate Result data structure, which isn’t needed here.

How to Get an RDBMS Server Version with SQL

Do you need to know what RDBMS Server version you’re on, and you only have SQL at your disposal? No problem. Most RDBMS provide you with that information in some form of meta data table.

Here’s how:

-- CockroachDB
select version();

-- Db2
select service_level from table (sysproc.env_get_inst_info()) t

-- Derby
select getdatabaseproductversion() from (values (1)) t (a);

-- Exasol
select param_value 
from exa_metadata 
where param_name = 'databaseProductVersion';

-- Firebird
select rdb$get_context('SYSTEM', 'ENGINE_VERSION')
from rdb$database;

-- H2
select h2version(); 

-- HANA
select * from m_database;

-- HSQLDB
select character_value
from information_schema.sql_implementation_info
where implementation_info_name = 'DBMS VERSION'

-- Informix
select dbinfo('version', 'full') from systables where tabid = 1;

-- MariaDB
select version();

-- MemSQL (SingleStore)
select version();

-- MySQL
select version();

-- Oracle
select * from v$version;

-- PostgreSQL
select version();

-- Snowflake
select current_version();

-- SQL Server
select @@version;

-- SQLite
select sqlite_version();

-- Teradata
select infodata from dbc.dbcinfov where infokey = 'VERSION'

Missing this info for your own RDBMS? Feel free to comment.

Use IN List Padding to Your JDBC Application to Avoid Cursor Cache Contention Problems

A problem few developers are aware of is the possibility of running into “cursor cache contention” or “execution plan cache contention” problems when using IN lists in SQL. The problem that is described in lengths in previous articles, can be summarised as this.

All of these are distinct SQL queries and need to be parsed / planned / cached as possibly distinct execution plans in RDBMS that have strong plan caches (e.g. Db2, Oracle, SQL Server):

SELECT * FROM t WHERE id IN (?);
SELECT * FROM t WHERE id IN (?, ?);
SELECT * FROM t WHERE id IN (?, ?, ?);
SELECT * FROM t WHERE id IN (?, ?, ?, ?);
SELECT * FROM t WHERE id IN (?, ?, ?, ?, ?);

While this is never a problem on developer machines, this can produce significant problems in production. I’ve seen this take down entire Oracle instances during peak loads. While RDBMS vendors should work on avoiding the severe problems this can cause, you can work around it using a trick that we invented at jOOQ (and Hibernate also has it now):

IN list padding

The trick is very simple. Just “pad” your IN lists to the nearest power of 2, and repeat the last value until the end:

SELECT * FROM t WHERE id IN (?);                      -- Left as it is
SELECT * FROM t WHERE id IN (?, ?);                   -- Left as it is
SELECT * FROM t WHERE id IN (?, ?, ?, ?);             -- Padded 3 to 4
SELECT * FROM t WHERE id IN (?, ?, ?, ?);             -- Left as it is
SELECT * FROM t WHERE id IN (?, ?, ?, ?, ?, ?, ?, ?); -- Padded 5 to 8

It’s really a hack and there are better solutions to avoiding this problem, including using arrays or temporary tables, but your production system may be down and you need a quick fix.

jOOQ has supported IN list padding for years now, since jOOQ 3.9 (late 2016), but with the relatively new parser and the ParsingConnection, you can now also apply this technique to any arbitrary SQL query in your non-jOOQ based system. Here’s a simple example:

// Any arbitrary JDBC Connection is wrapped by jOOQ here and replaced
// by a "ParsingConnection", which is also a JDBC Connection
DSLContext ctx = DSL.using(connection);
ctx.settings().setInListPadding(true);
Connection c = ctx.parsingConnection();

// Your remaining code is left untouched. It is unaware of jOOQ
for (int i = 0; i < 10; i++) {
    try (PreparedStatement s = c.prepareStatement(

        // This alone is reason enough to use jOOQ instead, 
        // but one step at a time :)
        "select 1 from dual where 1 in (" +
            IntStream.rangeClosed(0, i)
                     .mapToObj(x -> "?")
                     .collect(Collectors.joining(", ")) +
        ")")
    ) {
        for (int j = 0; j <= i; j++)
            s.setInt(j + 1, j + 1);

        try (ResultSet rs = s.executeQuery()) {
            while (rs.next())
                System.out.println(rs.getInt(1));
        }
    }
}

The above example just generates and runs 10 queries of this form:

select 1 from dual where 1 in (?)
select 1 from dual where 1 in (?, ?)
select 1 from dual where 1 in (?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

But that’s not what is being executed. In the DEBUG logs, we can see the following:

Translating from         : select 1 from dual where 1 in (?)
Translating to           : select 1 from dual where 1 in (?)
Translating from         : select 1 from dual where 1 in (?, ?)
Translating to           : select 1 from dual where 1 in (?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

And just like that, our legacy application can run in production again, and you’ll have time to fix this more thoroughly.

Conclusion

While jOOQ is mostly an internal DSL for writing type safe, embedded SQL in Java, you can use it for a lot of other things too on any JDBC based application. The above example is using the ParsingConnection that can parse all your SQL statements and translate / transform them to anything else, including other dialects.

Never Again Forget to Call .execute() in jOOQ

jOOQ’s DSL, like any fluent API, has one big caveat. It’s very easy to forget to call .execute(). And when you do, chances are, you’re going to be staring at your code for minutes, because everything looks perfect:

ctx.insertInto(T)
   .columns(T.A, T.B)
   .values(1, 2);

Staring… staring… staring… Why is it not inserting that row?

“Aaaah, not again!!”

This is how it’s done:

ctx.insertInto(T)
   .columns(T.A, T.B)
   .values(1, 2)
   .execute();

In principle, this kind of mistake can happen with any fluent API. E.g. StringBuilder

sb.append("a").append("b"); // Not consuming the result

Or streams:

Stream.of(1, 2).peek(System.out::println); // Not so much peeking

But it usually doesn’t happen as much, because the difference to jOOQ is that

  • jOOQ’s DML statements (INSERT, UPDATE, DELETE, MERGE) and DDL statements (CREATE, ALTER, DROP, TRUNCATE), and a few other produce a side effect
  • That side effect is the only thing we care about. The result (the update count) is mostly irrelevant

And as such, we don’t care about the result of execute(), which is an int. No one forgets to call fetch() on a jOOQ ResultQuery:

ctx.select(T.A, T.B)
   .from(T); // Well duh

Because without calling fetch() or something similar, we’re not getting any results, and we want the results, just like with the StringBuilder or the Stream. But we don’t want the execute() result.

As such, even we, when writing jOOQ’s integration tests, occasionally forget to call this silly little method.

No more!

When it happened again this week…

… I finally created an issue to think about it: https://github.com/jOOQ/jOOQ/issues/11718. And I created an issue to wonder if JetBrains could do something about it: https://youtrack.jetbrains.com/issue/IDEA-265263

And they already can! Apart from the org.jetbrains.annotations.Contract annotation, which is there for precisely this reason, apparently, it’s also possible to mimick the JSR-305 @CheckReturnValue annotation on every method “whose return value needs to be checked” (i.e. a method that has no side effect, or whose side effect is to mutate only “this“).

I added this annotation, I added it to all the relevant jOOQ API, which was a bit of yak shaving (https://github.com/jOOQ/jOOQ/commit/f2b529a2305f8c5f8d037776687887a5acd50b11) and voilà

image

As you can see, IntelliJ now warns the user whenever they forget to consume the result of any of jOOQ’s DSL methods (by calling execute(), passing it to some method that consumes it, etc.)

Thanks again to Tagir Valeev from JetBrains for walking me through this and even improving the @Contract annotation, which jOOQ might use in a future version.