3.15.0 Release with Support for R2DBC, Nested ROW, ARRAY, and MULTISET types, 5 new SQL dialects, CREATE PROCEDURE, FUNCTION, and TRIGGER support and Much More

R2DBC

What a lot of users have been waiting for: jOOQ 3.15 is reactive, thanks to the new native R2DBC integration. Recent versions already implemented the reactive streams Publisher SPI, but now we’re not cheating anymore. We’re no longer blocking. Just wrap your jOOQ query, configured with an R2DBC ConnectionFactory, in a Flux (or any reactive streams API of your choice), and see what happens.

Flux.from(ctx.select(BOOK.TITLE).from(BOOK));

Both blocking (via JDBC) and non-blocking (via R2DBC) execution can work side-by-side, allowing users to quickly switch a query between the two execution models without any changes to the query building logic.
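To illustrate, here’s a minimal end-to-end sketch (the H2 R2DBC URL is made up for the example; any R2DBC ConnectionFactory works):

// A DSLContext backed by an R2DBC ConnectionFactory executes
// queries through the non-blocking R2DBC SPI
ConnectionFactory factory =
    ConnectionFactories.get("r2dbc:h2:mem:///sakila");
DSLContext ctx = DSL.using(factory);

Flux.from(ctx.select(BOOK.TITLE).from(BOOK))
    .map(r -> r.get(BOOK.TITLE))
    .subscribe(System.out::println);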

Projecting ROW types, ARRAY of ROW Types, and MULTISETS

After having implemented standard SQL/XML and SQL/JSON support in jOOQ 3.14, another major milestone in taking SQL to the next level is now available as an experimental feature: Nesting collections using the standard SQL MULTISET operator.

The operator is currently emulated using SQL/XML or SQL/JSON. The resulting documents are parsed again when fetching them from JDBC. Future versions will also provide native support (Informix, Oracle), and emulations using ARRAY (various dialects, including PostgreSQL).

Imagine this query against the Sakila database (https://www.jooq.org/sakila):

var result =
ctx.select(
      FILM.TITLE,
      multiset(
        select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
        .from(ACTOR)
        .join(FILM_ACTOR).using(ACTOR.ACTOR_ID)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(CATEGORY.NAME)
        .from(CATEGORY)
        .join(FILM_CATEGORY).using(CATEGORY.CATEGORY_ID)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

You’re really going to love Java 10’s var keyword for these purposes. What’s the type of result? Exactly:

Result<Record3<
    String, 
    Result<Record2<String, String>>, 
    Result<Record1<String>>
>>

It contains:

+---------------------------+--------------------------------------------------+---------------+
|title                      |actors                                            |categories     |
+---------------------------+--------------------------------------------------+---------------+
|ACADEMY DINOSAUR           |[(PENELOPE, GUINESS), (CHRISTIAN, GABLE), (LUCI...|[(Documentary)]|
|ACE GOLDFINGER             |[(BOB, FAWCETT), (MINNIE, ZELLWEGER), (SEAN, GU...|[(Horror)]     |
|ADAPTATION HOLES           |[(NICK, WAHLBERG), (BOB, FAWCETT), (CAMERON, ST...|[(Documentary)]|
 ...

Two collections were nested in a single query without producing any unwanted cartesian products or duplication of data. And stay tuned, we’ve added more goodies! See the article below on how to map the above structural type to your nominal types (e.g. Java 16 records) in a type safe way, without reflection!

New Dialects

We’ve added support for 5 (!) new SQLDialects, more than in any previous minor release. The new dialects are:

  • BIGQUERY
  • EXASOL
  • IGNITE
  • JAVA
  • SNOWFLAKE

Yes, there’s an experimental “JAVA” dialect. It’s mostly useful if you want to translate your native SQL queries to jOOQ using https://www.jooq.org/translate, and it cannot be executed. In the near future, we might add SCALA and KOTLIN as well, depending on demand.

BigQuery and Snowflake were long overdue by popular vote. The expedited EXASOL support has been sponsored by a customer, which is a great reminder that sponsoring is always an option. If you need something more quickly, we can make it happen, even if the feature isn’t very popular on the roadmap.

Many other dialects have been brought up to date, including REDSHIFT, HANA, VERTICA, and two have been deprecated: INGRES and ORACLE10G, as these grow less and less popular.

Drop Java 6/7 support for Enterprise Edition, require Java 11 in OSS Edition

We’re cleaning up our support for old dependencies and features. Starting with jOOQ 3.12, we offered Java 6 and 7 support only to jOOQ Enterprise Edition customers. With jOOQ 3.15, this support has now been removed, and Java 8 is the new baseline for commercial editions, Java 11 for the jOOQ Open Source Edition. This means the OSS Edition is now finally modular, and we get access to little things like the Flow API (see R2DBC) and the @Deprecated annotation’s forRemoval and since attributes.

Upgrading to Java 8 allows for exciting new improvements to our internals, as we can finally use default methods, lambdas, generalised target type inference, effectively final, diamond operators, try-with-resources, string switches, and more. Improving our code base means dogfooding, and that in turn leads to new features for you. For example, we’ve put a lot of emphasis on ResultQuery.collect(), refactoring internals: https://blog.jooq.org/2021/05/17/use-resultquery-collect-to-implement-powerful-mappings/

There are new auxiliary types, like org.jooq.Rows and org.jooq.Records, for more functional transformation convenience. More functions mean fewer loops, and also fewer ArrayList allocations.
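For example, here’s a minimal sketch combining ResultQuery.collect() with one of the new Records collectors (assuming the Records.intoMap() collector, which collects Record2 results into a Map):

// Collect (ID, TITLE) records directly into a Map, without
// materialising an intermediate Result
Map<Integer, String> books = ctx
    .select(BOOK.ID, BOOK.TITLE)
    .from(BOOK)
    .collect(Records.intoMap());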

At the same time, we’ve started building a Java 17 ready distribution for the commercial editions, which unlocks better record type support.

Refactoring of ResultQuery to work with DML

With all the above goodies related to Java 8, and a more functional usage of jOOQ, we’ve also finally refactored our DML statement type hierarchy (INSERT, UPDATE, DELETE), to let their respective RETURNING clauses return an actual ResultQuery. That means you can now stream(), collect(), fetchMap() and subscribe() (via R2DBC) to your DML statements, and even put them in the WITH clause (in PostgreSQL).
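As a sketch of what that enables (the BOOK columns are illustrative):

// The RETURNING clause now produces a real ResultQuery, so generated
// keys can be fetched, streamed, or collected like any query result
List<Integer> ids = ctx
    .insertInto(BOOK, BOOK.TITLE)
    .values("Animal Farm")
    .values("1984")
    .returningResult(BOOK.ID)
    .fetch(BOOK.ID);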

Massive improvements to the parser / translator use case

jOOQ’s secondary value proposition is to use its parser and translator, instead of the DSL API, which is also available for free on our website: https://www.jooq.org/translate

With increasing demand for this product, we’ve greatly improved the experience:

  • The ParsingConnection is no longer experimental (see the sketch after this list)
  • Batch operations are now possible
  • We’ve added a cache for input/output SQL string pairs to heavily speed up the integration
  • We’re now delaying bind variable type inference to use actual PreparedStatement information. This produces more accurate results, especially when data types are not known to the parser.
  • A new ParseListener SPI allows for hooking into the parser and extending it with custom syntax support for column, table, and predicate expressions.
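For illustration, a minimal sketch of the ParsingConnection use case (the table and column names are made up). It’s an ordinary JDBC Connection whose SQL is translated on the fly:

// All SQL passed to this connection is parsed by jOOQ and rendered
// in the configured target dialect before hitting the real connection
try (Connection c = ctx.parsingConnection();
     Statement s = c.createStatement();
     ResultSet rs = s.executeQuery("SELECT title FROM book")) {
    while (rs.next())
        System.out.println(rs.getString(1));
}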

CREATE PROCEDURE, FUNCTION, TRIGGER and more procedural instructions

Over the recent releases, we’ve started working on procedural language extensions for the commercial distributions. In addition to creating anonymous blocks, we now also support all lifecycle management DDL for procedures, functions, and triggers, which can contain procedural language logic.

This is great news if you’re supporting multiple RDBMS and want to move some more data processing logic to the server side in a vendor agnostic way.
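As a sketch of the idea (available in the commercial distributions only; the procedure and table names here are made up for illustration):

// A vendor agnostic CREATE PROCEDURE statement, rendered in each
// dialect's procedural language by jOOQ
ctx.createProcedure("archive_books")
   .as(insertInto(BOOK_ARCHIVE).select(selectFrom(BOOK)))
   .execute();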

Explicit JDBC driver dependencies to avoid reflection

To get AOT ready, we’re slowly removing internal reflection usage, meaning we’re experimenting with an explicit JDBC driver build-time dependency. This currently affects:

  • Oracle
  • PostgreSQL
  • SQL Server

Only drivers available from Maven Central have been added as dependency so far.

The full release notes can be found on the jOOQ website.

jOOQ 3.15’s New Multiset Operator Will Change How You Think About SQL

This is how SQL should have been used all along.

They called it The Third Manifesto, ORDBMS, or other things. Regrettably, it never really took off. Because most vendors didn’t adopt it. And those who did, didn’t agree on syntax.

But this is about to change. Thanks to the now ubiquitous SQL/JSON support (which jOOQ 3.14 has already covered), we can now emulate the most powerful ORDBMS feature that you will want to use everywhere: Nested collections!

How We Used to do Things: With Joins

We’re going to be using the Sakila database for this example. It’s a DVD rental store with things like ACTOR, FILM, CATEGORY (of films) and other nice relational things. Let’s write a query for this requirement:

Get me all the films with their actors and their categories

Classically, we’d go ahead and use jOOQ to write:

ctx.select(
      FILM.TITLE,
      ACTOR.FIRST_NAME,
      ACTOR.LAST_NAME,
      CATEGORY.NAME
   )
   .from(ACTOR)
   .join(FILM_ACTOR)
     .on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .join(FILM)
     .on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .join(FILM_CATEGORY)
     .on(FILM.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
   .join(CATEGORY)
     .on(FILM_CATEGORY.CATEGORY_ID.eq(CATEGORY.CATEGORY_ID))
   .orderBy(1, 2, 3, 4)
   .fetch();

And the result? Not so nice. A denormalised, flat table containing tons of repetition:

+----------------+----------+---------+-----------+
|title           |first_name|last_name|name       |
+----------------+----------+---------+-----------+
|ACADEMY DINOSAUR|CHRISTIAN |GABLE    |Documentary|
|ACADEMY DINOSAUR|JOHNNY    |CAGE     |Documentary|
|ACADEMY DINOSAUR|LUCILLE   |TRACY    |Documentary|
|ACADEMY DINOSAUR|MARY      |KEITEL   |Documentary|
|ACADEMY DINOSAUR|MENA      |TEMPLE   |Documentary|
|ACADEMY DINOSAUR|OPRAH     |KILMER   |Documentary|
|ACADEMY DINOSAUR|PENELOPE  |GUINESS  |Documentary|
|ACADEMY DINOSAUR|ROCK      |DUKAKIS  |Documentary|
|ACADEMY DINOSAUR|SANDRA    |PECK     |Documentary|
|ACADEMY DINOSAUR|WARREN    |NOLTE    |Documentary|
|ACE GOLDFINGER  |BOB       |FAWCETT  |Horror     |
|ACE GOLDFINGER  |CHRIS     |DEPP     |Horror     |
|ACE GOLDFINGER  |MINNIE    |ZELLWEGER|Horror     |
 ...

If we don’t consume this result unmodified as it is (e.g. when displaying tabular data to a user), we then have to de-duplicate things, shoehorning them back into some nested data structures (e.g. for consumption by some JSON based UI), and spend hours trying to untangle the cartesian product between the 2 nested collections FILM -> ACTOR and FILM -> CATEGORY (because ACTOR and CATEGORY now form a cartesian product, which we didn’t want!).

In the worst case, we don’t even notice! This example database only has 1 category per film, but it is designed to support multiple categories.

jOOQ can help with that deduplication, but just look at the number of questions for jOOQ many to many on Stack Overflow! You’ll probably still have to write at least 2 queries to separate the nested collections.

ENTER the Stage: Multiset

The standard SQL <multiset value constructor> operator allows for collecting the data from a correlated subquery into a nested data structure, a MULTISET. Everything in SQL is a MULTISET, so the operator isn’t too surprising. But the nesting is where it shines. The previous query can now be re-written in jOOQ as follows:

var result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Note, it’s not relevant to this task, but I’m using jOOQ’s type safe implicit to-one join feature, which further helps tame the joins, syntactically. A matter of taste.

How to read this query? Easy:

  • Get all the films
  • For each FILM, get all the actors as a nested collection
  • For each FILM, get all the categories as a nested collection

You’re going to like Java 10’s var keyword even more after this :) Because what type is result? It’s of this type:

Result<Record3<
  String,                          // FILM.TITLE
  Result<Record2<String, String>>, // ACTOR.FIRST_NAME, ACTOR.LAST_NAME
  Result<Record1<String>>          // CATEGORY.NAME
>>

That’s quite something, but not too complex if you think about it. There’s a result with 3 columns:

  • TITLE
  • A first nested result that has 2 string columns: ACTOR.FIRST_NAME and ACTOR.LAST_NAME
  • A second nested result that has 1 string column: CATEGORY.NAME

Using var or other type inference mechanisms, you don’t have to denote this type. Or even better (stay tuned): We’ll type-safely map the structural type to our nominal DTO type hierarchy, with just a few additional lines of code. I’ll explain that later.

What Does the Result Look Like?

Calling toString() on the above Result type yields something like this:

+---------------------------+--------------------------------------------------+---------------+
|title                      |actors                                            |categories     |
+---------------------------+--------------------------------------------------+---------------+
|ACADEMY DINOSAUR           |[(PENELOPE, GUINESS), (CHRISTIAN, GABLE), (LUCI...|[(Documentary)]|
|ACE GOLDFINGER             |[(BOB, FAWCETT), (MINNIE, ZELLWEGER), (SEAN, GU...|[(Horror)]     |
|ADAPTATION HOLES           |[(NICK, WAHLBERG), (BOB, FAWCETT), (CAMERON, ST...|[(Documentary)]|
|AFFAIR PREJUDICE           |[(JODIE, DEGENERES), (SCARLETT, DAMON), (KENNET...|[(Horror)]     |
|AFRICAN EGG                |[(GARY, PHOENIX), (DUSTIN, TAUTOU), (MATTHEW, L...|[(Family)]     |
|AGENT TRUMAN               |[(KIRSTEN, PALTROW), (SANDRA, KILMER), (JAYNE, ...|[(Foreign)]    |
 ...

Notice how we’re back to getting each FILM.TITLE entry only once (no duplication), and nested in each row are the subquery results. There’s no denormalisation happening!

When calling result.formatJSON() with the appropriate formatting options, we’ll get this representation:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      {
        "first_name": "SANDRA",
        "last_name": "PECK"
      },

      ...
    ],
    "films": [
      { "name": "Documentary" }
    ]
  },
  {
    "title": "ACE GOLDFINGER",
    "actors": [
      {
        "first_name": "BOB",
        "last_name": "FAWCETT"
      },
  ...

Calling result.formatXML() would produce this:

<result>
  <record>
    <title>ACADEMY DINOSAUR</title>
    <actors>
      <result>
        <record>
          <first_name>PENELOPE</first_name>
          <last_name>GUINESS</last_name>
        </record>
        <record>
          <first_name>CHRISTIAN</first_name>
          <last_name>GABLE</last_name>
        </record>
        <record>
          <first_name>LUCILLE</first_name>
          <last_name>TRACY</last_name>
        </record>
        <record>
          <first_name>SANDRA</first_name>
          <last_name>PECK</last_name>
        </record>
        ...
      </result>
    </actors>
    <categories>
      <result>
        <record>
          <name>Documentary</name>
        </record>
      </result>
    </categories>
  </record>
  <record>
    <title>ACE GOLDFINGER</title>
    <actors>
      <result>
        <record>
          <first_name>BOB</first_name>
          <last_name>FAWCETT</last_name>
        </record>
        ...

You get the idea!

What’s the Generated SQL?

Just turn on jOOQ’s DEBUG logging and observe a query like this one (in PostgreSQL):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select 
        alias_78509018.first_name, 
        alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as categories
from film
order by film.title

The Db2, MySQL, Oracle, SQL Server versions would look similar. Just try it on your Sakila database installation. It runs fast, too.

Mapping the Results to DTOs

Now, I promised to get rid of that lengthy structural type with all the generics. Check this out!

We used to call them POJOs (Plain Old Java Objects). Then DTOs (Data Transfer Objects). Now records. Yes, let’s try some Java 16 records here. (Note, records aren’t required for these examples. Any POJOs with appropriate constructors would do).

Wouldn’t it be nice if result were of this type:

record Actor(String firstName, String lastName) {}
record Film(
  String title,
  List<Actor> actors,
  List<String> categories
) {}

List<Film> result = ...

Structural typing is essential to jOOQ and its type safe query system, but in your code, you probably don’t want to have those merge-conflict lookalike generics all the time, and even var won’t help you if your data needs to be returned from a method.

So, let’s transform our jOOQ query to one that produces List<Film>, step by step. We’re starting with the original query, untouched:

Result<Record3<
    String, 
    Result<Record2<String, String>>, 
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Now, we’re going to map the Actor in the first MULTISET expression. This can be done as follows, using the new Field.convertFrom() convenience method, which allows turning a Field<T> into any read-only Field<U> for ad-hoc usage. A simple example would be this:

record Title(String title) {}

// Use this field in any query
Field<Title> title = FILM.TITLE.convertFrom(Title::new);

It’s just an easy, new way to attach a read-only Converter to a Field for single usage, instead of doing that with the code generator.

Applied to the original query:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

What are we doing here?

  • The method convertFrom() takes a lambda Result<Record2<String, String>> -> Actor.
  • The Result type is the usual jOOQ Result, which has a Result.map(RecordMapper<R, E>) method.
  • The mapping() method is the new Records.mapping(), which isn’t doing much: it just turns a constructor reference of type Function2<String, String, Actor> into a RecordMapper, which can then be used to turn a Result<Record2<String, String>> into a List<Actor> (see the standalone sketch below).
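In isolation, that last step looks something like this sketch:

// Records.mapping() adapts the (String, String) -> Actor constructor
// reference to a RecordMapper for Record2<String, String> records
RecordMapper<Record2<String, String>, Actor> toActor =
    Records.mapping(Actor::new);

// r.map(toActor) then turns a Result<Record2<String, String>>
// into a List<Actor>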

And it type checks! Try it yourself. If you add a column to the multiset, you’ll get a compilation error. If you add/remove an attribute from the Actor record, you’ll get a compilation error. No reflection here, just declarative mapping of jOOQ results/records to custom List<UserType>. If you prefer the “old” reflection approach using jOOQ’s ubiquitous into() methods, you can still do that, too:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.into(Actor.class))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

The result still type checks, but the conversion from Result<Record2<String, String>> to List<Actor> no longer does; it uses reflection instead.

Let’s continue. Let’s remove the clumsy category Result<Record1<String>>. We could’ve added another record, but in this case, a List<String> will suffice.

Result<Record3<
    String, 
    List<Actor>,
    List<String>  // Begone, jOOQ structural type!
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))

      // Magic.     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("categories").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

And finally, the outer-most Result<Record3<...>> to List<Film> conversion:

List<Film> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("categories").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)

   //     vvvvvvvvvvvvvvvvvv grande finale
   .fetch(mapping(Film::new));

This time, we don’t need the Field.convertFrom() method. Just using the Records.mapping() auxiliary is sufficient.

A More Complex Example

The previous example showed the nesting of two independent collections, which is quite hard to do with classic JOIN based SQL or ORMs. How about a much more complex example, where we nest things two levels deep, one of which is even an aggregation? The requirement is:

Give me all the films, the actors that played in the film, the categories that categorise the film, the customers that have rented the film, and all the payments per customer for that film

I won’t even show a JOIN based approach. Let’s dive directly into MULTISET and the new, synthetic MULTISET_AGG aggregate function. Here’s how to do this with jOOQ. First, check out that beautiful result type:

Result<Record4<
    String,                   // FILM.TITLE
    Result<Record2<
        String,               // ACTOR.FIRST_NAME
        String                // ACTOR.LAST_NAME
    >>,                       // "actors"
    Result<Record1<String>>,  // CATEGORY.NAME
    Result<Record4<
        String,               // CUSTOMER.FIRST_NAME
        String,               // CUSTOMER.LAST_NAME
        Result<Record2<
            LocalDateTime,    // PAYMENT.PAYMENT_DATE
            BigDecimal        // PAYMENT.AMOUNT
        >>, 
        BigDecimal            // "total"
    >>                        // "customers"
>> result = 
dsl.select(

        // Get the films
        FILM.TITLE,

        // ... and all actors that played in the film
        multiset(
            select(
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME
            )
            .from(FILM_ACTOR)
            .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
        ).as("actors"),

        // ... and all categories that categorise the film
        multiset(
            select(FILM_CATEGORY.category().NAME)
            .from(FILM_CATEGORY)
            .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
        ).as("categories"),

        // ... and all customers who rented the film, as well
        // as their payments
        multiset(
            select(
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME,
                multisetAgg(
                    PAYMENT.PAYMENT_DATE,
                    PAYMENT.AMOUNT
                ).as("payments"),
                sum(PAYMENT.AMOUNT).as("total"))
            .from(PAYMENT)
            .where(PAYMENT
                .rental().inventory().FILM_ID.eq(FILM.FILM_ID))
            .groupBy(
                PAYMENT.rental().customer().CUSTOMER_ID,
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME)
        ).as("customers")
    )
    .from(FILM)
    .where(FILM.TITLE.like("A%"))
    .orderBy(FILM.TITLE)
    .limit(5)
    .fetch();

You’ll be using var, of course, rather than denoting this insane type, but I wanted to spell out the type explicitly for the sake of the example.

Note again how implicit joins were really helpful here. Since we wanted to aggregate all the payments per customer, we can just select from PAYMENT and group by the payment’s PAYMENT.rental().customer(), as well as correlate the subquery by PAYMENT.rental().inventory().FILM_ID, all without any extra effort.

The executed SQL looks like this, where you can see the generated implicit joins (run it on your PostgreSQL Sakila database!):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select alias_78509018.first_name, alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = 
               alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as categories,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name,
        'payments', t.payments,
        'total', t.total
      )),
      jsonb_build_array()
    )
    from (
      select
        alias_63965917.first_name,
        alias_63965917.last_name,
        jsonb_agg(jsonb_build_object(
          'payment_date', payment.payment_date,
          'amount', payment.amount
        )) as payments,
        sum(payment.amount) as total
      from (
        payment
          join (
            rental as alias_102068213
              join customer as alias_63965917
                on alias_102068213.customer_id = 
                   alias_63965917.customer_id
              join inventory as alias_116526225
                on alias_102068213.inventory_id = 
                   alias_116526225.inventory_id
          )
            on payment.rental_id = alias_102068213.rental_id
        )
      where alias_116526225.film_id = film.film_id
      group by 
        alias_63965917.customer_id, 
        alias_63965917.first_name, 
        alias_63965917.last_name
    ) as t
  ) as customers
from film
where film.title like 'A%'
order by film.title
fetch next 5 rows only

The result, in JSON, now looks something like this:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      ...
    ],
    "categories": [{ "name": "Documentary" }],
    "customers": [
      {
        "first_name": "SUSAN",
        "last_name": "WILSON",
        "payments": [
          {
            "payment_date": "2005-07-31T22:08:29",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      {
        "first_name": "REBECCA",
        "last_name": "SCOTT",
        "payments": [
          {
            "payment_date": "2005-08-18T18:36:16",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      ...

That’s it. Nesting collections in arbitrary ways is completely effortless and intuitive. No N+1, no deduplication. Just declare the results in exactly the form you require in your client.

There is no other way to pull off this complexity with such ease than letting your RDBMS do the heavy lifting of planning and running such a query, and letting jOOQ do the mapping.

Conclusion

We had this kind of functionality all along. We just never used it, or not enough. Why? Because client APIs did not make it accessible enough. Because RDBMS vendors didn’t agree enough on syntax.

That’s now over. jOOQ uses the standard SQL MULTISET syntax in its DSL API and enhances it with the synthetic MULTISET_AGG aggregate function, the way all RDBMS should have implemented it (go Informix, Oracle). We can wait another 40 years for the other RDBMS to implement this, or we can just use jOOQ today.

And, I cannot stress this enough:

  • This is all type safe
  • There is no reflection
  • The nesting is done in the database using SQL (via SQL/XML or SQL/JSON for now)
  • … So, execution planners can optimise your entire query
  • … No extra columns or extra queries or other extra work is performed in the database

This works on all dialects that have either SQL/XML or SQL/JSON support (or both), including the major popular dialects:

  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server

And it is offered under the usual license terms of jOOQ. So, happy nesting of collections.

Addendum: Using SQL/XML or SQL/JSON directly

You may be tempted to use this everywhere, and rightfully so. But beware: if your SQL client is consuming XML or JSON directly, there’s no need to use MULTISET. Use jOOQ’s native SQL/XML or SQL/JSON support that was introduced in jOOQ 3.14. That way, you won’t convert from JSON to jOOQ results to JSON, but stream the JSON (or XML) to your frontend directly.
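For illustration, a minimal sketch using the jOOQ 3.14 SQL/JSON API directly (the query shape is an assumption, built from the jsonObject() and key().value() functions):

// The JSON document is produced by the database and fetched as-is,
// without an intermediate jOOQ Result representation
Field<JSON> doc = jsonObject(key("title").value(FILM.TITLE));

List<JSON> docs = ctx.select(doc).from(FILM).fetch(doc);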

How to Write Multiset Conditions With Oracle VARRAY Types

Oracle is one of the few databases that implements the SQL standard ORDBMS extensions, which essentially allow for nested collections. Other databases that have these features to some extent are CUBRID, Informix, PostgreSQL. Oracle has two types of nested collections:

-- Nested tables
CREATE TYPE t1 AS TABLE OF VARCHAR2(10);
/

-- Varrays
CREATE TYPE t2 AS VARRAY(10) OF VARCHAR2(10);
/

The main difference at first is that a nested table can be of arbitrary size, whereas a varray has a fixed maximum size. Other than that, they behave in similar ways. When storing a nested collection in a table, there is another difference. Varrays can be inlined into the table just like any other data type, whereas nested tables have to be accompanied by an additional storage clause:

CREATE TABLE t (
  id NUMBER(10),
  t1 t1,
  t2 t2
)
NESTED TABLE t1 STORE AS t1_nt;

This is a minor hassle in terms of DDL. The runtime implications are more significant.

Multiset Conditions

The most important difference is the fact that all the useful multiset conditions are not available with varrays. For instance, consider running these statements:

INSERT INTO t VALUES (1, NULL, NULL);
INSERT INTO t VALUES (2, t1(), t2());
INSERT INTO t VALUES (
  3, 
  t1('abc', 'xyz', 'zzz'), 
  t2('abc', 'xyz', 'zzz')
);
INSERT INTO t VALUES (
  4, 
  t1('dup', 'dup', 'dup'), 
  t2('dup', 'dup', 'dup')
);

SELECT * FROM t WHERE 'abc' MEMBER OF t1;
SELECT * FROM t WHERE 'abc' MEMBER OF t2;

The result of these queries is:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

ORA-00932: inconsistent datatypes: expected UDT got TEST.T2

Bummer. The documentation is a bit unclear about this. It reads (emphasis mine):

“The return value is TRUE if expr is equal to a member of the specified nested table or varray. The return value is NULL if expr is null or if the nested table is empty.”

There is some explicit mention of varrays supporting these operations, but in most of the documentation, varrays are not mentioned. So, how can we write such operations with varrays? Here’s a list of translations of each nested table condition to the equivalent SQL expression for use with varrays. These are the multiset conditions:

IS A SET condition

In SQL, everything is a (partially ordered) multiset by default. Sometimes, however, we want to work with sets, i.e. a special type of multiset that has no duplicate values. We can easily check whether nested tables are sets (or whether they aren’t):

-- Nested table version
SELECT * FROM t WHERE t1 IS A SET;

-- Varray version
SELECT * 
FROM t 
WHERE t2 IS NOT NULL
AND (SELECT count(*) FROM TABLE(t2)) 
  = (SELECT count(DISTINCT column_value) FROM TABLE(t2));

The IS A SET operation yields UNKNOWN if the nested table is NULL, so we have to take that into account as well. If it isn’t NULL, we can count the total values in the varray and compare that with the total distinct values in the varray. The result is:

ID  T1                        T2
-----------------------------------------------------
2   T1()                      T2()
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

IS EMPTY condition

This predicate needs no explanation. It can be written as such:

-- Nested table version
SELECT * FROM t WHERE t1 IS EMPTY;

-- Varray version
SELECT * 
FROM t 
WHERE t2 IS NOT NULL
AND NOT EXISTS (
  SELECT * FROM TABLE (t2)
);

The result being:

ID  T1                 T2
---------------------------------------
2   T1()               T2()

MEMBER condition

This handy predicate can help check if a specific value is contained in a nested collection. It can be written as such:

-- Nested table version
SELECT * FROM t WHERE 'abc' MEMBER OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND EXISTS (
  SELECT 1 FROM TABLE(t2) WHERE column_value = 'abc'
);

Yielding:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

SUBMULTISET condition

Just like the previous MEMBER condition, this predicate can help check if specific values (more than one) are contained in a nested collection. This is a bit more tricky than the previous emulations. The MEMBER condition works the same way for sets and multisets, as we’re checking if a single element is contained in the (multi)set. When working with multisets, duplicates are allowed, and in the case of the SUBMULTISET operation, the following can be observed:

-- Equal multisets
t1() SUBMULTISET OF t1();
t1('a', 'a') SUBMULTISET OF t1('a', 'a');

-- Subsets
t1('a') SUBMULTISET OF t1('a', 'a');

-- But this is not true
t1('a', 'a') SUBMULTISET OF t1('a');

If we ignore the fact that nested collections can be multisets and pretend we’re working with sets only, then the emulation of the SUBMULTISET operator is relatively easy:

-- Nested table version
SELECT * FROM t WHERE t1('abc', 'xyz') SUBMULTISET OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND EXISTS (
  SELECT 1 FROM TABLE(t2) 
  WHERE column_value = 'abc'
  INTERSECT
  SELECT 1 FROM TABLE(t2) 
  WHERE column_value = 'xyz'
);

Yielding, once more:

ID  T1                        T2
-----------------------------------------------------
3   T1('abc', 'xyz', 'zzz')   T2('abc', 'xyz', 'zzz')

If we’re really working with multisets, things are a bit more tricky:

-- Nested table version
SELECT * FROM t WHERE t1('dup', 'dup') SUBMULTISET OF t1;

-- Varray version
SELECT *
FROM t
WHERE t2 IS NOT NULL
AND NOT EXISTS (
  SELECT column_value, count(*)
  FROM TABLE (t2('dup', 'dup')) x
  GROUP BY column_value
  HAVING count(*) > (
    SELECT count(*)
    FROM TABLE (t2) y
    WHERE y.column_value = x.column_value
  )
);

Yielding:

ID  T1                        T2
-----------------------------------------------------
4   T1('dup', 'dup', 'dup')   T2('dup', 'dup', 'dup')

How does it work? In the NOT EXISTS correlated subquery, we’re counting the number of duplicate values in the potential SUBMULTISET, effectively turning that SUBMULTISET into a SET using the GROUP BY operation. We’re then comparing that count value from the left operand with the corresponding count value from the right operand. If there is no value in the left operand whose number of occurrences is bigger than the number of occurrences of that value in the right operand, then the whole left operand is a SUBMULTISET of the right operand. Cool, eh? We’ll talk about performance another time :-)
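The same counting argument, written out in plain Java for illustration (a self-contained sketch, not jOOQ API):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SubMultiset {

    // a is a submultiset of b iff no element occurs more often in a
    // than it occurs in b - exactly the HAVING clause above
    static <T> boolean isSubMultiset(List<T> a, List<T> b) {
        Map<T, Integer> counts = new HashMap<>();
        for (T t : b)
            counts.merge(t, 1, Integer::sum);
        for (T t : a)
            if (counts.merge(t, -1, Integer::sum) < 0)
                return false;
        return true;
    }

    public static void main(String[] args) {
        // true: two 'dup's fit into three 'dup's
        System.out.println(
            isSubMultiset(List.of("dup", "dup"), List.of("dup", "dup", "dup")));
        // false: two 'dup's don't fit into one 'dup'
        System.out.println(
            isSubMultiset(List.of("dup", "dup"), List.of("dup")));
    }
}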

MULTISET operators

Also very interesting are the multiset operators:

  • MULTISET EXCEPT [ ALL | DISTINCT ]
  • MULTISET INTERSECT [ ALL | DISTINCT ]
  • MULTISET UNION [ ALL | DISTINCT ]

Notice how there are some differences to the ordinary set operators that can be used in SELECT statements. In particular:

  • EXCEPT is used as defined in the standard, not MINUS
  • ALL is supported on all three operators, not just on UNION
  • ALL is the default, not DISTINCT

How can we work with these operators? Consider these queries:

SELECT id, t1 MULTISET EXCEPT t1('aaa', 'abc', 'dup', 'dup') r 
FROM t;

SELECT id, t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') r 
FROM t;

Both yielding:

ID   R
---------------------
1    (null)
2    T1()
3    T1('xyz', 'zzz')
4    T1('dup')

With this operator, we’re removing each element of the right operand once from the left operand:
  • 'aaa' does not appear in the left operand, so nothing happens
  • 'abc' appears on row with ID = 3 and we remove it
  • 'dup' appears on row with ID = 4, 3 times, and we remove it twice, leaving one value
Conversely, when adding DISTINCT, we’ll get:

SELECT id, t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup') r FROM t;

Yielding:

ID   R
---------------------
1    (null)
2    T1()
3    T1('xyz', 'zzz')
4    T1()

The only difference is on the row with ID = 4, where all 'dup' values were removed, regardless of how many there were on either side of the MULTISET EXCEPT DISTINCT operator. How can we emulate this for varrays?

DISTINCT version

This is a bit easier, because we can now use MINUS:

-- Nested table version
SELECT t1 MULTISET EXCEPT DISTINCT t1('aaa', 'abc', 'dup', 'dup') 
FROM t;

-- Varray version
SELECT 
  id,
  CASE 
    WHEN t2 IS NULL THEN NULL 
    ELSE 
      CAST(MULTISET(
        SELECT column_value
        FROM TABLE (t2)
        MINUS
        SELECT column_value
        FROM TABLE (t2('aaa', 'abc', 'dup', 'dup'))
      ) AS t2)
  END r
FROM t;

Luckily, we can still cast a structural MULTISET type that we can obtain using the MULTISET() operator to a varray type. This greatly simplifies the task.

ALL version

If we want the MULTISET EXCEPT or MULTISET EXCEPT ALL semantics, things are trickier. Here’s a solution that resorts to using window functions, in order to turn a MULTISET back into a SET:

-- Nested table version
SELECT t1 MULTISET EXCEPT ALL t1('aaa', 'abc', 'dup', 'dup') 
FROM t;

-- Varray version
SELECT 
  id,
  CASE 
    WHEN t2 IS NULL THEN NULL 
    ELSE 
      CAST(MULTISET(
        SELECT column_value
        FROM (
          SELECT 
            column_value,
            row_number() OVER (
              PARTITION BY column_value 
              ORDER BY column_value) rn
          FROM TABLE (t2)
          MINUS
          SELECT 
            column_value, 
            row_number() OVER (
              PARTITION BY column_value 
              ORDER BY column_value) rn
          FROM TABLE (t2('aaa', 'abc', 'dup', 'dup'))
        )
      ) AS t2)
  END r
FROM t;

How does this work? Let’s look at what this ROW_NUMBER() evaluates to on each row. For this, we use OUTER APPLY:

SELECT id, t2, column_value, rn
FROM t
OUTER APPLY (
  SELECT 
    column_value,
    row_number() OVER (
      PARTITION BY column_value
      ORDER BY column_value) rn
  FROM TABLE (t2)
);

The result is:

ID      T2                       COLUMN_VALUE  RN
-----------------------------------------------------
1       (null)                   (null)        (null)
2       T2()                     (null)        (null)
3       T2('abc', 'xyz', 'zzz')  abc           1
3       T2('abc', 'xyz', 'zzz')  xyz           1
3       T2('abc', 'xyz', 'zzz')  zzz           1
4       T2('dup', 'dup', 'dup')  dup           1
4       T2('dup', 'dup', 'dup')  dup           2
4       T2('dup', 'dup', 'dup')  dup           3

As can be seen, each duplicate value gets assigned a unique row number due to the nature of how ROW_NUMBER() works (this property can be very useful for solving the gaps-and-islands problem. See trick #4). Now that we turned our (COLUMN_VALUE) multiset into a (COLUMN_VALUE, RN) set (without duplicates), we can use MINUS again.

MULTISET INTERSECT and MULTISET UNION

MULTISET INTERSECT works exactly the same way as MULTISET EXCEPT, with the same window function based emulation in the MULTISET INTERSECT ALL case. MULTISET UNION is simpler, because Oracle knows UNION ALL, so we do not need to resort to such trickery.

Conclusion

Nested collections are a very powerful tool in Oracle SQL. Oracle knows two types of nested collections:

  • Nested tables
  • Varrays

Nested tables are trickier to maintain, as you have to think about their storage more explicitly. Varrays can just be embedded into ordinary tables like any other column. But there’s a price to pay for using varrays. Oracle regrettably doesn’t support all of the above very useful multiset conditions and multiset operators on them. Luckily, when you encounter a situation where you have varrays and cannot change that, you can still emulate each of the operators using more traditional SQL.

5 Things You May Not Have Known About jOOQ

jOOQ has been around for a while now (since 2009!), and by now we can say we’ve seen quite a few things about the SQL and Java languages. Some of our design decisions reflect the particular way jOOQ thinks about programming with SQL. These include:

  • Nullability (let’s stop fighting it)
  • Value types (let’s stop pretending SQL has identities)
  • Everything is a table (this really helps get the most out of SQL)
  • Queries are side-effect free functions

jOOQ incorporates all of these ideas. Here are 5 things you may not have known about jOOQ:

1. Every Column Type is Nullable

SQL NULL is a subtly different beast from Java null, even if programmers often use it for the same thing: Something that is “uninitialised”, some value that we don’t “care about” (yet), or some value that we “don’t need”. A good example would be a middle name:

CREATE TABLE person (
  first_name  VARCHAR(50) NOT NULL,
  middle_name VARCHAR(50),
  last_name   VARCHAR(50) NOT NULL,
  ..
);

Of course, a sufficiently pessimistic programmer will immediately see tons of flaws with the above design. Go read this article about falsehoods programmers believe about names for details. But anyway, the important thing to understand about NOT NULL constraints in SQL is the fact that they’re… constraints. Just like UNIQUE constraints, FOREIGN KEY constraints, or CHECK constraints. In fact, they are CHECK constraints. We could rewrite the above table as such:

CREATE TABLE person (
  first_name  VARCHAR(50) CHECK (first_name IS NOT NULL),
  middle_name VARCHAR(50),
  last_name   VARCHAR(50) CHECK (last_name IS NOT NULL),
  ..
);

… and the table would be semantically equivalent. This constraint is just so common that it has a special syntax for it (which is also sometimes better optimised than the equivalent check constraint).
Sidenote: An even more sophisticated constraint type is the SQL standard assertion, which unfortunately hasn’t been implemented in any database I’m aware of yet. There are discussions of adding it to a future Oracle version, though. Assertions are like CHECK constraints, but they work on the entire table / schema / whatever scope. For instance, we could assert that every department of a company must have at least one manager. Currently, we can do this sort of thing only through triggers.
The important message here is that a constraint is a validation on the entire data set (or on a subset, down to an individual row). It is not a type modifier, because even if the NOT NULL constraint may have direct optimisation implications on the column type it is attached to, it is a separate construct that can even be deferred. While languages don’t have to be this way (e.g. Ceylon models constraints directly on types), SQL works like this. Two examples:
  1. DEFAULT columns: When you have an identity column or some sort of GENERATED BY DEFAULT AS... clause on your column, then the value in the column may be generated by default (duh), which may include – depending on the RDBMS vendor – the generation of a value when it is NULL.
  2. DEFERRED constraints: Some databases (e.g. PostgreSQL) support deferred constraints, i.e. constraints that are validated only when the transaction is committed. This can be specified on individual constraints, or on the session. Which means that the value NULL is a totally acceptable value for a NOT NULL column for a certain amount of time.
Both of the above imply that we must not take NOT NULL as a type modifier, the way some languages have started doing it, like Ceylon or Kotlin:

val a: String? = null;
val b: String = a; // Error

In such languages, String? and String are distinct types; specifically in Ceylon, String? is just syntax sugar for the union type String|Null. But not in SQL. If a Java API wants to properly reflect the SQL language the way jOOQ does, then all types must be nullable. It is a mistake to:
  • Use primitive types
  • Use Option(al) (there are other caveats with these related to generic type erasure)
  • Use non-null types in languages that have them
  • Use validation annotations (we made that mistake, unfortunately)
  • Use JSR-305 or JSR-308 annotations
Sidenote: If this constraint information should be annotated in Java classes, then JPA @Column(nullable=true) annotations are acceptable, because they simply map to the constraint without any implications on the type. The implications are applied on the persistence behaviour, which is reasonable.
Besides, even if encoding nullability through e.g. Option(al) seems reasonable at first, it breaks as soon as you outer join anything, e.g.:

SELECT p.*
FROM dual
LEFT JOIN person p
ON p.first_name = 'Ooops, no one by that name'

The above query will produce a single person record with only NULL values in its columns, DESPITE the NOT NULL constraints. Ooops. We’ll get null in non-optional types. Similar things can happen with unions, grouping sets, functions, and a few other operations.

Takeaway

In SQL, all types are always nullable. We simply have to deal with this. Every clever type safety is contrary to SQL logic. If your API does this, you may get some minor convenience in 80% of the use-cases for the price of a major annoyance in 20% of the use-cases. That’s not a reasonable tradeoff, given that in Java, every non-primitive type is nullable as well, so we get a perfect and intuitive match.

2. SQL is a Set-Based, Values-Only Language

Values or Objects? That’s a tricky question for people who work with Java, a language that claims to be mainly object-oriented. Java has value support as well. There are 8 different value types as of Java 8:
  • byte
  • short
  • int
  • long
  • float
  • double
  • boolean
  • char
Values have a couple of nice properties:
  • They are immutable. It may be possible to mutate a variable holding such a value, but we cannot mutate the value itself. 42 will always stay 42
  • Two values that are equal are indistinguishable. 42 == 42 really means that they’re the exact same thing. Reusing == for value equality and identity equality has been a bit of an unfortunate choice in Java, because technically, a String is also a value, yet we cannot compare it with == because there’s a possibility of two identical strings having different identity. (True) values don’t have identity.
Java 8 introduced the notion of a “ValueBased” class, which is really a weird thing, because a “ValueBased” wrapper like Optional can reference a non-value based type, say, a java.sql.Connection. Not a good idea, but certainly possible. A future Java might have more complex value types, for instance:

// Hypothetical syntax
value Point(int x, int y) {}
value Box(Point a, Point b) {
  int area() {
    return Math.abs((a.x - b.x) * (a.y - b.y));
  }
}

This will certainly be helpful (as soon as we figure out how to model nullability in such scenarios). In SQL, all records are values. They do not have a true identity (although most databases choose to provide implementation specific identities like ROWIDs). Do not confuse primary keys with identity descriptors. A primary key is a special value that is guaranteed to be unique within a table. It happens to be used as a logical identity (at least when using surrogate keys). But like NOT NULL constraints, PRIMARY KEY constraints are constraints, and they’re deferrable in some databases. And there are many ways we can produce results where primary keys are no longer meaningful, e.g. this:

SELECT * FROM person
UNION ALL
SELECT * FROM person

SQL, unlike relational algebra, doesn’t operate on sets but on bags (or multisets), i.e. data structures that allow for duplicate values. Multisets make analytics much more powerful, while making OLTP quite a bit harder. As always, useful things come at a price.

jOOQ, by consequence, also works in the value-oriented multiset paradigm. This is completely contrary to what Hibernate / JPA does, as Hibernate emulates entity identity through the primary key, which it has to do, being an object-graph persistence API. This isn’t a consequence of working with sets rather than multisets, although having identities does make things easier in that paradigm. If you want to read an interesting and entertaining discussion on the subject, check out the tweets exchanged between Gavin King and myself on this topic.
The important thing here is to understand: Neither approach is absolutely better. Both have their advantages. If a RDBMS vendor had implemented a database following a set-based approach instead of SQL’s multiset-based approach, a lot of persistence problems would have been much easier to implement on that RDBMS. On the other hand, a lot of reporting and analytics would have been harder, because with sets being sets, we’d have to constantly prevent “duplicates” from being removed early, by keeping primary keys around in queries until the final aggregation.

Now, even if we could re-start this interesting discussion, fact is that we have SQL, and it is multiset-based. The default is SELECT "ALL", not SELECT DISTINCT (the ALL keyword being specified in the standard, but not available in most implementations).

When using jOOQ, a value-based, record-centric programming approach is recommended, where result sets from jOOQ queries are really “just” streams of records, which will be further transformed without ever thinking about persisting any elements from those streams again. Sure, there can be write operations as well, but a jOOQ (or SQL) write operation is also a multiset-based streaming of records (values) back into the database. That’s important to know, because all of
  • INSERT
  • UPDATE
  • DELETE
  • MERGE
statements are multiset-based, i.e. they take a set of values, not just a single row. For instance, INSERT:

-- Not all databases support this standard syntax:
INSERT INTO t (a, b, c)
VALUES (1, 2, 3),
       (4, 5, 6),
       (7, 8, 9);

-- But all databases support this one:
INSERT INTO t1 (a, b, c)
SELECT a, b, c
FROM t2;

Notice how this has absolutely nothing to do with identity-based object-graph persistence. In SQL, we’re always streaming a set of values from one place to another, possibly targeting a table where we store that set. The approach is really beautiful; try to think this way and it’ll open up a whole new world to the SQL-oriented programmer. In a future article, I’ll even go a step further and claim that SQL is an (almost) completely side-effect free language (and this includes statements like INSERT – stay tuned).

Takeaway

In SQL, everything is a value. There is no identity. It is not needed, because SQL is a multiset-based language, where we’re always operating on the entire data set, not on individual records, even if CRUD operations may make you think otherwise. jOOQ encourages this way of thinking by putting the table and the “value-based” record into the center of the programming model.

3. ResultQuery is an Iterable

I’ve blogged about this before, and some users may have discovered it by accident, intrigued. A jOOQ ResultQuery is an Iterable, meaning that you can “foreach it”:

ResultQuery<?> query =
DSL.using(configuration)
   .select(PERSON.FIRST_NAME, PERSON.LAST_NAME)
   .from(PERSON);

// Java 5 style
for (Record record : query)
  System.out.println(record);

// Java 8 style
query.forEach(System.out::println);

It makes a lot of sense. A SQL query is a description of a set of tuples. SQL is a functional programming language, and if you forget about some concurrency aspects, it is, in principle, side-effect free. This means that the query really IS the set of tuples (another nice way to think about SQL!). With that thought in mind, we can simply iterate it. To the procedural mind of many Java developers, this might be a bit funky and surprising, but give this a little thought and it might “click”. Consider also this previous article, claiming that streams, for comprehensions, and SQL are all the same.
Takeaway

We’re not there yet in Java: we still explicitly iterate. But when we do, and the data source is a SQL query, make it a jOOQ query, because that helps you forget about the difference between the query and the data, which are really the same thing in SQL.

4. Ordering is Nice When It’s Cheap. Let’s Retain It

You should avoid ORDER BY in SQL if you don’t really need it. Why? Because unless you can profit from an index that has already pre-ordered your result sets, sorting is a super expensive operation in all programming languages, including SQL. It’s essentially O(n log n). But let’s assume you do have to sort your results. Then we had better make sure that this ordering stays the same for as long as possible. By default, jOOQ returns a Result type, or List types, but there are many utility methods like the ResultQuery.fetchMap() method, which can return something like this:

Map<Integer, String> people =
DSL.using(configuration)
   .select(PERSON.ID, PERSON.FIRST_NAME)
   .from(PERSON)
   .orderBy(PERSON.ID)
   .fetchMap(PERSON.ID, PERSON.FIRST_NAME);

Internally, jOOQ collects all data into a LinkedHashMap, which is a slightly more resource intensive map than the similar HashMap. In case you haven’t used it very often: it’s a map that preserves the insertion order when iterating the map using Map.entrySet() and all the other methods. Quite useful when displaying the map, too. After all, if you do specify the ordering, then you wanted that order to appear in the results, right?

In a similar way, when using Collections.sort() in Java, the sort algorithm guarantees that sorting is stable. If you sort a list twice, then the original ordering will be retained for elements that are not re-ordered, i.e. when sorting by first name, and then by last name, the first name ordering will be retained for equal last names.

Takeaway

ORDER BY is expensive, so if you go through the trouble of actually doing it, you want to retain that order.

5. Dynamic SQL is the Default

In the old days, people mostly wrote static SQL, e.g. using stored procedures in languages like PL/SQL. When you write an implicit cursor loop in PL/SQL:

FOR rec IN (SELECT * FROM person)
LOOP
  dbms_output.put_line(rec.first_name || ' ' || rec.last_name);
END LOOP;

… then, this SQL statement is compiled along with the surrounding procedural code and it will never be changed again. That’s useful for batch processing, reporting, etc. (Strictly speaking it isn’t really “static”, because the SQL statement will still be parsed by the SQL engine like any other query, but the PL/SQL programming model allows for hiding this from you). In modern days, we require dynamic SQL very often, because the SQL code is often generated from user input. Mostly, because:
  • Users can add predicates through the UI
  • Users can specify aggregations through the UI
  • Users can specify ordering through the UI
In some more remote use-cases, users might also influence the JOIN tree and other parts of a dynamically created query. From a JDBC perspective, all queries are dynamic, even if you’re doing something like this:

try (ResultSet rs = stmt.executeQuery(
  "SELECT * FROM person"
)) {
  while (rs.next())
    out.println(rs.getString(1) + " " + rs.getString(2));
}

Clearly, the SQL string seems “static” in the way that the Java compiler will compile it once and then never touch it again. The above program will always send the exact same SQL string to the server. Yet from a JDBC API perspective, the string is just an argument to the executeQuery() method, just as if we wrote it like this:

try (ResultSet rs = stmt.executeQuery(
  "SELECT * FROM person" + 
  (active ? " WHERE active = 1" : "")
)) {
  while (rs.next())
    out.println(rs.getString(1) + " " + rs.getString(2));
}

Yuck! String concatenation to build SQL strings. There’s a substantial risk of SQL injection. Of course, the above example is SQL injection “safe”, because the SQL string is entirely constructed from constants, not user input. But how quickly could the code be refactored to this?

try (ResultSet rs = stmt.executeQuery(
  "SELECT * FROM person" + 
  (active ? (" WHERE active = " + active) : "")
)) {
  while (rs.next())
    out.println(rs.getString(1) + " " + rs.getString(2));
}

SQL builders like jOOQ help prevent SQL injection, even for dynamic SQL queries. The above query will be written as follows in jOOQ:

for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(PERSON)
        .where(active
            ? PERSON.ACTIVE.eq(active)
            : trueCondition()))
  out.println(rec.getFirstName() + " " + rec.getLastName());

The active flag check that is added to the SQL query dynamically will default to creating a bind variable, and even if it is inlined, it will be escaped, depending on its type. The interesting bit here, however, is that the jOOQ query is always a dynamic SQL query. The above approach used an inline expression to decide whether a certain predicate needs to be added to the statement. If that predicate gets more complex, we can extract the construction of the predicate to a local variable, or a function.

Local variable

Condition condition = trueCondition();

if (active)
  condition = PERSON.ACTIVE.eq(active);

if (searchForFirstName)
  condition = condition.and(PERSON.FIRST_NAME.like(pattern));

for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(PERSON)
        .where(condition))
  out.println(rec.getFirstName() + " " + rec.getLastName());

This is quite neat.

Functions

Or, if things get even more complex, we might like to factor out the logic to a method, or a function. Some people have started calling such an approach “functional relational mapping”:

Condition personCondition(boolean active, String pattern) {
  Condition condition = trueCondition();

  if (active)
    condition = PERSON.ACTIVE.eq(active);

  if (pattern != null)
    condition = condition.and(PERSON.FIRST_NAME.like(pattern));

  return condition;
}

// And then:
for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(PERSON)
        .where(personCondition(active, pattern)))
  out.println(rec.getFirstName() + " " + rec.getLastName());

Or even:

BiFunction<Boolean, String, Condition> personCondition =
  (active, pattern) -> {
    Condition condition = trueCondition();

    if (active)
      condition = PERSON.ACTIVE.eq(active);

    if (pattern != null)
      condition = condition.and(PERSON.FIRST_NAME.like(pattern));

    return condition;
  };

// And then:
for (PersonRecord rec : DSL.using(configuration)
        .selectFrom(PERSON)
        .where(personCondition.apply(active, pattern)))
  out.println(rec.getFirstName() + " " + rec.getLastName());

Not only is this approach to writing dynamic SQL extremely useful for client code that relies on dynamic SQL, the expression tree that is built behind the scenes is also available at runtime for more complex transformations, such as applying row level security to certain queries, or more simply applying something like schema-based multi-tenancy. While the Java code stays exactly the same, the generated SQL string may be transformed by your own library code, behind the scenes.

Static SQL

Of course, jOOQ doesn’t imply that you have to write dynamic SQL. You can store jOOQ-generated SQL strings in caches, or you can use stored procedures with jOOQ. In fact, jOOQ encourages you to use stored procedures!

Takeaway

Dynamic SQL is really useful. jOOQ defaults to writing dynamic SQL in a way that you don’t even notice. A SQL query is a function just as much as it is a collection description. jOOQ helps you think about SQL this way.

Conclusion

SQL is a beautiful language with an interesting syntax. If we look at the concepts that are the foundation of the SQL language, we see that SQL queries are functional / declarative collection descriptions. With this paradigm in mind, we can write really powerful SQL statements, and jOOQ encourages this as this paradigm is at the core of the jOOQ API design. Enjoy writing functional-relational mapping code.