Vendor Agnostic, Dynamic Procedural Logic with jOOQ

One of the strengths of modern RDBMS is the capability to mix the powerful SQL language with procedural code.

SQL is a 4th generation programming language (4GL), and as such, extremely well suited for querying and bulk data manipulation. Its functional-declarative nature allows for it to be optimised in highly efficient ways using cost-based optimisation, but also statically as we’ve blogged before.

Sometimes, however, an imperative 3GL is better suited for a given task. That’s where stored procedures shine, or more specifically, procedural languages of RDBMS.

Among the dialects that jOOQ supports, at least the following also support procedures:

  • BigQuery
  • Db2
  • Exasol
  • Firebird
  • HANA
  • HSQLDB
  • Informix
  • MariaDB
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server
  • Vertica

Others may, too, but jOOQ doesn’t support their dialects yet.

Many have implemented their own procedural languages, some following the ISO/IEC 9075-4 Persistent Stored Modules (SQL/PSM) standard, others using their own proprietary syntax.

jOOQ support for procedural logic

Since jOOQ 3.12, our commercial distributions have supported anonymous blocks and the procedural statements they contain, such as the IF statement, LOOP statements, etc. Starting with jOOQ 3.15, we also support three types of statements to manage storing procedural logic in the catalog: CREATE PROCEDURE (shown below), CREATE FUNCTION, and CREATE TRIGGER.

Using these statements via jOOQ may not be your everyday use-case. You may prefer managing that logic via the native syntax, which is still more powerful than what jOOQ 3.15 supports (especially when you’re using Oracle’s PL/SQL), in which case you’ll use jOOQ purely to call your procedure from Java in the usual type safe manner.

But maybe, you have one of these use-cases?

  • You’re a product vendor, and you profit from procedural logic being vendor agnostic in order to support several of your clients’ RDBMS
  • Your procedural logic is dynamic, just like your SQL logic (and what other than jOOQ to use for that?)
  • You don’t have the necessary privileges to create procedures, functions, or triggers in your schema

In all of those cases, jOOQ is here to help you.

How does it work?

The first building block is the anonymous block, which isn’t supported by all of the above dialects, regrettably. jOOQ can emulate it on MySQL as discussed here, but not currently in other dialects.

Here’s a simple, empty anonymous block:

-- Db2
BEGIN
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
END

-- MariaDB
BEGIN NOT ATOMIC
END;

-- Oracle
BEGIN
  NULL;
END;

-- PostgreSQL
DO $$
BEGIN
  NULL;
END;
$$

It doesn’t really do much, but you can try executing it as follows, with jOOQ:

ctx.begin().execute();
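By the way, ctx in these examples is an ordinary DSLContext. If you don’t have one yet, a minimal sketch of obtaining one (assuming a plain JDBC connection and, say, the PostgreSQL dialect; the URL and credentials below are placeholders) could look like this:

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

// Placeholder connection details -- adapt these to your environment
Connection connection = DriverManager.getConnection(
    "jdbc:postgresql://localhost:5432/sakila", "user", "password");

// Wrap the JDBC connection in a DSLContext for the desired dialect
DSLContext ctx = DSL.using(connection, SQLDialect.POSTGRES);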

Now, let’s do something more interesting, such as:

// Assuming the usual static imports:
import static org.jooq.impl.DSL.*;
import static org.jooq.impl.SQLDataType.*;

// Then write
Variable<Integer> i = variable(unquotedName("i"), INTEGER);
Table<?> t = table(unquotedName("t"));
Field<Integer> c = field(unquotedName("c"), INTEGER);

ctx.begin(
    declare(i).set(1),

    while_(i.le(10)).loop(
        insertInto(t).columns(c).values(i),
        i.set(i.plus(1))
    )
).execute();

The above block executes:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- Firebird
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
DECLARE
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
DO $$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

Easy as pie. Perhaps you prefer a FOR loop, instead? Try this:

ctx.begin(
    for_(i).in(1, 10).loop(
        insertInto(t).columns(c).values(i)
    )
).execute();

It produces the necessary emulations where required, because regrettably, not all dialects support FOR loops:

-- Db2
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END

-- Firebird
EXECUTE BLOCK AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
BEGIN NOT ATOMIC
  FOR i IN 1 .. 10 DO
    INSERT INTO t (c)
    VALUES (i);
  END FOR;
END;

-- Oracle
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;

-- PostgreSQL
DO $$
BEGIN
  FOR i IN 1 .. 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
  END LOOP;
END;
$$

-- SQL Server
BEGIN
  DECLARE @i int;
  BEGIN
    SET @i = 1;
    WHILE @i <= 10 BEGIN
      INSERT INTO t (c)
      VALUES (@i);
      SET @i = (@i + 1);
    END;
  END;
END;

SQL vs procedures

Of course, this particular SQL statement would be better implemented using a single bulk insertion statement, purely with SQL, not with procedural logic:

ctx.insertInto(t, c)
   .select(selectFrom(generateSeries(1, 10)))
   .execute();

Which translates to:

-- Db2
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH
    generate_series(generate_series) AS (
      SELECT 1
      FROM SYSIBM.DUAL
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- Firebird
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      FROM RDB$DATABASE
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) generate_series;

-- MariaDB
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  WITH RECURSIVE
    generate_series(generate_series) AS (
      SELECT 1
      UNION ALL
      SELECT (generate_series + 1)
      FROM generate_series
      WHERE generate_series < 10
    )
  SELECT generate_series
  FROM generate_series
) AS generate_series;

-- Oracle
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT (level + (1 - 1)) generate_series
  FROM DUAL
  CONNECT BY level <= ((10 + 1) - 1)
) generate_series;

-- PostgreSQL
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM generate_series(1, 10);

-- SQL Server
WITH
  generate_series(generate_series) AS (
    SELECT 1
    UNION ALL
    SELECT (generate_series + 1)
    FROM generate_series
    WHERE generate_series < 10
  )
INSERT INTO t (c)
SELECT generate_series.generate_series
FROM (
  SELECT generate_series
  FROM generate_series
) generate_series

… but you get the point.

Storing the procedural logic

If you have the necessary privileges, and your procedural logic isn’t super dynamic, you may choose to store your logic in a procedure or function directly in your database. In some databases, this means a compiler will be able to eagerly translate the logic to something very efficient (e.g. machine code), instead of interpreting the logic on the fly.

Take the above WHILE loop, for example. You may want to store that as a procedure P:

Name p = unquotedName("p");

ctx.createProcedure(p)
   .modifiesSQLData()
   .as(
        declare(i).set(1),

        while_(i.le(10)).loop(
            insertInto(t).columns(c).values(i),
            i.set(i.plus(1))
        )
   )
   .execute();

This produces the following statements:

-- Db2
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i integer;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Firebird
CREATE PROCEDURE p()
AS
  DECLARE i integer;
BEGIN
  :i = 1;
  WHILE (:i <= 10) DO BEGIN
    INSERT INTO t (c)
    VALUES (:i);
    :i = (:i + 1);
  END
END

-- MariaDB
CREATE PROCEDURE p()
MODIFIES SQL DATA
BEGIN
  DECLARE i int;
  SET i = 1;
  WHILE i <= 10 DO
    INSERT INTO t (c)
    VALUES (i);
    SET i = (i + 1);
  END WHILE;
END;

-- Oracle
CREATE PROCEDURE p
AS
  i number(10);
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;

-- PostgreSQL
CREATE PROCEDURE p()
LANGUAGE plpgsql
AS
$$
DECLARE
  i int;
BEGIN
  i := 1;
  WHILE i <= 10 LOOP
    INSERT INTO t (c)
    VALUES (i);
    i := (i + 1);
  END LOOP;
END;
$$

-- SQL Server
CREATE PROCEDURE p
AS
BEGIN
  DECLARE @i int;
  SET @i = 1;
  WHILE @i <= 10 BEGIN
    INSERT INTO t (c)
    VALUES (@i);
    SET @i = (@i + 1);
  END;
END;

And now, what better way to call this procedure than, again, an anonymous block?

ctx.begin(call(unquotedName("p"))).execute();

Producing:

-- Db2
BEGIN
  CALL p();
END

-- Firebird
EXECUTE BLOCK AS
BEGIN
  EXECUTE PROCEDURE p;
END

-- MariaDB
BEGIN NOT ATOMIC
  CALL p();
END;

-- Oracle
BEGIN
  p();
END;

-- PostgreSQL
DO $$
BEGIN
  CALL p();
END;
$$

-- SQL Server
BEGIN
  EXEC p ;
END;

If you’re using jOOQ with Flyway or Liquibase to generate procedures during your database migrations, you can obviously generate jOOQ procedure stubs to call in a more type safe manner, instead of the above dynamic procedure call.
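For instance, assuming the code generator produced the usual Routines class with a stub for our procedure p, the call might look like this (the generated class and method names depend on your generator configuration):

// Hedged sketch: "Routines.p" is the typical generated stub name,
// but the actual names depend on your code generation setup.
Routines.p(ctx.configuration());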

Parsing procedural logic

This functionality isn’t limited to the jOOQ API; it is also available through jOOQ’s parser. You can play around with our parser / translator here: https://www.jooq.org/translate. It can definitely help you translate your (simpler) stored procedures between dialects, such as PL/SQL, T-SQL, PL/pgSQL, etc.

Conclusion

As a rule of thumb, if you can do it with SQL (the 4GL), do it with SQL alone. But sometimes, you can’t. A 3GL is a better choice for an algorithm. When using jOOQ, you’ll naturally think of using Java to implement that 3GL algorithm. But wait, you could move the logic to the server for (drastically) increased performance!

Thanks to jOOQ, you can generate procedural logic that is:

  • Dynamic
  • Vendor agnostic
  • Anonymous or stored

Just like you’re used to, from jOOQ, for SQL.

jOOQ 3.15’s New Multiset Operator Will Change How You Think About SQL

This is how SQL should have been used all along.

They called it The Third Manifesto, ORDBMS, or other things. Regrettably, it never really took off. Because most vendors didn’t adopt it. And those who did, didn’t agree on syntax.

But this is about to change. Thanks to the now ubiquitous SQL/JSON support (which jOOQ 3.14 has already covered), we can now emulate the most powerful ORDBMS feature that you will want to use everywhere: Nested collections!

How We Used to do Things: With Joins

We’re going to be using the Sakila database for this example. It’s a DVD rental store with things like ACTOR, FILM, CATEGORY (of films) and other nice relational things. Let’s write a query for this requirement

Get me all the films with their actors and their categories

Classically, we’d go ahead and use jOOQ to write:

ctx.select(
      FILM.TITLE,
      ACTOR.FIRST_NAME,
      ACTOR.LAST_NAME,
      CATEGORY.NAME
   )
   .from(ACTOR)
   .join(FILM_ACTOR)
     .on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .join(FILM)
     .on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .join(FILM_CATEGORY)
     .on(FILM.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
   .join(CATEGORY)
     .on(FILM_CATEGORY.CATEGORY_ID.eq(CATEGORY.CATEGORY_ID))
   .orderBy(1, 2, 3, 4)
   .fetch();

And the result? Not so nice. A denormalised, flat table containing tons of repetition:

+----------------+----------+---------+-----------+
|title           |first_name|last_name|name       |
+----------------+----------+---------+-----------+
|ACADEMY DINOSAUR|CHRISTIAN |GABLE    |Documentary|
|ACADEMY DINOSAUR|JOHNNY    |CAGE     |Documentary|
|ACADEMY DINOSAUR|LUCILLE   |TRACY    |Documentary|
|ACADEMY DINOSAUR|MARY      |KEITEL   |Documentary|
|ACADEMY DINOSAUR|MENA      |TEMPLE   |Documentary|
|ACADEMY DINOSAUR|OPRAH     |KILMER   |Documentary|
|ACADEMY DINOSAUR|PENELOPE  |GUINESS  |Documentary|
|ACADEMY DINOSAUR|ROCK      |DUKAKIS  |Documentary|
|ACADEMY DINOSAUR|SANDRA    |PECK     |Documentary|
|ACADEMY DINOSAUR|WARREN    |NOLTE    |Documentary|
|ACE GOLDFINGER  |BOB       |FAWCETT  |Horror     |
|ACE GOLDFINGER  |CHRIS     |DEPP     |Horror     |
|ACE GOLDFINGER  |MINNIE    |ZELLWEGER|Horror     |
 ...

If we don’t consume this result unmodified as it is (e.g. when displaying tabular data to a user), we’d then go and de-duplicate things, shoehorn them back into some nested data structures (e.g. for consumption by some JSON based UI), and spend hours trying to untangle the cartesian products between the 2 nested collections FILM -> ACTOR and FILM -> CATEGORY (because ACTOR and CATEGORY now form a cartesian product, which we didn’t want!).

In the worst case, we don’t even notice! This example database only has 1 category per film, but it is designed to support multiple categories.

jOOQ can help with that deduplication, but just look at the number of questions for jOOQ many to many on Stack Overflow! You’ll probably still have to write at least 2 queries to separate the nested collections.
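For example, a common two-query workaround (a sketch, assuming the usual generated Sakila tables) groups each nested collection in memory using fetchGroups():

import java.util.List;
import java.util.Map;

// Query 1: actors per film title, grouped in memory
Map<String, List<String>> actorsByTitle = ctx
    .select(FILM.TITLE, ACTOR.FIRST_NAME)
    .from(ACTOR)
    .join(FILM_ACTOR).on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
    .join(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
    .fetchGroups(FILM.TITLE, ACTOR.FIRST_NAME);

// Query 2: categories per film title, grouped in memory
Map<String, List<String>> categoriesByTitle = ctx
    .select(FILM.TITLE, CATEGORY.NAME)
    .from(CATEGORY)
    .join(FILM_CATEGORY).on(CATEGORY.CATEGORY_ID.eq(FILM_CATEGORY.CATEGORY_ID))
    .join(FILM).on(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
    .fetchGroups(FILM.TITLE, CATEGORY.NAME);

The two maps then have to be merged by title on the client, which is exactly the kind of boilerplate the following feature removes.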

ENTER the Stage: Multiset

The standard SQL <multiset value constructor> operator allows for collecting the data from a correlated subquery into a nested data structure, a MULTISET. Everything in SQL is a MULTISET, so the operator isn’t too surprising. But the nesting is where it shines. The previous query can now be re-written in jOOQ as follows:

var result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Note, it’s not relevant to this task, but I’m using jOOQ’s type safe implicit to-one join feature, which further helps tame the joins, syntactically. A matter of taste.

How to read this query? Easy:

  • Get all the films
  • For each FILM, get all the actors as a nested collection
  • For each FILM, get all the categories as a nested collection

You’re going to like Java 10’s var keyword even more after this :) Because what type is result? It’s of this type:

Result<Record3<
  String,                          // FILM.TITLE
  Result<Record2<String, String>>, // ACTOR.FIRST_NAME, ACTOR.LAST_NAME
  Result<Record1<String>>          // CATEGORY.NAME
>>

That’s quite something. Not too complex if you think about it. There’s a result with 3 columns:

  • TITLE
  • A first nested result that has 2 string columns: ACTOR.FIRST_NAME and ACTOR.LAST_NAME
  • A second nested result that has 1 string column: CATEGORY.NAME

Using var or other type inference mechanisms, you don’t have to denote this type. Or even better (stay tuned): We’ll type-safely map the structural type to our nominal DTO type hierarchy, with just a few additional lines of code. I’ll explain that later.

What Does the Result Look Like?

Calling toString() on the above Result type yields something like this:

+---------------------------+--------------------------------------------------+---------------+
|title                      |actors                                            |films          |
+---------------------------+--------------------------------------------------+---------------+
|ACADEMY DINOSAUR           |[(PENELOPE, GUINESS), (CHRISTIAN, GABLE), (LUCI...|[(Documentary)]|
|ACE GOLDFINGER             |[(BOB, FAWCETT), (MINNIE, ZELLWEGER), (SEAN, GU...|[(Horror)]     |
|ADAPTATION HOLES           |[(NICK, WAHLBERG), (BOB, FAWCETT), (CAMERON, ST...|[(Documentary)]|
|AFFAIR PREJUDICE           |[(JODIE, DEGENERES), (SCARLETT, DAMON), (KENNET...|[(Horror)]     |
|AFRICAN EGG                |[(GARY, PHOENIX), (DUSTIN, TAUTOU), (MATTHEW, L...|[(Family)]     |
|AGENT TRUMAN               |[(KIRSTEN, PALTROW), (SANDRA, KILMER), (JAYNE, ...|[(Foreign)]    |
 ...

Notice how we’re back to getting each FILM.TITLE entry only once (no duplication), and nested in each row are the subquery results. There’s no denormalisation happening!

When calling result.formatJSON() with the appropriate formatting options, we’ll get this representation:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      {
        "first_name": "SANDRA",
        "last_name": "PECK"
      },

      ...
    ],
    "films": [
      { "name": "Documentary" }
    ]
  },
  {
    "title": "ACE GOLDFINGER",
    "actors": [
      {
        "first_name": "BOB",
        "last_name": "FAWCETT"
      },
  ...

Calling result.formatXML() would produce this:

<result>
  <record>
    <title>ACADEMY DINOSAUR</title>
    <actors>
      <result>
        <record>
          <first_name>PENELOPE</first_name>
          <last_name>GUINESS</last_name>
        </record>
        <record>
          <first_name>CHRISTIAN</first_name>
          <last_name>GABLE</last_name>
        </record>
        <record>
          <first_name>LUCILLE</first_name>
          <last_name>TRACY</last_name>
        </record>
        <record>
          <first_name>SANDRA</first_name>
          <last_name>PECK</last_name>
        </record>
        ...
      </result>
    </actors>
    <films>
      <result>
        <record>
          <name>Documentary</name>
        </record>
      </result>
    </films>
  </record>
  <record>
    <title>ACE GOLDFINGER</title>
    <actors>
      <result>
        <record>
          <first_name>BOB</first_name>
          <last_name>FAWCETT</last_name>
        </record>
        ...

You get the idea!
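The exact output shape can be tuned via jOOQ’s JSONFormat (and XMLFormat) options. A small sketch, assuming the standard options documented in the Javadoc:

import org.jooq.JSONFormat;

// Render records as JSON objects keyed by column name, without the
// {"fields": ..., "records": ...} header wrapper.
String json = result.formatJSON(new JSONFormat()
    .header(false)
    .recordFormat(JSONFormat.RecordFormat.OBJECT));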

What’s the Generated SQL?

Just turn on jOOQ’s DEBUG logging and observe a query like this one (in PostgreSQL):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select 
        alias_78509018.first_name, 
        alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as films
from film
order by film.title

The Db2, MySQL, Oracle, SQL Server versions would look similar. Just try it on your Sakila database installation. It runs fast, too.

Mapping the Results to DTOs

Now, I promised to get rid of that lengthy structural type with all the generics. Check this out!

We used to call them POJOs (Plain Old Java Objects). Then DTOs (Data Transfer Objects). Now records. Yes, let’s try some Java 16 records here. (Note, records aren’t required for these examples. Any POJOs with appropriate constructors would do).

Wouldn’t it be nice if result were of this type:

record Actor(String firstName, String lastName) {}
record Film(
  String title,
  List<Actor> actors,
  List<String> categories
) {}

List<Film> result = ...

Structural typing is essential to jOOQ and its type safe query system, but in your code, you probably don’t want to have those merge-conflict lookalike generics all the time, and even var won’t help you if your data needs to be returned from a method.

So, let’s transform our jOOQ query to one that produces List<Film>, step by step. We’re starting with the original query, untouched:

Result<Record3<
    String, 
    Result<Record2<String, String>>, 
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Now, we’re going to map the Actor in the first MULTISET expression. This can be done as follows, using the new Field.convertFrom() convenience method, which allows you to turn a Field<T> into any read-only Field<U> for ad-hoc usage. A simple example would be this:

record Title(String title) {}

// Use this field in any query
Field<Title> title = FILM.TITLE.convertFrom(Title::new);

It’s just an easy, new way to attach a read-only Converter to a Field for single usage, instead of doing that with the code generator.
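Such an ad-hoc converted field can then be used like any other field, e.g. (a quick sketch):

// The converted field reads as Title instances directly
List<Title> titles = ctx
    .select(title)
    .from(FILM)
    .fetch(title);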

Applied to the original query:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

What are we doing here?

  • The method convertFrom() takes a lambda Result<Record2<String, String>> -> Actor.
  • The Result type is the usual jOOQ Result, which has a Result.map(RecordMapper<R, E>) method.
  • The mapping() method is the new Records.mapping(), which isn’t doing much, just turning a constructor reference of type Function2<String, String, Actor> into a RecordMapper, which can then be used to turn a Result<Record2<String, String>> into a List<Actor>.

And it type checks! Try it yourself. If you add a column to the multiset, you’ll get a compilation error. If you add/remove an attribute from the Actor record, you’ll get a compilation error. No reflection here, just declarative mapping of jOOQ results/records to custom List<UserType>. If you prefer the “old” reflection approach using jOOQ’s ubiquitous into() methods, you can still do that, too:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.into(Actor.class))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

The result still type checks, but the conversion from Result<Record2<String, String>> to List<Actor> no longer does; it uses reflection.

Let’s continue. Let’s remove the clumsy category Result<Record1<String>>. We could’ve added another record, but in this case, a List<String> will suffice.

Result<Record3<
    String, 
    List<Actor>,
    List<String>  // Begone, jOOQ structural type!
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))

      // Magic.     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("films").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

And finally, here’s the outer-most Result<Record3<...>> to List<Film> conversion:

List<Film> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)

   //     vvvvvvvvvvvvvvvvvv grande finale
   .fetch(mapping(Film::new));

This time, we don’t need the Field.convertFrom() method. Just using the Records.mapping() auxiliary will be sufficient.
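From here on, it’s plain Java, for example:

// result is an ordinary List<Film>; no jOOQ types are involved anymore
for (Film film : result)
    System.out.println(film.title() + ": " + film.actors().size() + " actors");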

A More Complex Example

The previous example showed nesting of two independent collections, which is quite hard with classic JOIN based SQL or ORMs. How about a much more complex example, where we nest things 2 levels, one of which being an aggregation, even? The requirement is:

Give me all the films, the actors that played in the film, the categories that categorise the film, the customers that have rented the film, and all the payments per customer for that film

I won’t even show a JOIN based approach. Let’s dive directly into MULTISET and the new, synthetic MULTISET_AGG aggregate function. Here’s how to do this with jOOQ. Now, check out that beautiful result type:

Result<Record4<
    String,                   // FILM.TITLE
    Result<Record2<
        String,               // ACTOR.FIRST_NAME
        String                // ACTOR.LAST_NAME
    >>,                       // "actors"
    Result<Record1<String>>,  // CATEGORY.NAME
    Result<Record4<
        String,               // CUSTOMER.FIRST_NAME
        String,               // CUSTOMER.LAST_NAME
        Result<Record2<
            LocalDateTime,    // PAYMENT.PAYMENT_DATE
            BigDecimal        // PAYMENT.AMOUNT
        >>, 
        BigDecimal            // "total"
    >>                        // "customers"
>> result = 
dsl.select(

        // Get the films
        FILM.TITLE,

        // ... and all actors that played in the film
        multiset(
            select(
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME
            )
            .from(FILM_ACTOR)
            .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
        ).as("actors"),

        // ... and all categories that categorise the film
        multiset(
            select(FILM_CATEGORY.category().NAME)
            .from(FILM_CATEGORY)
            .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
        ).as("categories"),

        // ... and all customers who rented the film, as well
        // as their payments
        multiset(
            select(
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME,
                multisetAgg(
                    PAYMENT.PAYMENT_DATE,
                    PAYMENT.AMOUNT
                ).as("payments"),
                sum(PAYMENT.AMOUNT).as("total"))
            .from(PAYMENT)
            .where(PAYMENT
                .rental().inventory().FILM_ID.eq(FILM.FILM_ID))
            .groupBy(
                PAYMENT.rental().customer().CUSTOMER_ID,
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME)
        ).as("customers")
    )
    .from(FILM)
    .where(FILM.TITLE.like("A%"))
    .orderBy(FILM.TITLE)
    .limit(5)
    .fetch();

You’ll be using var, of course, rather than denoting this insane type, but I wanted to spell the type out explicitly for the sake of the example.

Note again how implicit joins were really helpful here. Since we wanted to aggregate all the payments per customer, we can just select from PAYMENT and group by the payment’s PAYMENT.rental().customer(), as well as correlate the subquery by PAYMENT.rental().inventory().FILM_ID, without any extra effort.

The executed SQL looks like this, where you can see the generated implicit joins (run it on your PostgreSQL Sakila database!):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select alias_78509018.first_name, alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = 
               alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as categories,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name,
        'payments', t.payments,
        'total', t.total
      )),
      jsonb_build_array()
    )
    from (
      select
        alias_63965917.first_name,
        alias_63965917.last_name,
        jsonb_agg(jsonb_build_object(
          'payment_date', payment.payment_date,
          'amount', payment.amount
        )) as payments,
        sum(payment.amount) as total
      from (
        payment
          join (
            rental as alias_102068213
              join customer as alias_63965917
                on alias_102068213.customer_id = 
                   alias_63965917.customer_id
              join inventory as alias_116526225
                on alias_102068213.inventory_id = 
                   alias_116526225.inventory_id
          )
            on payment.rental_id = alias_102068213.rental_id
        )
      where alias_116526225.film_id = film.film_id
      group by 
        alias_63965917.customer_id, 
        alias_63965917.first_name, 
        alias_63965917.last_name
    ) as t
  ) as customers
from film
where film.title like 'A%'
order by film.title
fetch next 5 rows only

The result, in JSON, now looks something like this:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      ...
    ],
    "categories": [{ "name": "Documentary" }],
    "customers": [
      {
        "first_name": "SUSAN",
        "last_name": "WILSON",
        "payments": [
          {
            "payment_date": "2005-07-31T22:08:29",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      {
        "first_name": "REBECCA",
        "last_name": "SCOTT",
        "payments": [
          {
            "payment_date": "2005-08-18T18:36:16",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      ...

That’s it. Nesting collections in arbitrary ways is completely effortless and intuitive. No N+1, no deduplication. Just declare the results in exactly the form you require in your client.

There is no other way to pull off this complexity with such ease than letting your RDBMS do the heavy lifting of planning and running such a query, and letting jOOQ do the mapping.

Conclusion

We had this kind of functionality all along. We just never used it, or not enough. Why? Because client APIs did not make it accessible enough. Because RDBMS didn’t agree on syntax enough.

That’s now over. jOOQ uses standard SQL MULTISET syntax in its DSL API, enhances it with the synthetic MULTISET_AGG aggregate function, the way all RDBMS should have implemented it (go Informix, Oracle). We can wait for another 40 years for the other RDBMS to implement this, or we can just use jOOQ today.

And, I cannot stress this enough:

  • This is all type safe
  • There is no reflection
  • The nesting is done in the database using SQL (via SQL/XML or SQL/JSON for now)
  • … So, execution planners can optimise your entire query
  • … No extra columns or extra queries or other extra work is performed in the database

This works on all dialects that have either SQL/XML or SQL/JSON support (or both), including the major popular dialects:

  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server

And it is offered under the usual license terms of jOOQ. So, happy nesting of collections.

Addendum: Using SQL/XML or SQL/JSON directly

You may be tempted to use this everywhere, and rightfully so. But beware: if your SQL client consumes XML or JSON directly, there’s no need to use MULTISET. Use jOOQ’s native SQL/XML or SQL/JSON support that was introduced in jOOQ 3.14. That way, you won’t convert from JSON to jOOQ results and back to JSON, but stream the JSON (or XML) to your frontend directly.
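A hedged sketch of such a direct SQL/JSON query, using jOOQ 3.14’s SQL/JSON DSL (jsonArrayAgg, jsonObject, key/value; the usual static imports of org.jooq.impl.DSL are assumed):

// Build the JSON document in the database and fetch it as a single value,
// instead of nesting via MULTISET and re-serialising on the client.
var json = ctx
    .select(jsonArrayAgg(jsonObject(key("title").value(FILM.TITLE))))
    .from(FILM)
    .fetchOne();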

Use IN List Padding in Your JDBC Application to Avoid Cursor Cache Contention Problems

A problem few developers are aware of is the possibility of running into “cursor cache contention” or “execution plan cache contention” problems when using IN lists in SQL. The problem, which has been described at length in previous articles, can be summarised as follows.

All of these are distinct SQL queries and need to be parsed / planned / cached as possibly distinct execution plans in RDBMS that have strong plan caches (e.g. Db2, Oracle, SQL Server):

SELECT * FROM t WHERE id IN (?);
SELECT * FROM t WHERE id IN (?, ?);
SELECT * FROM t WHERE id IN (?, ?, ?);
SELECT * FROM t WHERE id IN (?, ?, ?, ?);
SELECT * FROM t WHERE id IN (?, ?, ?, ?, ?);

While this is never a problem on developer machines, this can produce significant problems in production. I’ve seen this take down entire Oracle instances during peak loads. While RDBMS vendors should work on avoiding the severe problems this can cause, you can work around it using a trick that we invented at jOOQ (and Hibernate also has it now):

IN list padding

The trick is very simple. Just “pad” your IN lists to the nearest power of 2, and repeat the last value until the end:

SELECT * FROM t WHERE id IN (?);                      -- Left as it is
SELECT * FROM t WHERE id IN (?, ?);                   -- Left as it is
SELECT * FROM t WHERE id IN (?, ?, ?, ?);             -- Padded 3 to 4
SELECT * FROM t WHERE id IN (?, ?, ?, ?);             -- Left as it is
SELECT * FROM t WHERE id IN (?, ?, ?, ?, ?, ?, ?, ?); -- Padded 5 to 8
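For illustration, here’s a minimal sketch of the padding logic itself (not jOOQ’s actual implementation):

import java.util.ArrayList;
import java.util.List;

// Pad the bind value list to the next power of 2 by repeating the last value.
static <T> List<T> padToPowerOfTwo(List<T> values) {
    int target = Integer.highestOneBit(values.size());
    if (target < values.size())
        target <<= 1;

    List<T> padded = new ArrayList<>(values);
    while (padded.size() < target)
        padded.add(padded.get(padded.size() - 1));

    return padded;
}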

It’s really a hack and there are better solutions to avoiding this problem, including using arrays or temporary tables, but your production system may be down and you need a quick fix.

jOOQ has supported IN list padding for years now, since jOOQ 3.9 (late 2016), but with the relatively new parser and the ParsingConnection, you can now also apply this technique to any arbitrary SQL query in your non-jOOQ based system. Here’s a simple example:

// Any arbitrary JDBC Connection is wrapped by jOOQ here and replaced
// by a "ParsingConnection", which is also a JDBC Connection
DSLContext ctx = DSL.using(connection);
ctx.settings().setInListPadding(true);
Connection c = ctx.parsingConnection();

// Your remaining code is left untouched. It is unaware of jOOQ
for (int i = 0; i < 10; i++) {
    try (PreparedStatement s = c.prepareStatement(

        // This alone is reason enough to use jOOQ instead, 
        // but one step at a time :)
        "select 1 from dual where 1 in (" +
            IntStream.rangeClosed(0, i)
                     .mapToObj(x -> "?")
                     .collect(Collectors.joining(", ")) +
        ")")
    ) {
        for (int j = 0; j <= i; j++)
            s.setInt(j + 1, j + 1);

        try (ResultSet rs = s.executeQuery()) {
            while (rs.next())
                System.out.println(rs.getInt(1));
        }
    }
}

The above example just generates and runs 10 queries of this form:

select 1 from dual where 1 in (?)
select 1 from dual where 1 in (?, ?)
select 1 from dual where 1 in (?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?)
select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

But that’s not what is being executed. In the DEBUG logs, we can see the following:

Translating from         : select 1 from dual where 1 in (?)
Translating to           : select 1 from dual where 1 in (?)
Translating from         : select 1 from dual where 1 in (?, ?)
Translating to           : select 1 from dual where 1 in (?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Translating from         : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Translating to           : select 1 from dual where 1 in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

And just like that, our legacy application can run in production again, and you’ll have time to fix this more thoroughly.

Conclusion

While jOOQ is mostly an internal DSL for writing type safe, embedded SQL in Java, you can use it for a lot of other things too on any JDBC based application. The above example is using the ParsingConnection that can parse all your SQL statements and translate / transform them to anything else, including other dialects.

Automatically Transform Oracle Style Implicit Joins to ANSI JOIN using jOOQ

While jOOQ is mostly being used as an internal SQL DSL for embedded, dynamic SQL in Java, where it offers the best solution on the market, jOOQ is increasingly also used for one of its secondary features: its parser. Having been introduced in jOOQ 3.9 primarily for the purpose of being able to parse DDL statements, e.g. to reverse engineer your schema for code generation purposes, we’ve added an increasing amount of features and SQL transformation capabilities that allow for the parser to be used as a standalone product, through the command line interface, our website, or through the ordinary jOOQ API. One feature that has been added to jOOQ 3.14, which is mostly useful to those using jOOQ as a parser, is the capability of transforming old Oracle style implicit joins to ANSI JOIN.

Why avoid “implicit joins”?

The old Oracle style implicit join syntax is supported and properly optimised by most RDBMS vendors. In the past, prior to SQL-92, this is how we used to inner join tables, e.g. when querying the Sakila database:

SELECT *
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id

Granted, the syntax is kind of intuitive. Just declare all the tables you want to fetch data from, and then make sure the proper data is retained only by filtering for matching primary key / foreign key values. Of course, this can go terribly wrong for many obvious reasons, e.g. when you forget a join predicate after adding a table. If the query is complex, this may be hard to debug. The solution is ANSI JOIN. Starting from SQL-92 (almost 30 years now!), this is how we join in most RDBMS:

SELECT *
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id

While it’s still possible to define wrong join predicates, at least it’s no longer possible to forget a predicate, because this is syntactically incorrect (except for MySQL, where, regrettably, the ON clause is optional):

SELECT *
FROM actor a
JOIN film_actor fa -- Syntax error
JOIN film f -- Syntax error

jOOQ’s implicit JOIN

Notice that it is common to refer to the above syntax as “implicit join”, while JPQL and jOOQ recycled the term for another kind of “implicit join”, which is foreign key path based, and even less error prone than the ANSI SQL syntax. With jOOQ, the above query can be written as follows:

ctx.select(
      FILM_ACTOR.actor().asterisk(),
      FILM_ACTOR.asterisk(),
      FILM_ACTOR.film().asterisk())
   .from(FILM_ACTOR)
   .fetch();

The mere presence of these to-one relationship paths in the query will implicitly add the appropriate LEFT JOIN or INNER JOIN to the FROM clause. This is merely convenience on top of ordinary ANSI JOINs, not a replacement.
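For comparison, the same query written with explicit ANSI JOINs in jOOQ would look roughly like this:

// Equivalent explicit joins; the implicit join paths above generate joins of this shape
ctx.select(
      ACTOR.asterisk(),
      FILM_ACTOR.asterisk(),
      FILM.asterisk())
   .from(FILM_ACTOR)
   .join(ACTOR).on(FILM_ACTOR.ACTOR_ID.eq(ACTOR.ACTOR_ID))
   .join(FILM).on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .fetch();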

Transforming Oracle implicit joins

When you have an old code base that you wish to upgrade and transform all your queries to ANSI JOIN, use jOOQ for that. You can use jOOQ’s programmatic capabilities (as mentioned before; a sketch follows at the end of this section), or the free website https://www.jooq.org/translate. On the website, just pick the “Oracle style to ANSI JOIN” option and place the following SQL on the left:

Input

SELECT
a.first_name,
a.last_name,
count(c.category_id)
FROM
actor a,
film_actor fa,
film f,
film_category fc,
category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND fc.category_id = c.category_id
GROUP BY
a.actor_id,
a.first_name,
a.last_name

Output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
JOIN film_actor fa
  ON a.actor_id = fa.actor_id
JOIN film f
  ON fa.film_id = f.film_id
CROSS JOIN (
  film_category fc
    JOIN category c
      ON fc.category_id = c.category_id
)
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

And… whoopsies. The output correctly displays the resulting, undesired CROSS JOIN, because one of the join predicates was missing. Yep, the tool already helped! Let’s fix the input query:

Fixed input

SELECT
a.first_name,
a.last_name,
count(c.category_id)
FROM
actor a,
film_actor fa,
film f,
film_category fc,
category c
WHERE a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND f.film_id = fc.film_id -- This was missing
AND fc.category_id = c.category_id
GROUP BY
a.actor_id,
a.first_name,
a.last_name

Fixed output

SELECT
a.first_name,
a.last_name,
count(c.category_id)
FROM actor a
JOIN film_actor fa
ON a.actor_id = fa.actor_id
JOIN film f
ON fa.film_id = f.film_id
JOIN film_category fc
ON f.film_id = fc.film_id
JOIN category c
ON fc.category_id = c.category_id
GROUP BY
a.actor_id,
a.first_name,
a.last_name

This also works if you were using Oracle’s arcane outer join syntax using (+) (or SQL Server’s *=, which has been unsupported for a while now). You might have this input:

Input

SELECT
a.first_name,
a.last_name,
count(c.category_id)
FROM
actor a,
film_actor fa,
film f,
film_category fc,
category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id(+) = c.category_id
GROUP BY
a.actor_id,
a.first_name,
a.last_name

Producing this output

SELECT
  a.first_name,
  a.last_name,
  count(c.category_id)
FROM actor a
LEFT OUTER JOIN film_actor fa
  ON a.actor_id = fa.actor_id
LEFT OUTER JOIN film f
  ON fa.film_id = f.film_id
LEFT OUTER JOIN (
  film_category fc
    RIGHT OUTER JOIN category c
      ON fc.category_id = c.category_id
)
  ON f.film_id = fc.film_id
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Err, wat. Whoopsies again! One of the (+) symbols was on the wrong side, which is why we got that RIGHT OUTER JOIN. Again, the tool has shown that the old syntax was quite error prone. Let’s fix it:

Fixed input

SELECT
a.first_name,
a.last_name,
count(c.category_id)
FROM
actor a,
film_actor fa,
film f,
film_category fc,
category c
WHERE a.actor_id = fa.actor_id(+)
AND fa.film_id = f.film_id(+)
AND f.film_id = fc.film_id(+)
AND fc.category_id = c.category_id(+)
GROUP BY
a.actor_id,
a.first_name,
a.last_name

Fixed output

SELECT
a.first_name,
a.last_name,
count(c.category_id)
FROM actor a
LEFT OUTER JOIN film_actor fa
ON a.actor_id = fa.actor_id
LEFT OUTER JOIN film f
ON fa.film_id = f.film_id
LEFT OUTER JOIN film_category fc
ON f.film_id = fc.film_id
LEFT OUTER JOIN category c
ON fc.category_id = c.category_id
GROUP BY
a.actor_id,
a.first_name,
a.last_name
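If you prefer doing this programmatically rather than via the website, a hedged sketch using the jOOQ parser could look like the following. Note that the name of the transformation setting is an assumption here; check the Settings documentation of your jOOQ version. oracleStyleSql stands for your input string:

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.conf.Settings;
import org.jooq.impl.DSL;

// Assumption: a Settings flag along these lines enables the
// "table lists to ANSI JOIN" transformation -- verify the exact name.
Settings settings = new Settings()
    .withTransformTableListsToAnsiJoins(true);

DSLContext ctx = DSL.using(SQLDialect.ORACLE, settings);

// Parse the old-style SQL and render it again with the transformation applied
String ansiSql = ctx.render(ctx.parser().parseQuery(oracleStyleSql));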

Conclusion

Play around with it and tell us what you think! https://www.jooq.org/translate

Having “constant” columns in foreign keys

I was asked a very interesting question on Twitter just now:
Can we have “constant” foreign key columns in (PostgreSQL) tables? Luckily, yes, we can. Using a nice standard feature that is “computed columns” or “generated columns” Sometimes, you cannot completely normalise your schema for whatever reason. There may be a case where you have a table with a composite primary key like this:

CREATE TABLE t1 (
  a int,
  b int,
  t1 int,
  PRIMARY KEY (a, b)
)

And in a referencing table t2, you will always reference one of the primary key columns by value, say, 1. Of course, you could create a table t2 with a CHECK constraint ensuring b = 1:

CREATE TABLE t2 (
  a int,
  b int NOT NULL DEFAULT 1 CHECK (b = 1),
  t2 int,
  FOREIGN KEY (a, b) REFERENCES t1
)

But why not use a generated column instead?

CREATE TABLE t2 (
  a int,
  b int GENERATED ALWAYS AS (1) STORED,
  t2 int,
  FOREIGN KEY (a, b) REFERENCES t1
)

In my opinion, this is even more powerful. As of PostgreSQL 12, only STORED is supported (meaning the value is stored on disk), whereas in this case VIRTUAL would be even better (meaning the value is produced only when reading the row). Inserting some test data:

INSERT INTO t1 (a, b, t1) 
VALUES(1, 1, 1), (1, 2, 2), (2, 1, 3);

INSERT INTO t2 (a, t2) 
VALUES (1, 11), (2, 12);

SELECT * 
FROM t1
NATURAL LEFT JOIN t2

Produces the expected result. We can only insert (b = 1) into t2:
a|b|t1|t2|
-|-|--|--|
1|1| 1|11|
2|1| 3|12|
1|2| 2|  |
A nice trick to keep up one’s sleeve. Computed or generated columns are available in a variety of RDBMS, including at least:
  • Db2
  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server

Using SQL Server FOR XML and FOR JSON Syntax on Other RDBMS With jOOQ

SQL Server supports transforming flat tabular SQL result sets into hierarchical structures by convention using the convenient FOR XML or FOR JSON syntaxes. This is really convenient and less verbose than the standard SQL/XML or SQL/JSON APIs – although the standard ones are more powerful.

In this blog post, I’d like to show a few core features of the SQL Server syntax, and what they correspond to in standard SQL. jOOQ 3.14 will support both SQL Server’s syntax and the standard syntax, and will be able to translate from one to the other, such that you can use SQL Server syntax also on Db2, MariaDB, MySQL, Oracle, and PostgreSQL. You can play around with the current state of development on our website here. As always, using the Sakila database, here’s a simple example as a teaser:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
FOR XML RAW;

-- Db2, Oracle, PostgreSQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlattributes(
    t.first_name AS first_name,
    t.last_name AS last_name,
    t.title AS title
  )
))
FROM (
  -- Original query here
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
) AS t

Producing in both cases something like:
<row first_name="PENELOPE" last_name="GUINESS" title="OKLAHOMA JUMANJI"/>
<row first_name="PENELOPE" last_name="GUINESS" title="RULES HUMAN"/>
<row first_name="PENELOPE" last_name="GUINESS" title="SPLASH GUMP"/>
<row first_name="PENELOPE" last_name="GUINESS" title="VERTIGO NORTHWEST"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WESTWARD SEABISCUIT"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WIZARD COLDBLOODED"/>
<row first_name="NICK" last_name="WAHLBERG" title="ADAPTATION HOLES"/>
<row first_name="NICK" last_name="WAHLBERG" title="APACHE DIVINE"/>

FOR XML and FOR JSON concepts

As could be seen in the above teaser, the SQL Server syntax is far less verbose and more concise, and it seems to produce a reasonable default behaviour, whereas the Db2, Oracle, PostgreSQL (and SQL Standard) SQL/XML APIs are more verbose, but also more powerful. For example, it is possible to map column a to attribute x and column b to a nested XML element y very easily. The advantages of both approaches are clear. SQL Server’s approach is much more usable in the general case. But what is the general case? Let’s summarise a few key parallels between SQL result sets and XML/JSON data structures:

Tables are XML elements or JSON arrays

Tables (i.e. sets of data) are not a foreign concept to both XML and JSON documents. The most natural way to represent a set of data in XML is a set of elements using the same element name, optionally wrapped by a wrapper element. For example:
<!-- With wrapper element -->
<films>
  <film title="OKLAHOMA JUMANJI"/>
  <film title="RULES HUMAN"/>
  <film title="SPLASH GUMP"/>
</films>

<!-- Without wrapper element -->
<film title="OKLAHOMA JUMANJI"/>
<film title="RULES HUMAN"/>
<film title="SPLASH GUMP"/>
The distinction of whether a wrapper element is added is mostly significant when nesting data. With JSON, the obvious choice of data structure to represent a table is an array. For example:
[
  {"title": "OKLAHOMA JUMANJI"},
  {"title": "RULES HUMAN"},
  {"title": "SPLASH GUMP"}
]
Rows are XML elements or JSON objects

As we’ve already seen above, a SQL row is represented in XML using an element.
<film title="OKLAHOMA JUMANJI"/>
The question is only what the element name should be. It can usually be any of:
  • A standard name, such as “row”
  • The name of the table the row stems from
  • A custom name
In JSON, it is an object.
{"title": "OKLAHOMA JUMANJI"}
Unlike in XML, there is no such thing as an element name, so the row is “anonymous”. The row type is defined by what table / array the JSON object is contained in.

Column values are XML elements or attributes, or JSON attributes

We have a few more options for representing SQL column values in XML. Mainly two choices:
  • Represent values as attributes
  • Represent values as elements
Scalar values can easily be represented as attributes. If a value needs further nesting (e.g. an array, user defined type, etc.), then elements are a better choice. In most cases, the choice is not relevant, so we can pick both:
<!-- Using attributes -->
<film film_id="635" title="OKLAHOMA JUMANJI"/>

<!-- Using elements from table and column names -->
<film>
  <film_id>635</film_id>
  <title>OKLAHOMA JUMANJI</title>
</film>

<!-- Using standard element names -->
<row>
  <value name="film_id" value="635"/>
  <value name="title" value="OKLAHOMA JUMANJI"/>
</row>
There are a few other reasonable default options for the representation of a column value in XML. In JSON, on the other hand, there are two main reasonable approaches. In most cases, an object will be chosen, where column values are identified by column name. But just like SQL records are a mixture between “structs” and “tuples”, we could imagine a representation that maps column values to array indexes as well:
// Using objects
{"film_id": 635, "title": "OKLAHOMA JUMANJI"}

// Using arrays
[635, "OKLAHOMA JUMANJI"]
GROUP BY and ORDER BY can be seen as a way to nest data

So far, all data was represented in a flat way, just like the SQL table. There was some nesting when wrapping XML elements or JSON arrays in some wrapper element or object, or when representing XML data with more elements rather than attributes, but the data was still always tabular. Very often, we want to consume data in a hierarchical form, though. An actor played in films, so we’d like to group the films by actor, rather than repeating the actor information for every film. In general, operations like GROUP BY or ORDER BY serve this purpose. GROUP BY allows for aggregating all data into nested data structures per group (e.g. into strings, arrays, XML elements, JSON arrays, JSON objects). ORDER BY does the same, “visually” – perhaps a bit less formally. When we look at this set of XML elements, we can see visually that they’re “grouped” (i.e. ordered) by actor:
<row first_name="PENELOPE" last_name="GUINESS" title="OKLAHOMA JUMANJI"/>
<row first_name="PENELOPE" last_name="GUINESS" title="RULES HUMAN"/>
<row first_name="PENELOPE" last_name="GUINESS" title="SPLASH GUMP"/>
<row first_name="PENELOPE" last_name="GUINESS" title="VERTIGO NORTHWEST"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WESTWARD SEABISCUIT"/>
<row first_name="PENELOPE" last_name="GUINESS" title="WIZARD COLDBLOODED"/>
<row first_name="NICK" last_name="WAHLBERG" title="ADAPTATION HOLES"/>
<row first_name="NICK" last_name="WAHLBERG" title="APACHE DIVINE"/>
SQL Server supports such grouping in at least two ways:
  • Implicitly by convention, using ORDER BY
  • Explicitly by creating correlated subqueries
The implicit approach could transform the above flat representation into something like this:
<a first_name="PENELOPE" last_name="GUINESS">
    <f title="OKLAHOMA JUMANJI"/>
    <f title="RULES HUMAN"/>
    <f title="SPLASH GUMP"/>
    <f title="VERTIGO NORTHWEST"/>
    <f title="WESTWARD SEABISCUIT"/>
    <f title="WIZARD COLDBLOODED"/>
</a>
<a first_name="NICK" last_name="WAHLBERG">
    <f title="ADAPTATION HOLES"/>
    <f title="APACHE DIVINE"/>
</a>
… where “a” and “f” are the table names in the query (actor a and film f).

How do FOR XML and FOR JSON work in detail?

There are several features that can be combined in SQL Server. The complete picture can be seen in the docs. We’ll omit a few features in this blog post.
  • The transformation algorithm RAW (flat results, only in XML), AUTO (hierarchical, automatic results), PATH (hierarchical, explicit results)
  • The “root” name, which corresponds to an XML wrapper element, or a JSON wrapper object
  • XML only: Whether values should be placed in ELEMENTS or attributes
  • JSON only: INCLUDE_NULL_VALUES specifies whether NULL values are explicit, or implicit (absent from the JSON object).
  • JSON only: WITHOUT_ARRAY_WRAPPER specifies whether the set of JSON objects should be listed as a JSON array, or a comma separated list of objects (which could be combined with other queries)
This list is not complete; there are more flags and features, but instead of discussing them in theory, let’s look at a few examples:

FOR XML RAW

Producing flat results with attributes for values:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlattributes(
    t.first_name AS first_name,
    t.last_name AS last_name,
    t.title AS title
  )
))
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) AS t

This produces
<row first_name="NICK" last_name="WAHLBERG" title="SMILE EARRING"/>
<row first_name="NICK" last_name="WAHLBERG" title="WARDROBE PHANTOM"/>
<row first_name="PENELOPE" last_name="GUINESS" title="ACADEMY DINOSAUR"/>
<row first_name="PENELOPE" last_name="GUINESS" title="ANACONDA CONFESSIONS"/>
FOR XML RAW, ROOT

Producing flat results with attributes for values, and a root element to wrap the listed elements:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW, ROOT('rows');

-- Standard SQL
SELECT xmlelement(
  NAME rows,
  xmlagg(xmlelement(
    NAME row,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name,
      t.title AS title
    )
  ))
)
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) AS t

This produces
<rows>
  <row first_name="NICK" last_name="WAHLBERG" title="SMILE EARRING"/>
  <row first_name="NICK" last_name="WAHLBERG" title="WARDROBE PHANTOM"/>
  <row first_name="PENELOPE" last_name="GUINESS" title="ACADEMY DINOSAUR"/>
  <row first_name="PENELOPE" last_name="GUINESS" title="ANACONDA CONFESSIONS"/>
</rows>
FOR XML RAW, ELEMENTS

Producing flat results with elements for values:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML RAW, ELEMENTS;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlelement(
    NAME first_name,
    first_name
  ),
  xmlelement(
    NAME last_name,
    last_name
  ),
  xmlelement(
    NAME title,
    title
  )
))
FROM (
  SELECT a.first_name, a.last_name, f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
  FOR XML RAW, ELEMENTS
) AS t

This produces
<row>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
    <title>SMILE EARRING</title>
</row>
<row>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
    <title>WARDROBE PHANTOM</title>
</row>
<row>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <title>ACADEMY DINOSAUR</title>
</row>
<row>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <title>ANACONDA CONFESSIONS</title>
</row>
This could also be combined with ROOT, which we’re omitting for brevity.

FOR XML/JSON AUTO

This approach derives results completely automatically from your query structure. Mainly:
  • The SELECT clause defines in what order XML or JSON data is nested.
  • The FROM clause defines the table names (via aliasing), which are translated to XML element or JSON object attribute names.
  • The ORDER BY clause produces the “grouping”, which is translated to nesting XML elements or JSON objects.

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO;

-- Standard SQL
SELECT xmlagg(e)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlattributes(t.title AS title)
    ))
  ) AS e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) AS t
  GROUP BY
    first_name,
    last_name
) AS t

Notice how this emulation requires two steps of XMLAGG with GROUP BY. It gets more hairy with more tables being joined and projected! I won’t add more complex examples here, but try it online!

This produces
<a first_name="NICK" last_name="WAHLBERG">
    <f title="SMILE EARRING"/>
    <f title="WARDROBE PHANTOM"/>
</a>
<a first_name="PENELOPE" last_name="GUINESS">
    <f title="ACADEMY DINOSAUR"/>
    <f title="ANACONDA CONFESSIONS"/>
</a>
Let’s try the same thing again with JSON:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON AUTO;

-- Standard SQL
SELECT json_arrayagg(e)
FROM (
  SELECT JSON_OBJECT(
    KEY 'FIRST_NAME' VALUE first_name,
    KEY 'LAST_NAME' VALUE last_name,
    KEY 'F' VALUE JSON_ARRAYAGG(JSON_OBJECT(
      KEY 'TITLE' VALUE title
      ABSENT ON NULL
    ))
    ABSENT ON NULL
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

The result being:
[
    {
        "first_name": "NICK",
        "last_name": "WAHLBERG",
        "f": [
            {
                "title": "SMILE EARRING"
            },
            {
                "title": "WARDROBE PHANTOM"
            }
        ]
    },
    {
        "first_name": "PENELOPE",
        "last_name": "GUINESS",
        "f": [
            {
                "title": "ACADEMY DINOSAUR"
            },
            {
                "title": "ANACONDA CONFESSIONS"
            }
        ]
    }
]
FOR XML/JSON AUTO, ROOT

Like before, we could wrap this in a root XML element or a root JSON object if need be.

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO, ROOT;

-- Standard SQL
SELECT xmlelement(
  NAME join,
  xmlagg(e)
)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlattributes(
      t.first_name AS first_name,
      t.last_name AS last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlattributes(t.title AS title)
    ))
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

This does the same thing as before, but just wraps the previous root XMLAGG() element in another XMLELEMENT() function call.

This produces
<root>
    <a first_name="NICK" last_name="WAHLBERG">
        <f title="SMILE EARRING"/>
        <f title="WARDROBE PHANTOM"/>
    </a>
    <a first_name="PENELOPE" last_name="GUINESS">
        <f title="ACADEMY DINOSAUR"/>
        <f title="ANACONDA CONFESSIONS"/>
    </a>
</root>
Let’s try the same thing again with JSON:

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON AUTO, ROOT;

-- Standard SQL
SELECT JSON_OBJECT(KEY 'a' VALUE json_arrayagg(e))
FROM (
  SELECT JSON_OBJECT(
    KEY 'FIRST_NAME' VALUE first_name,
    KEY 'LAST_NAME' VALUE last_name,
    KEY 'F' VALUE JSON_ARRAY_AGG(JSON_OBJECT(
      KEY 'TITLE' VALUE title
      ABSENT ON NULL
    ))
    ABSENT ON NULL
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

The result being:
{
    "a": [
        {
            "first_name": "NICK",
            "last_name": "WAHLBERG",
            "f": [
                {
                    "title": "SMILE EARRING"
                },
                {
                    "title": "WARDROBE PHANTOM"
                }
            ]
        },
        {
            "first_name": "PENELOPE",
            "last_name": "GUINESS",
            "f": [
                {
                    "title": "ACADEMY DINOSAUR"
                },
                {
                    "title": "ANACONDA CONFESSIONS"
                }
            ]
        }
    ]
}
FOR XML AUTO, ELEMENTS

Like before, instead of producing attributes, we might decide to produce elements instead (in XML only):

-- SQL Server
SELECT a.first_name, a.last_name, f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML AUTO, ELEMENTS;

-- Standard SQL
SELECT xmlagg(e)
FROM (
  SELECT xmlelement(
    NAME a,
    xmlelement(
      NAME first_name,
      first_name
    ),
    xmlelement(
      NAME last_name,
      last_name
    ),
    xmlagg(xmlelement(
      NAME f,
      xmlelement(
        NAME title,
        title
      )
    ))
  ) e
  FROM (
    SELECT a.first_name, a.last_name, f.title
    FROM actor a
    JOIN film_actor fa ON a.actor_id = fa.actor_id
    JOIN film f ON fa.film_id = f.film_id
    ORDER BY 1, 2, 3
  ) t
  GROUP BY
    first_name,
    last_name
) t

Not much has changed, except that a set of XMLELEMENT() calls is made, rather than XMLATTRIBUTES() calls.

This produces
<a>
    <first_name>NICK</first_name>
    <last_name>WAHLBERG</last_name>
    <f>
        <title>SMILE EARRING</title>
    </f>
    <f>
        <title>WARDROBE PHANTOM</title>
    </f>
</a>
<a>
    <first_name>PENELOPE</first_name>
    <last_name>GUINESS</last_name>
    <f>
        <title>ACADEMY DINOSAUR</title>
    </f>
    <f>
        <title>ANACONDA CONFESSIONS</title>
    </f>
</a>
FOR XML/JSON PATH

The PATH strategy is my personal favourite. It is used to create nested XML or JSON path structures more explicitly, and it also allows for additional nesting levels when grouping projections together. This is best shown by example. Notice how I’m now using aliases for my columns, and the alias looks like an XPath expression using '/' (slashes):

-- SQL Server
SELECT 
  a.first_name AS [author/first_name], 
  a.last_name AS [author/last_name], 
  f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR XML PATH;

-- Standard SQL
SELECT xmlagg(xmlelement(
  NAME row,
  xmlelement(
    NAME author,
    xmlelement(
      NAME first_name,
      "author/first_name"
    ),
    xmlelement(
      NAME last_name,
      "author/last_name"
    )
  ),
  xmlelement(
    NAME title,
    title
  )
))
FROM (
  SELECT 
    a.first_name AS "author/first_name", 
    a.last_name AS "author/last_name", 
    f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) t

Check out how, by convention, we’re now getting an additional level of nesting for the author related columns under the row/author element:
<row>
    <author>
        <first_name>NICK</first_name>
        <last_name>WAHLBERG</last_name>
    </author>
    <title>SMILE EARRING</title>
</row>
<row>
    <author>
        <first_name>NICK</first_name>
        <last_name>WAHLBERG</last_name>
    </author>
    <title>WARDROBE PHANTOM</title>
</row>
<row>
    <author>
        <first_name>PENELOPE</first_name>
        <last_name>GUINESS</last_name>
    </author>
    <title>ACADEMY DINOSAUR</title>
</row>
<row>
    <author>
        <first_name>PENELOPE</first_name>
        <last_name>GUINESS</last_name>
    </author>
    <title>ANACONDA CONFESSIONS</title>
</row>
This is really neat! The SQL Server syntax is definitely much more convenient for this common use-case. Let’s try the same thing again with JSON. The only thing we change is we now use a JSON-path-ish syntax using dots ('.') rather than slashes ('/'):

-- SQL Server
SELECT 
  a.first_name AS [author.first_name], 
  a.last_name AS [author.last_name], 
  f.title
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
ORDER BY 1, 2, 3
FOR JSON PATH;

-- Standard SQL
SELECT JSON_ARRAYAGG(JSON_OBJECT(
  KEY 'author' VALUE JSON_OBJECT(
    KEY 'first_name' VALUE author.first_name,
    KEY 'last_name' VALUE author.last_name
  ),
  KEY 'TITLE' VALUE title
  ABSENT ON NULL
))
FROM (
  SELECT 
    a.first_name AS "author.first_name", 
    a.last_name AS "author.last_name", 
    f.title
  FROM actor a
  JOIN film_actor fa ON a.actor_id = fa.actor_id
  JOIN film f ON fa.film_id = f.film_id
  ORDER BY 1, 2, 3
) t

The result being (again, with nested objects):
[
    {
        "author": {
            "first_name": "NICK",
            "last_name": "WAHLBERG"
        },
        "title": "SMILE EARRING"
    },
    {
        "author": {
            "first_name": "NICK",
            "last_name": "WAHLBERG"
        },
        "title": "WARDROBE PHANTOM"
    },
    {
        "author": {
            "first_name": "PENELOPE",
            "last_name": "GUINESS"
        },
        "title": "ACADEMY DINOSAUR"
    },
    {
        "author": {
            "first_name": "PENELOPE",
            "last_name": "GUINESS"
        },
        "title": "ANACONDA CONFESSIONS"
    }
]
For more sophisticated nesting, including nesting of collections, a correlated subquery is needed in SQL Server, also with a FOR XML or FOR JSON syntax.
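To sketch the idea (just an illustration, not a complete treatment of the topic): a correlated subquery with its own FOR JSON clause produces a nested collection per outer row, e.g. a JSON array of films per actor:

-- SQL Server
SELECT
  a.first_name,
  a.last_name,
  (
    SELECT f.title
    FROM film_actor fa
    JOIN film f ON fa.film_id = f.film_id
    WHERE fa.actor_id = a.actor_id
    FOR JSON PATH
  ) AS films
FROM actor a
FOR JSON PATH;

Each actor object then contains a films array of {"title": ...} objects, similar to what FOR JSON AUTO produced earlier, but with explicit control over the nesting.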

Conclusion

XML and JSON are popular document formats outside and inside of the database. SQL Server has some of the most convenient syntax for most cases, while standard SQL supports much more basic, and thus more powerful constructs. In standard SQL, almost any kind of XML or JSON projection is possible, and with XMLTABLE() and JSON_TABLE(), the documents can be transformed back to SQL tables, as well. In many applications, using these XML or JSON features natively would lead to much less boilerplate code, as many applications do not need middleware between the database and some client, just to transform data between formats.

Most ORMs don’t expose this functionality for a variety of reasons, the main one being that the devil is in the details. While both XML and JSON are nicely standardised, the implementations differ greatly:
  • The SQL/XML standard is implemented mostly by DB2, Oracle, and PostgreSQL. Many dialects offer some XML capabilities, but not as impressive as the standard and the previous three. SQL Server has FOR XML which is very powerful for standard XML serialisations, but may be a bit difficult to use for edge cases
  • The SQL/JSON standard was added late and is implemented again to large extents by DB2 and Oracle, but increasingly also by MariaDB and MySQL. PostgreSQL (and by consequence, compatible dialects, like CockroachDB) had their own proprietary functions and APIs, which are not compatible with the standard. And again, SQL Server has FOR JSON which works well for standard serialisations, but a bit less well for edge cases
These technologies are poorly adopted in clients because of the many subtle differences. jOOQ has been leveling out these minor differences for many years without hiding the core functionality. SQL/XML and SQL/JSON are perfect use-cases for jOOQ 3.14 (due in Q2 2020), which now allows for using both the standard SQL/XML and SQL/JSON syntaxes as well as the SQL Server FOR XML and FOR JSON syntax in the jOOQ Professional and Enterprise Editions. Before jOOQ 3.14 is out, you can already play with the current functionality on our website: https://www.jooq.org/translate

What’s Faster? COUNT(*) or COUNT(1)?

One of the biggest and undead myths in SQL is that COUNT(*) is faster than COUNT(1). Or was it that COUNT(1) is faster than COUNT(*)? Impossible to remember, because there’s really no reason at all why one should be faster than the other. But is the myth justified? Let’s measure!

How does COUNT(…) work?

But first, let’s look into some theory. The two ways to count things are not exactly the same thing. Why?
  • COUNT(*) counts all the tuples in a group
  • COUNT(<expr>) counts all the tuples in a group for which <expr> evaluates to something that IS NOT NULL
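A minimal illustration of that difference, using hypothetical inline data (PostgreSQL syntax):

SELECT count(*), count(1), count(v)
FROM (VALUES (1), (NULL)) AS t (v);

The first two counts return 2, because neither * nor the constant 1 can ever be NULL, while count(v) returns 1, skipping the NULL row.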
This distinction can be quite useful. Most of the time, we’ll simply COUNT(*) for convenience, but there are (at least) two cases where we don’t want that, for example:

When outer joining

Imagine that in the Sakila database, we have some actors that did not play in any films. Making sure such an actor actually exists:

INSERT INTO actor (actor_id, first_name, last_name)
VALUES (201, 'SUSAN', 'DAVIS');

When inner joining, we might write the following (using PostgreSQL syntax):

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

And we won’t get the newly added SUSAN DAVIS, because of the nature of inner join:
actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|
So we might change our query to use LEFT JOIN instead

SELECT actor_id, a.first_name, a.last_name, count(*) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

There she is now, but oops, wrong count! She doesn’t have any films, which we have proven before with the INNER JOIN query. Yet we get 1:
actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 1|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|
Her COUNT(*) value is 1, because we do get 1 film tuple for her in the group, with all columns being NULL. The solution is to count the FILM_ID instead, which cannot be NULL in the table (being a primary key), so when it is NULL in a group, that can only be because of the LEFT JOIN:

SELECT actor_id, a.first_name, a.last_name, count(film_id) AS c
FROM actor AS a
LEFT JOIN film_actor AS fa USING (actor_id)
LEFT JOIN film AS f USING (film_id)
GROUP BY actor_id
ORDER BY c ASC, actor_id ASC;

Notice that we could count other things than the primary key, but with the primary key, we’re quite certain we don’t get any other “accidental” nulls in our groups, which we did not want to exclude from the count value. Now, we’re getting the correct result:
actor_id|first_name |last_name   |c |
--------|-----------|------------|--|
     201|SUSAN      |DAVIS       | 0|
     148|EMILY      |DEE         |14|
      35|JUDY       |DEAN        |15|
     199|JULIA      |FAWCETT     |15|
     186|JULIA      |ZELLWEGER   |16|
      31|SISSY      |SOBIESKI    |18|
      71|ADAM       |GRANT       |18|
       1|PENELOPE   |GUINESS     |19|
      30|SANDRA     |PECK        |19|
When counting subsets of a group

An even more powerful application of counting only non-null evaluations of an expression is counting only subsets of a group. We’ve already blogged about this technique in our previous post about aggregating several expressions in one single query. For example, counting in a single query:
  • All actors
  • Actors with their first_name starting with A
  • Actors with their first_name ending with A
  • Actors with their first_name containing A
In SQL:

SELECT 
  count(*),
  count(CASE WHEN first_name LIKE 'A%' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A' THEN 1 END),
  count(CASE WHEN first_name LIKE '%A%' THEN 1 END)
FROM actor;

This yields:
count|count|count|count|
-----|-----|-----|-----|
  201|   13|   30|  105|
This is very useful when pivoting data sets (see also Oracle/SQL Server PIVOT clause). Notice that PostgreSQL supports the SQL standard FILTER clause for this, which is more convenient and more readable. The above query can be written like this, in PostgreSQL:

SELECT 
  count(*),
  count(*) FILTER (WHERE first_name LIKE 'A%'),
  count(*) FILTER (WHERE first_name LIKE '%A'),
  count(*) FILTER (WHERE first_name LIKE '%A%')
FROM actor;

Back to COUNT(*) vs COUNT(1)

Now that we know the theory behind these COUNT expressions, what’s the difference between COUNT(*) and COUNT(1)? There is none, effectively. The 1 expression in COUNT(1) evaluates a constant expression for each row in the group, and it can be proven that this constant expression will never evaluate to NULL, so effectively, we’re running COUNT(*), counting ALL the rows in the group again.

There should be no difference, and parsers / optimisers should be able to recognise this and not do the extra work of checking every expression evaluation for NULL-ness. I recently saw this discussion on Twitter, though, where Vik Fearing looked up the PostgreSQL sources, showing that PostgreSQL does do the extra work instead of optimising this:
So, I was curious to see if it mattered. I ran a benchmark on the 4 most popular RDBMS, with these results:
  • MySQL: Doesn’t matter. Sometimes COUNT(1) was faster, sometimes COUNT(*) was faster, so all differences were only benchmark artifacts
  • Oracle: Doesn’t matter. Like MySQL
  • PostgreSQL: Does matter (!). COUNT(*) was consistently faster by around 10% on 1M rows, which is much more than I had expected
  • SQL Server: Doesn’t matter. Like MySQL
The benchmark code can be found in the following gists: The results are below. Each benchmark run repeated SELECT COUNT(*) FROM t or SELECT COUNT(1) FROM t 100 times on a 1M row table, and then the benchmark was repeated 5 times to mitigate any warmup penalties and be fair with respect to caching. The times displayed are relative to the fastest run per database product. This removes any distraction that may be caused by interpreting actual execution times as we do not want to compare database products against each other. The database versions I’ve used are:
  • MySQL 8.0.16 (in Docker)
  • Oracle 18c XE (in Docker)
  • PostgreSQL 11.3 (in Docker)
  • SQL Server 2017 Express (in Windows)
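To illustrate the methodology (this is only a sketch, not the contents of the gists), the measurement loop for PostgreSQL might look something like the following, assuming the 1M row table t. It prints raw elapsed times; the results below are normalised relative to the fastest run, as described above:

DO $$
DECLARE
  v_ts TIMESTAMPTZ;
  v_repeat CONSTANT INT := 100;
BEGIN
  -- Repeat the whole benchmark 5 times to mitigate warmup penalties
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();

    -- Statement 1: COUNT(*)
    FOR i IN 1..v_repeat LOOP
      PERFORM count(*) FROM t;
    END LOOP;

    RAISE NOTICE 'Run %, Statement 1: %', r, clock_timestamp() - v_ts;
    v_ts := clock_timestamp();

    -- Statement 2: COUNT(1)
    FOR i IN 1..v_repeat LOOP
      PERFORM count(1) FROM t;
    END LOOP;

    RAISE NOTICE 'Run %, Statement 2: %', r, clock_timestamp() - v_ts;
  END LOOP;
END;
$$;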
MySQL

No relevant difference, nor a clear winner:
RUN     STMT    RELATIVE_TIME
-----------------------------
0	1	1.0079
0	2	1.0212
1	1	1.0229
1	2	1.0256
2	1	1.0009
2	2	1.0031
3	1	1.0291
3	2	1.0256
4	1	1.0618
4	2	1.0000
Oracle

No relevant difference, nor a clear winner:
Run 1, Statement 1 : 1.06874
Run 1, Statement 2 : 1.01982
Run 2, Statement 1 : 1.09175
Run 2, Statement 2 : 1.0301
Run 3, Statement 1 : 1.00308
Run 3, Statement 2 : 1.02499
Run 4, Statement 1 : 1.02503
Run 4, Statement 2 : 1
Run 5, Statement 1 : 1.01259
Run 5, Statement 2 : 1.05828
PostgreSQL

A significant, consistent difference of almost 10%:
RUN 1, Statement 1: 1.00134
RUN 1, Statement 2: 1.09538
RUN 2, Statement 1: 1.00190
RUN 2, Statement 2: 1.09115
RUN 3, Statement 1: 1.00000
RUN 3, Statement 2: 1.09858
RUN 4, Statement 1: 1.00266
RUN 4, Statement 2: 1.09260
RUN 5, Statement 1: 1.00454
RUN 5, Statement 2: 1.09694
Again, I’m surprised by the order of magnitude of this difference. I would have expected it to be less. Curious to hear about your own results in the comments, or further ideas why this is so significant in PostgreSQL.

SQL Server

No relevant difference, nor a clear winner:
Run 1, Statement 1: 1.00442
Run 1, Statement 2: 1.00702
Run 2, Statement 1: 1.00468
Run 2, Statement 2: 1.00000
Run 3, Statement 1: 1.00208
Run 3, Statement 2: 1.00624
Run 4, Statement 1: 1.00780
Run 4, Statement 2: 1.00364
Run 5, Statement 1: 1.00468
Run 5, Statement 2: 1.00702

Conclusion

As it is now in 2019, given the database versions mentioned above, unfortunately, there is a significant difference between COUNT(*) and COUNT(1) in PostgreSQL. Luckily (and this is rare in SQL), all the other dialects don’t care, and thus consistently using COUNT(*), rather than COUNT(1), is a slightly better choice for ALL measured database products from this article.

Do note that the benchmark only tried a very simple query! The results might be different when using joins, unions, or any other SQL constructs, or in other edge cases, e.g. when using COUNT() in HAVING or ORDER BY or with window functions, etc.

In any case, there shouldn’t be any difference, and I’m sure that a future PostgreSQL version will optimise the constant expression in the COUNT(<expr>) aggregate function directly in the parser to avoid the extra work. For other interesting optimisations that do not depend on the cost model, see this article here.

Oracle’s BINARY_DOUBLE Can Be Much Faster Than NUMBER

Using the right data type for some calculation sounds like some obvious advice. There are many blogs about using temporal data types for temporal data, instead of strings. An obvious reason is data integrity and correctness. We don’t gain much in storing dates as 2019-09-10 in one record, and as Nov 10, 2019 in the next one. Performance is also important in this case – for example, because of statistics being more optimal (an often overlooked side effect of unfit data types). But at a customer site, I’ve recently discovered a surprising (not surprising in hindsight) performance issue when using NUMBER instead of BINARY_DOUBLE in an Oracle database.

NUMBER for monetary amounts

The NUMBER type (or DECIMAL, NUMERIC in other RDBMS) is perfectly suited for all decimal numbers, which require the correct precision and rounding. I mean, if you ever encounter this kind of display in an invoice, such as when I purchase video games on Steam: … your trust level for the platform immediately goes down by a factor of 1.0e+1 (even if, technically, it does not matter in this case). So, the default, e.g. in banking systems for monetary amounts, is always to use NUMBER or an equivalent type, e.g. java.math.BigDecimal.

Calculations on numbers

But now, let’s assume we want to do some statistics on these amounts. Some aggregations of various numeric values, maybe even of their logarithms. We can establish that these two expressions are equal:
ln(a * b) = ln(a) + ln(b)
Or in other words, for positive numbers:
a * b = exp(ln(a) + ln(b))
We’ve already blogged about how this approach can be used to emulate a PRODUCT() aggregate function in SQL, which is very useful for some cases, but none of the SQL databases jOOQ supports has built-in support for this yet. Notice, the blog post also takes care of zero and negative numbers.
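As a quick reminder of that technique (a simplified sketch that ignores the zero and negative cases handled in the linked post), a product can be computed by summing logarithms:

-- Oracle: compute 2 * 3 * 5 via logarithms
SELECT exp(sum(ln(x))) AS product
FROM (
  SELECT 2 AS x FROM dual UNION ALL
  SELECT 3 FROM dual UNION ALL
  SELECT 5 FROM dual
);

This yields 30 (possibly with a tiny rounding error), and the same idea scales to a PRODUCT() style aggregation over an arbitrary column.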

But which number type to choose?

Now, we might be tempted to just calculate the LN(SOME_NUMBER) value, and sum that up using SUM(LN(SOME_NUMBER)) for this use-case. This turns out to be terribly slow in Oracle. We first suspected bad indexes, and even doubted aggregation in general, until I suggested we try using BINARY_DOUBLE instead. In our case, we didn’t care about the exact numeric value of the amount; an IEEE 754 floating point number with double precision was going to be good enough. The results were surprising. In a simple benchmark, we compared 10 approaches to calculating this sum of logarithms:
  1. Using NUMBER(20, 10) and SUM(LN(the_number))
  2. Using NUMBER(20, 10) and SUM(LN(CAST(the_number AS BINARY_DOUBLE)))
  3. Using NUMBER(20, 10) and SUM(LN(TO_BINARY_DOUBLE(the_number)))
  4. Using NUMBER(20, 10), containing a pre-calculated LN value
  5. Using DOUBLE PRECISION and SUM(LN(the_double))
  6. Using DOUBLE PRECISION and SUM(LN(CAST(the_double AS BINARY_DOUBLE)))
  7. Using DOUBLE PRECISION and SUM(LN(TO_BINARY_DOUBLE(the_double)))
  8. Using DOUBLE PRECISION, containing a pre-calculated LN value
  9. Using BINARY_DOUBLE and SUM(LN(the_binary))
  10. Using BINARY_DOUBLE, containing a pre-calculated LN value
These were the thoughts:
  • We tried the above 3 possible numeric data types, expecting BINARY_DOUBLE to be the fastest
  • We tried to pre-calculate the LN() value for the benchmark, to see how much effort goes into summing, and how much effort goes into the LN() calculation with each type. While in general, in this system, such precalculation is impractical, we still wanted to have a benchmark comparison, in case a materialised view or other technique would be feasible.
  • We tried casting and converting each type to BINARY_DOUBLE prior to passing the value to the LN() function. Instead of migrating all the data (with possible side effects), we wanted to see if we can solve this to a reasonable extent “on the fly”
The benchmark that I’m posting here uses this table and example data (full benchmark code at the end of the article):

CREATE TABLE data (
  n1 NUMBER(20, 10),
  n2 NUMBER(20, 10),
  d1 DOUBLE PRECISION,
  d2 DOUBLE PRECISION,
  b1 BINARY_DOUBLE,
  b2 BINARY_DOUBLE
);

INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

So, we have 100000 records, whose SUM(LN(x)) we want to calculate in the above 10 different ways. N1 contains the raw numeric value, and N2 contains the pre-calculated LN(N1).
The benchmark technique is described here. Do note it has a lot of caveats and is only useful for a limited number of verifications. Please always be careful when running such benchmarks, they often do not test production-like situations – and never use such benchmarks to compare different RDBMS products directly. They are only useful to compare two approaches on the same RDBMS product.
The results as run on Oracle 18c XE in Docker on development hardware are below. Times are compared relative to the fastest run, as actual time spent in each execution is not interesting for comparison. We did have similar results on production-like hardware, though, and I’m sure you can produce similar results in other RDBMS:
NUMBER(20, 10)
-------------------------------------------------
Run 3, Statement 1 : 280.34143  (avg : 280.75347)
Run 3, Statement 2 : 7.99402    (avg : 8.03506)
Run 3, Statement 3 : 7.71383    (avg : 7.73017)
Run 3, Statement 4 : 1.05456    (avg : 1.11735)

DOUBLE PRECISION
------------------------------------------------
Run 3, Statement 5 : 278.89476 (avg : 279.72981)
Run 3, Statement 6 : 8.06512   (avg : 8.07033)
Run 3, Statement 7 : 7.81873   (avg : 7.80063)
Run 3, Statement 8 : 1.5315    (avg : 1.54347)

BINARY_DOUBLE
------------------------------------------------
Run 3, Statement 9 : 2.4963    (avg : 2.57184)
Run 3, Statement 10: 1         (avg : 1.02943)
How to read these results?
  • Statement 10 is the fastest one, unsurprisingly, as it aggregates pre-calculated LN(binary_double) values. The precalculation of the function means that all the work has been done already before the report, and the data type is the one we expected to perform best in general
  • Statements 4 and 8 are almost as fast (precalculated LN() values). Being off only by small factors, we can attribute the difference to the usual benchmark flaws, although it’s interesting to see that DOUBLE PRECISION seems 1.5x slower to sum than BINARY_DOUBLE and even NUMBER
  • Statements 1, 5, 9 are the ones where no data type conversion is applied and SUM(LN(the_value)) is being calculated. It is staggering how much slower both NUMBER and DOUBLE PRECISION are than BINARY_DOUBLE. Statements 1 and 5 are a factor of 112x slower than statement 9!
  • Statements 2-3, 6-7 prove that converting the NUMBER or DOUBLE PRECISION values to BINARY_DOUBLE on the fly provides a sufficiently performant workaround, which made statements 2-3, 6-7 only 3x slower than statement 9
  • Statements 2-3, 6-7 show that casting and converting are about equivalent
Note that we found similar results with other functions, such as EXP().

Analysis

The order of magnitude may seem surprising at first, but thinking about it, it is not. We would never do CPU intensive computation with java.math.BigDecimal in Java. The BigDecimal type is there for numeric accuracy, e.g. when it really matters what the monetary amount is, exactly. When we run analytics on monetary amounts, using double is sufficient in Java as well. If our data is BigDecimal, and we cannot reasonably change that, it might still be better to use the BigDecimal::doubleValue conversion prior to further processing using e.g. Math::log.

So, this translates directly to SQL, whose LN() implementations are data type specific; IEEE 754 was designed precisely for this kind of computation. When doing CPU intensive computations, both in Java and in the database, we should always evaluate our various options of:
  • Quick fixing our data sets for the report (ad-hoc conversion prior to calculation)
  • Thoroughly fixing our data sets in the schema (migration towards a better data type)
  • Preprocessing our data sets (precalculating some very commonly used computations)
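For example, the third option could be a materialised view holding the precalculated BINARY_DOUBLE values (a sketch with hypothetical names; refresh strategy and privileges are out of scope here):

-- Oracle: hypothetical precalculation of the LN() values
CREATE MATERIALIZED VIEW data_ln AS
SELECT ln(to_binary_double(n1)) AS ln_n1
FROM data;

-- The report then only needs to aggregate the precalculated values
SELECT sum(ln_n1) FROM data_ln;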

Benchmark logic

Just run the below on your own hardware. I’m curious to see your results:

-- Copyright Data Geekery GmbH
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.
--
-- This version displays relative execution times (fastest execution = 1)
SET SERVEROUTPUT ON

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

CREATE TABLE data (
  n1 NUMBER(20, 10),
  n2 NUMBER(20, 10),
  d1 DOUBLE PRECISION,
  d2 DOUBLE PRECISION,
  b1 BINARY_DOUBLE,
  b2 BINARY_DOUBLE
);

INSERT INTO data
SELECT level, ln(level), level, ln(level), level, ln(level)
FROM dual
CONNECT BY level <= 100000;

CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 10;
  v_stmt NUMBER;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
    v_stmt := 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(n1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(n1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(n1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(n2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(d1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(cast(d1 as binary_double))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
    
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(to_binary_double(d1))) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(d2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(ln(b1)) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
    v_stmt := v_stmt + 1;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT sum(b2) FROM data
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, v_stmt, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed / MIN(elapsed) OVER() AS NUMBER(10, 5)) ratio,
      CAST(AVG(elapsed) OVER (PARTITION BY stmt) / MIN(elapsed) OVER() AS NUMBER(10, 5)) avg_ratio
    FROM results
    ORDER BY run, stmt
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio || ' (avg : ' || rec.avg_ratio || ')');
  END LOOP;
  
  dbms_output.put_line('');
  dbms_output.put_line('Copyright Data Geekery GmbH');
  dbms_output.put_line('https://www.jooq.org/benchmark');
END;
/

DROP TABLE data;
DROP TABLE results;

How to Fetch All Current Identity Values in Oracle

Oracle 12c has introduced the useful SQL standard IDENTITY feature, which is essentially just syntax sugar for binding a sequence to a column default. We can use it like this:

create table t1 (col1 number generated always as identity);
create table t2 (col2 number generated always as identity);

insert into t1 values (default);
insert into t1 values (default);
insert into t1 values (default);
insert into t2 values (default);

select * from t1;
select * from t2;

Which produces
COL1
----
  1
  2
  3

COL2
----
  1
For unit testing against our database, we might want to know what “state” our identities are in. For each table, we would like to know the next value such an identity would produce. If we knew all the backing sequence names, we could query their seq.currval, but we don’t know those sequence names as they are generated. However, we can query the dictionary views to get this information as follows:

select data_default
from user_tab_cols
where data_default is not null
and identity_column = 'YES'
and table_name in ('T1', 'T2');

An alternative is to query user_tab_identity_cols. This would produce:
"TEST"."ISEQ$$_116601".nextval
"TEST"."ISEQ$$_116603".nextval
Now, if we’re lazy, we could just run EXECUTE IMMEDIATE on each of those expressions and we’re done:

set serveroutput on
declare
  v_current number;
begin
  for rec in (
    select table_name, data_default
    from user_tab_cols
    where data_default is not null
    and identity_column = 'YES'
    and table_name in ('T1', 'T2')
  ) loop
    execute immediate replace(
      'select ' || rec.data_default || ' from dual', 
      '.nextval', 
      '.currval'
    ) into v_current;
    dbms_output.put_line(
      'Table : ' || rec.table_name || 
      ', currval : ' || v_current
    );
  end loop;
end;
/

This would produce:
Table : T1, currval : 3
Table : T2, currval : 1
Alternatively, if you want this result to be a SQL result instead of DBMS_OUTPUT content, you could run this:

with
  function current_value(p_table_name varchar2) return number is
    v_current number;
  begin
    for rec in (
      select data_default
      from user_tab_cols
      where table_name = p_table_name
      and data_default is not null
      and identity_column = 'YES'
    )
    loop
      execute immediate replace(
        'select ' || rec.data_default || ' from dual', 
        '.nextval', 
        '.currval'
      ) into v_current;
      return v_current;
    end loop;
    
    return null;
  end;
select *
from (
  select table_name, current_value(table_name) current_value
  from user_tables
  where table_name in ('T1', 'T2')
)
where current_value is not null
order by table_name;
/

The alternative using user_tab_identity_cols would look like this:

with
  function current_value(p_table_name varchar2) return number is
    v_current number;
  begin
    for rec in (
      select sequence_name
      from user_tab_identity_cols
      where table_name = p_table_name
    )
    loop
      execute immediate 
        'select ' || rec.sequence_name || '.currval from dual'
      into v_current;
      return v_current;
    end loop;
     
    return null;
  end;
select *
from (
  select table_name, current_value(table_name) current_value
  from user_tables
)
where current_value is not null
order by table_name;
/

The result is now a nice SQL result set:
TABLE_NAME   CURRENT_VALUE
--------------------------
T1           3
T2           1

Using IGNORE NULLS With SQL Window Functions to Fill Gaps

I found a very interesting SQL question on Twitter recently:
Rephrasing the question: We have a set of sparse data points:
+------------+-------+
| VALUE_DATE | VALUE |
+------------+-------+
| 2019-01-01 |   100 |
| 2019-01-02 |   120 |
| 2019-01-05 |   125 |
| 2019-01-06 |   128 |
| 2019-01-10 |   130 |
+------------+-------+
Since dates can be listed as discrete, consecutive data points, why not fill in the gaps between 2019-01-02 and 2019-01-05 or 2019-01-06 and 2019-01-10? The desired output would be:
+------------+-------+
| VALUE_DATE | VALUE |
+------------+-------+
| 2019-01-01 |   100 |
| 2019-01-02 |   120 | <-+
| 2019-01-03 |   120 |   | -- Generated
| 2019-01-04 |   120 |   | -- Generated
| 2019-01-05 |   125 |
| 2019-01-06 |   128 | <-+
| 2019-01-07 |   128 |   | -- Generated
| 2019-01-08 |   128 |   | -- Generated
| 2019-01-09 |   128 |   | -- Generated
| 2019-01-10 |   130 |
+------------+-------+
In the generated rows, we’ll just repeat the most recent value.

How to do this with SQL?

For the sake of this example, I’m using Oracle SQL, as the OP was expecting to do this with Oracle. The idea is to do this in two steps:
  1. Generate all the dates between the first and the last data points
  2. For each date, find either the current data point, or the most recent one
But first, let’s create the data:

create table t (value_date, value) as
  select date '2019-01-01', 100 from dual union all
  select date '2019-01-02', 120 from dual union all
  select date '2019-01-05', 125 from dual union all
  select date '2019-01-06', 128 from dual union all
  select date '2019-01-10', 130 from dual;

1. Generating all the dates

In Oracle, we can use the convenient CONNECT BY syntax for this. We could also use some other tool to generate dates to fill the gaps, including SQL standard recursion using WITH, or some PIPELINED function, but I like CONNECT BY for this purpose. We’ll write:

select (
  select min(t.value_date) 
  from t
) + level - 1 as value_date
from dual
connect by level <= (
  select max(t.value_date) - min(t.value_date) + 1
  from t
)

This produces:
VALUE_DATE|
----------|
2019-01-01|
2019-01-02|
2019-01-03|
2019-01-04|
2019-01-05|
2019-01-06|
2019-01-07|
2019-01-08|
2019-01-09|
2019-01-10|
Now we wrap the above query in a derived table and left join the actual data set:

select 
  d.value_date,
  t.value
from (
  select (
    select min(t.value_date) 
    from t
  ) + level - 1 as value_date
  from dual
  connect by level <= (
    select max(t.value_date) - min(t.value_date) + 1
    from t
  )
) d
left join t
on d.value_date = t.value_date
order by d.value_date;

The date gaps are now filled, but our values column is still sparse:
VALUE_DATE|VALUE|
----------|-----|
2019-01-01|  100|
2019-01-02|  120|
2019-01-03|     |
2019-01-04|     |
2019-01-05|  125|
2019-01-06|  128|
2019-01-07|     |
2019-01-08|     |
2019-01-09|     |
2019-01-10|  130|
2. Fill the value gaps

On each row, the VALUE column should either contain the actual value, or the “last_value” preceding the current row, ignoring all the nulls. Note that I deliberately worded this requirement in very specific English. We can now translate that sentence directly to SQL:

last_value (t.value) ignore nulls over (order by d.value_date)

Since we have added an ORDER BY clause to the window function, the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW applies, which colloquially means “all the preceding rows”. (Technically, that’s not accurate. It means all rows with values less than or equal to the value of the current row – see Kim Berg Hansen’s comment) Convenient! We’re trying to find the last value in the window of all the preceding rows, ignoring the nulls. This is standard SQL, but unfortunately not all RDBMS support IGNORE NULLS. Among the ones supported by jOOQ, currently these ones support the syntax:
  • DB2
  • H2
  • Informix
  • Oracle
  • Redshift
  • Sybase SQL Anywhere
  • Teradata
Sometimes, the exact standard syntax is not supported, but the standard feature is. Use https://www.jooq.org/translate to see different syntax variants. The full query now reads:

select 
  d.value_date,
  last_value (t.value) ignore nulls over (order by d.value_date)
from (
  select (
    select min(t.value_date) 
    from t
  ) + level - 1 as value_date
  from dual
  connect by level <= (
    select max(t.value_date) - min(t.value_date) + 1
    from t
  )
) d
left join t
on d.value_date = t.value_date
order by d.value_date;

… and it yields the desired result:
VALUE_DATE         |VALUE|
-------------------|-----|
2019-01-01 00:00:00|  100|
2019-01-02 00:00:00|  120|
2019-01-03 00:00:00|  120|
2019-01-04 00:00:00|  120|
2019-01-05 00:00:00|  125|
2019-01-06 00:00:00|  128|
2019-01-07 00:00:00|  128|
2019-01-08 00:00:00|  128|
2019-01-09 00:00:00|  128|
2019-01-10 00:00:00|  130|

Other RDBMS

This solution made use of some Oracle specific features such as CONNECT BY. In other RDBMS, the same idea can be implemented by using a different way of generating data. This article focuses only on using IGNORE NULLS. If you’re interested, feel free to post an alternative solution in the comments for your RDBMS.
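For example, the date generation step could use standard SQL recursion instead (a sketch in PostgreSQL syntax, assuming the same table t with a DATE typed value_date column; note that PostgreSQL is not among the dialects listed above that support IGNORE NULLS, so the second step would need an emulation there):

-- PostgreSQL
WITH RECURSIVE d (value_date) AS (
  SELECT min(value_date) FROM t
  UNION ALL
  SELECT value_date + 1
  FROM d
  WHERE value_date < (SELECT max(value_date) FROM t)
)
SELECT value_date
FROM d;

The LEFT JOIN to t and the gap filling logic would then be applied to this derived table, just like in the CONNECT BY version.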