Java’s Checked Exceptions Are Just Weird Union Types

This fun fact has been on my mind for a while, and a recent reddit thread about “Smuggling Checked Exceptions with Sealed Interfaces” finally made me write this post. Namely, Java had union types before they were cool! (If you squint hard.)

What are union types?

Ceylon is an underrated JVM language that never really took off, which is too bad, because the concepts it introduced are very elegant (see e.g. how it implemented nullable types as syntax sugar on top of union types, which IMO is much better than anything monadic using Option types, or Kotlin’s ad-hoc type system extension).

So, one of those concepts is union types. Currently, the most popular language supporting them is TypeScript, though C++, PHP, and Python also have something similar. (Whether the union type is tagged or not isn’t relevant to this post.)

If you understand Java’s intersection types A & B (meaning that something is both a subtype of A and of B), then it’s easy to understand union types A | B (meaning that something is a subtype of either A or B). TypeScript shows a simple example of this:

function printId(id: number | string) {
  console.log("Your ID is: " + id);
}
// OK
printId(101);
// OK
printId("202");
// Error
printId({ myID: 22342 });
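
For comparison, Java has had intersection types in generic type bounds for a long time. A minimal sketch:

// Java intersection type in a generic bound: T must be a subtype of
// both Comparable<T> and java.io.Serializable
static <T extends Comparable<T> & java.io.Serializable> T max(T a, T b) {
    return a.compareTo(b) >= 0 ? a : b;
}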

Structural vs nominal typing

Such a union type (or intersection type) is a structural type, as opposed to what we’ve been doing in Java via nominal types, where you have to declare a named type for this union every time you want to use it. E.g. in jOOQ, we have things like:

interface FieldOrRow {}
interface Field<T> extends FieldOrRow {}
interface Row extends FieldOrRow {}

Very soon, with Java 17, jOOQ is going to seal the above type hierarchy as follows (incomplete, subtypes of Field<T> and Row are omitted for brevity):

sealed interface FieldOrRow permits Field, Row {}
sealed interface Field<T> extends FieldOrRow permits ... {}
sealed interface Row extends FieldOrRow permits ... {}

There are pros and cons for doing these things structurally or nominally:

Structural typing:

  • Pro: You can create any ad-hoc union of any set of types anywhere
  • Pro: You don’t have to change the existing type hierarchy, which is essential when you don’t have access to it, e.g. when you want to do something like the above number | string type. This is similar in spirit to how JSR 308 type annotations (introduced in Java 8) let you refine types you don’t control.

Nominal typing:

  • Pro: You can attach documentation to the type, and reuse it formally (rather than structurally). TypeScript and many other languages offer type aliases for this kind of stuff, so you can have a bit of both worlds, though the alias is erased, meaning you keep the complex structural type leading to curious error messages.
  • Pro: You can seal the type hierarchy to allow for exhaustiveness checking among subtypes (e.g. above, there can only be Field<T> or Row subtypes of FieldOrRow; see the sketch after this list). A structurally typed union type is implicitly “sealed” ad-hoc by the union type description (not sure if that’s what it’s called), but with nominal types, you can make sure no one else can extend the type hierarchy, except where you permit it explicitly using the non-sealed keyword.
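
To illustrate that last point, here’s a minimal sketch of exhaustiveness checking over a sealed hierarchy, using made-up Shape subtypes, and assuming a Java version where JEP 406 style pattern matching for switch is available:

// Hypothetical sealed hierarchy; Circle and Square are made-up subtypes
sealed interface Shape permits Circle, Square {}
record Circle(double radius) implements Shape {}
record Square(double side) implements Shape {}

static double area(Shape shape) {
    // No default branch needed: the compiler can prove exhaustiveness,
    // because the sealed interface permits exactly these two subtypes
    return switch (shape) {
        case Circle c -> Math.PI * c.radius() * c.radius();
        case Square s -> s.side() * s.side();
    };
}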

Ultimately, as so often, things like structural and nominal typing are two sides of the same coin, with pros and cons mostly depending on taste and on how much you control a code base.

So, how are checked exceptions union types?

When you declare a method that throws checked exceptions, the return type of the method is really such a union type. Look at this example in Java:

public String getTitle(int id) throws SQLException;

The call-site now has to “check” the result of this method call using try-catch, or declare re-throwing the checked exception(s):

try {
    String title = getTitle(1);
    doSomethingWith(title);
}
catch (SQLException e) {
    handle(e);
}

If early Java had union types rather than checked exceptions, we might have declared this as follows, instead:

public String|SQLException getTitle(int id);

Likewise, a caller of this method will have to “check” the result of this method call. There’s no simple way of re-throwing it, so if we do want to re-throw, we’d need some syntax sugar, or repeat the same code all the time, Go-style:

// Hypothetical Java syntax:
String|SQLException result = getTitle(1);

switch (result) {
    case String title -> doSomethingWith(title);
    case SQLException e -> handle(e);
}

It would be obvious how such a JEP 406 style switch pattern matching statement or expression could implement an exhaustiveness check, just like with the existing JEP 409 sealed classes approach, the only difference, again, being that everything is now structurally typed, rather than nominally typed.

In fact, if you declare multiple checked exceptions, such as in the JDK’s reflection API:

public Object invoke(Object obj, Object... args)
throws 
    IllegalAccessException, 
    IllegalArgumentException,
    InvocationTargetException

With union types, this would just be this, instead:

// Hypothetical Java syntax:
public Object
    | IllegalAccessException
    | IllegalArgumentException
    | InvocationTargetException invoke(Object obj, Object... args)

And the union type syntax from the catch block (yes, we have union types in catch!)…

try {
    Object returnValue = method.invoke(obj);
    doSomethingWith(returnValue);
}
catch (IllegalAccessException | IllegalArgumentException e) {
    handle1(e);
}
catch (InvocationTargetException e) {
    handle2(e);
}

Could still check for exhaustiveness with the switch pattern matching approach:

// Hypothetical Java syntax:
Object
    | IllegalAccessException
    | IllegalArgumentException
    | InvocationTargetException result = method.invoke(obj);

switch (result) {
    case IllegalAccessException, 
         IllegalArgumentException e -> handle1(e);
    case InvocationTargetException e -> handle2(e);
    case Object returnValue -> doSomethingWith(returnValue);
}

A subtle caveat here is that exceptions are subtypes of Object, so we must put that case at the end, as it “dominates” the others (see JEP 406 for a discussion about dominance). Again, we can prove exhaustiveness, because all types that are involved in the union type have a switch case.

Can we emulate union types with checked exceptions?

You know what Jeff Goldblum would say

But this blog is known to do it anyway. Assuming that for every possible type, we had a synthetic (code generated?) checked exception that wraps it (because in Java, exceptions are not allowed to be generic):

// Use some "protective" base class, so no one can introduce 
// RuntimeExceptions to the type hierarchy
class E extends Exception {

    // Just in case you're doing this in performance sensitive code...
    @Override
    public Throwable fillInStackTrace() {
        return this;
    }
}

// Now create a wrapper exception for every type you want to represent
class EString extends E {
    String s;
    EString(String s) {
        this.s = s;
    }
}
class Eint extends E {
    int i;
    Eint(int i) {
        this.i = i;
    }
}

The benefit of this is that we don’t have to wait for Valhalla to support primitive types in generics, nor to reify them. We’ve already emulated that, as you can see above.

Next, we need a switch emulation for arbitrary degrees (22 will probably be enough?). Here’s one for degree 2:

// Create an arbitrary number of switch utilities for each arity up 
// to, say 22 as is best practice
class Switch2<E1 extends E, E2 extends E> {
    E1 e1;
    E2 e2;

    private Switch2(E1 e1, E2 e2) {
        this.e1 = e1;
        this.e2 = e2;
    }

    static <E1 extends E, E2 extends E> Switch2<E1, E2> of1(E1 e1) {
        return new Switch2<>(e1, null);
    }

    static <E1 extends E, E2 extends E> Switch2<E1, E2> of2(E2 e2) {
        return new Switch2<>(null, e2);
    }

    void check() throws E1, E2 {
        if (e1 != null)
            throw e1;
        else
            throw e2;
    }
}

And finally, here’s how we can emulate our exhaustiveness checking switch with catch blocks!

// "Union type" emulating String|int
Switch2<EString, Eint> s = Switch2.of1(new EString("hello"));

// Doesn't compile, Eint isn't caught (catches aren't exhaustive)
try {
    s.check();
}
catch (EString e) {}

// Compiles fine
try {
    s.check();
}
catch (EString e) {}
catch (Eint e) {}

// Also compiles fine
try {
    s.check();
}
catch (EString | Eint e) {}

// Doesn't compile because Eint "partially dominates" EString | Eint
try {
    s.check();
}
catch (Eint e) {}
catch (EString | Eint e) {}

“Neat”, huh? We could even imagine destructuring within the catch block, such that we could automatically unwrap the value from the auxiliary “E” type.
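
Until such destructuring exists, unwrapping is manual, but straightforward. A minimal sketch, reusing the EString and Eint types from above:

// Manually unwrap the values from the auxiliary wrapper types
try {
    s.check();
}
catch (EString e) {
    System.out.println("Got a String: " + e.s);
}
catch (Eint e) {
    System.out.println("Got an int: " + e.i);
}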

Since we already have “union types” in Java (in catch blocks), and since checked exception declarations could be retrofitted to form a union type with the method’s actual return type, my hopes are still that in some distant future, a more powerful Java will be available where these “union types” (and also intersection types) will be made first class. APIs like jOOQ would greatly profit from this!

Write C-Style Local Static Variables in Java 16

Java 16 includes an improvement that makes the language a bit more regular via JEP 395. The JEP says:

Static members of inner classes

It is currently specified to be a compile-time error if an inner class declares a member that is explicitly or implicitly static, unless the member is a constant variable. This means that, for example, an inner class cannot declare a record class member, since nested record classes are implicitly static.

We relax this restriction in order to allow an inner class to declare members that are either explicitly or implicitly static. In particular, this allows an inner class to declare a static member that is a record class.

What sounds like a minor necessary evil to make a new feature (record classes) more versatile actually has a life of its own. We can use it to emulate C-style local static variables, i.e. local variables that are:

  • Initialised only once (and lazily at that)
  • Shared among multiple executions of a method

This sounds like a rather hairy feature, global variables that are visible only locally. But in fact, it’s something I’ve wanted for a long time, especially when I wanted to cache regular expression patterns without polluting the class namespace.

Consider this code:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {
        // Re-compiling the pattern every time: Bad
        // Keeping the pattern local to the method: Good
        System.out.println("check(" + string + "): " 
            + compile("a").matcher(string).find());
    }
}

It prints:

compile(a)
check(a): true
compile(a)
check(b): false

Compiling a pattern can be costly if done frequently, so we’d better cache it. We used to do that like this:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    // Compiling the pattern only once: Good
    // Placing the pattern in a class namespace: Bad
    static final Pattern P_CHECK = compile("a");

    static void check(String string) {
        System.out.println("check(" + string + "): " 
            + P_CHECK.matcher(string).find());
    }
}

This now prints a more optimal output:

compile(a)
check(a): true
check(b): false

I.e. the regular expression pattern is compiled only once. But unfortunately, we had to pollute the entire class’s namespace, which can quickly become cumbersome if we have dozens of such regular expressions. Can we scope the P_CHECK variable to the check() method only? We now can!

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {

        // Compiling the pattern only once: Good
        // Keeping the pattern local to the method: Good
        // Capturing scope: Egh...
        var patterns = new Object() { 
            static final Pattern P_CHECK = compile("a");
        };
        
        System.out.println("check(" + string + "): " 
            + patterns.P_CHECK.matcher(string).find());
    }
}

This again prints the desired, optimal output:

compile(a)
check(a): true
check(b): false

The combination of using var to capture a non-denotable type (whose members we can dereference) along with the ability to put static members in inner classes effectively emulates local static variables, just like in C!

Since the inner class is unlikely to escape its scope, the fact that it may be capturing scope isn’t that big of a risk, as illustrated previously in a criticism of the double curly brace anti pattern. You’re still creating an extra class and a useless object that escape analysis hopefully prevents from allocating, so it’s not exactly a very clean solution, but it’s nice to know this is now possible.
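
By the way, the anonymous class and its instance aren’t strictly needed. JEP 395 also allows local record (and interface and enum) declarations, which may equally declare static members. A sketch of this variation, with the same once-only initialisation semantics assumed:

static void check(String string) {

    // Local record classes (also enabled by JEP 395) may declare
    // static members, too, avoiding the extra object
    record Patterns() {
        static final Pattern P_CHECK = compile("a");
    }

    System.out.println("check(" + string + "): " 
        + Patterns.P_CHECK.matcher(string).find());
}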

Using jOOQ to write vendor agnostic SQL with JPA’s native query or @Formula

If your legacy JPA application uses occasional native queries, Hibernate’s @Formula, or Spring Data’s @Query annotation with vendor specific native SQL embedded in it, you can use jOOQ’s parsing connection and parsing data source to translate between dialects, without having to go all in on your jOOQ adoption – though I think that’s inevitable once you see what jOOQ can do for you.

Now, let’s design a table like this:

CREATE TABLE author (
  id INT NOT NULL,
  first_name TEXT,
  last_name TEXT NOT NULL,

  CONSTRAINT pk_author PRIMARY KEY (id)
);

Now, you may want to write a native query on this table using JPA’s EntityManager.createNativeQuery(), mapping that to entities. You could be using jOOQ’s DSL API for this, but let’s say you’re not ready for a migration to jOOQ yet, or you want to use actual SQL as provided by your DBA, not jOOQ’s DSL.

So, in MariaDB, you might be writing something like this:

List<Author> result =
em.createNativeQuery("""
    select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
    from author as a
    order by a.id
    """, Author.class)
  .getResultList();

Where your entity is defined like this:

@Entity
@Table(name = "author")
public class Author {
    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    // Constructors, getters, setters, equals, hashCode, etc
}

The above runs just fine and produces all the authors in MariaDB, which implemented support for Oracle’s NVL() function. But what about Oracle itself? The query fails on Oracle with:

ORA-00933: SQL command not properly ended

This is because in Oracle, you can’t use the AS keyword to alias tables, only to alias columns. Sure, you can remove that, but what about NVL()? You want this to work on MySQL and SQL Server, as well, but they complain:

MySQL

SQL Error [1305] [42000]: FUNCTION test.nvl does not exist

SQL Server

SQL Error [195] [S0010]: ‘nvl’ is not a recognized built-in function name.

Now, you have these options:

  • Use jOOQ to generate the SQL string for you, using the DSL
  • Use JPQL instead of a native query (but then heavily rewrite it, because JPQL is much less powerful than SQL)
  • Try your luck writing actual vendor agnostic SQL, manually
  • Or…

jOOQ’s parsing connection

You could use jOOQ’s parsing connection, which acts as a proxy to your actual connection, intercepting every SQL statement on the JDBC level in order to translate it to the target dialect.

This is as simple as wrapping your existing JDBC Connection or DataSource as follows:

DataSource originalDataSource = ...
DataSource parsingDataSource = DSL
    .using(originalDataSource, dialect)
    .parsingDataSource();

That’s it! I mean, you could pass some additional configuration Settings after the dialect, but that’s about as easy as it gets. The new DataSource can now run your SQL query on all the aforementioned dialects, e.g. you might see this in your DEBUG logs:

On MySQL:

-- org.hibernate.SQL
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name 
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating from:
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translation cache miss: 
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating to: 
   select a.id, ifnull(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id

On SQL Server:

-- org.hibernate.SQL
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name 
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating from:
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translation cache miss: 
   select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
   from author as a 
   order by a.id
-- org.jooq.impl.ParsingConnection Translating to: 
   select a.id, coalesce(a.first_name, 'N/A') first_name, a.last_name 
   from author a 
   order by a.id

Hibernate was tricked by jOOQ! The NVL function was translated to MySQL’s IFNULL or SQL Server’s COALESCE, and the AS keyword was removed from the SQL Server query. These are just simple examples; your actual SQL may be much more complex. Play around with the feature set using jOOQ’s online SQL translator.

Also, the Settings.cacheParsingConnectionLRUCacheSize flag, which defaults to 8192, makes sure the same query doesn’t get re-translated all the time, so you won’t spend too much time in jOOQ’s parser.
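
For example, a sketch of passing such settings along with the dialect (assuming jOOQ’s usual JAXB style Settings setters):

// Hypothetical tuning of the translation cache size
DataSource parsingDataSource = DSL
    .using(
        originalDataSource,
        SQLDialect.SQL_SERVER,
        new Settings().withCacheParsingConnectionLRUCacheSize(512))
    .parsingDataSource();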

@Formula too, not just native queries

A quick win in Hibernate when you want to project additional values, akin to SQL’s own computed columns (which are available in many SQL dialects), is the @Formula annotation, which can be added to any entity like so. Assuming this additional column:

ALTER TABLE author ADD year_of_birth INT;

We might have the following amended entity:

@Entity
@Table(name = "author")
public class Author {
    @Id
    public int id;

    @Column(name = "first_name")
    public String firstName;

    @Column(name = "last_name")
    public String lastName;

    @Column(name = "year_of_birth")
    public Integer yearOfBirth;

    @Formula("year_of_birth between 1981 and 1996")
    public Boolean millenial;

    // Constructors, getters, setters, equals, hashCode, etc
}

But unfortunately, there are still so many RDBMS that don’t actually support boolean types, and the @Formula annotation is purely static, and doesn’t allow for vendor-specific overrides. Are we going to rewrite that SQL query manually to make sure a SQL-92, vendor agnostic native SQL fragment is present that works on all dialects?

Or will we just plug in jOOQ’s parsing connection again? Let’s try the latter with:

Author author = em.find(Author.class, 1);

The MySQL log contains:

-- org.hibernate.SQL
   select 
     jpaauthorw0_.id as id1_4_0_,
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id=?
-- org.jooq.impl.ParsingConnection Translating from: [...]
-- org.jooq.impl.ParsingConnection Translation cache miss: [...]
-- org.jooq.impl.ParsingConnection Translating to: 
   select 
     jpaauthorw0_.id as id1_4_0_, 
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author as jpaauthorw0_ 
   where jpaauthorw0_.id = ?

As you can see, jOOQ re-added the AS keyword to alias the table on MySQL for good measure, because we like to be explicit about aliasing, and because that’s the default for Settings.renderOptionalAsKeywordForTableAliases.

Whereas the SQL Server log contains:

-- org.hibernate.SQL 
   select 
     jpaauthorw0_.id as id1_4_0_, 
     jpaauthorw0_.first_name as first_na2_4_0_, 
     jpaauthorw0_.last_name as last_nam3_4_0_, 
     jpaauthorw0_.year_of_birth between 1981 and 1996 as formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id=?
-- org.jooq.impl.ParsingConnection Translating from: [...]
-- org.jooq.impl.ParsingConnection Translation cache miss: [...]
-- org.jooq.impl.ParsingConnection Translating to: 
   select
     jpaauthorw0_.id id1_4_0_, 
     jpaauthorw0_.first_name first_na2_4_0_, 
     jpaauthorw0_.last_name last_nam3_4_0_, 
     case 
       when jpaauthorw0_.year_of_birth between 1981 and 1996 
         then 1 
       when not (jpaauthorw0_.year_of_birth between 1981 and 1996) 
         then 0 
     end formula1_0_ 
   from author jpaauthorw0_ 
   where jpaauthorw0_.id = ?

That’s a NULL-safe BOOLEAN type emulation (because if YEAR_OF_BIRTH is NULL, i.e. UNKNOWN, then MILLENIAL must be NULL, i.e. UNKNOWN, too).

Spring Data @Query annotation

Another case where native SQL can appear in JPA integrations is Spring Data JPA with its @Query annotation, especially when used with @Query(nativeQuery = true). Just like Hibernate’s @Formula, this annotation is compile-time static, with no way to override the value of the native query at runtime, short of maybe subtyping repositories per dialect.
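
For illustration, a hypothetical repository embedding the same vendor specific query from before (repository and method names are made up):

public interface AuthorRepository extends JpaRepository<Author, Integer> {

    // Vendor specific native SQL, statically baked into the annotation
    @Query(
        nativeQuery = true,
        value = """
            select a.id, nvl(a.first_name, 'N/A') as first_name, a.last_name
            from author a
            order by a.id
            """)
    List<Author> findAllWithDefaultedFirstName();
}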

But why go through all that hassle? It’s always the same thing. Just patch the DataSource with jOOQ’s parsing connection or parsing data source, and you’re set.

Conclusion

Even if you’re not using jOOQ’s DSL API, you can profit from jOOQ in many ways in your existing JDBC, R2DBC, JPA, MyBatis, etc. based applications by hooking in the jOOQ parsing connection and translating your vendor specific input dialect to any number of configurable output dialects.

If jOOQ’s parser can’t handle a feature, chances are, you can work around this limitation using the ParseListener SPI, e.g. when you want to support a hypothetical LOGICAL_XOR predicate (which is supported natively in MySQL):

Query query = configuration
    .derive(ParseListener.onParseCondition(ctx -> {
        if (ctx.parseFunctionNameIf("LOGICAL_XOR")) {
            ctx.parse('(');
            Condition c1 = ctx.parseCondition();
            ctx.parse(',');
            Condition c2 = ctx.parseCondition();
            ctx.parse(')');

            return CustomCondition.of(c -> {
                switch (c.family()) {
                    case MARIADB:
                    case MYSQL:
                        c.visit(condition("{0} xor {1}", c1, c2));
                        break;
                    default:
                        c.visit(c1.andNot(c2).or(c2.andNot(c1)));
                        break;
                }
            });
        }

        // Let the parser take over if we don't know the token
        return null;
    }))
    .dsl()
    .parser()
    .parseQuery(
        "select * from t where logical_xor(t.a = 1, t.b = 2)"
    );
  
System.out.println(DSL.using(SQLDialect.MYSQL).render(query));
System.out.println(DSL.using(SQLDialect.ORACLE).render(query));

The above program will print:

-- MYSQL:
select * 
from t
where (t.a = 1 xor t.b = 2);

-- ORACLE:
select * 
from t 
where (t.a = 1 and not (t.b = 2)) or (t.b = 2 and not (t.a = 1));

So, profit from using jOOQ to migrate your application’s vendor specific SQL off one RDBMS onto another, or to support multiple RDBMS products in a single application, with or without using jOOQ’s DSL!

Side note: Query transformation

This isn’t the topic of this blog post, but once you have jOOQ parse your every SQL statement, you can also use jOOQ to transform this SQL and tamper with the expression tree, for example by implementing client side row level security. The possibilities are endless!

Formatting ASCII Charts With jOOQ

A little-known feature in jOOQ is the Formattable.formatChart() capability, which allows for formatting any jOOQ result as an ASCII chart. This can be useful for quick plotting of results in your console application.

Assuming you have a result set of this form (which is what you’re getting when you call result.format() or just result.toString()):

+---+---+---+---+
|c  | v1| v2| v3|
+---+---+---+---+
|a  |  1|  2|  3|
|b  |  2|  1|  1|
|c  |  4|  0|  1|
+---+---+---+---+

This result can be produced by any query, or you can construct it without executing a query like this:

Field<String>  c  = field("c", VARCHAR);
Field<Integer> v1 = field("v1", INTEGER);
Field<Integer> v2 = field("v2", INTEGER);
Field<Integer> v3 = field("v3", INTEGER);

Result<Record4<String, Integer, Integer, Integer>> result = 
    create.newResult(c, v1, v2, v3); 
                      
result.add(create.newRecord(c, v1, v2, v3).values("a", 1, 2, 3));
result.add(create.newRecord(c, v1, v2, v3).values("b", 2, 1, 1));
result.add(create.newRecord(c, v1, v2, v3).values("c", 4, 0, 1));

Then, calling result.formatChart() will produce a nice ASCII chart like this, by default:

4.00|                                                  █████████████████████████
3.86|                                                  █████████████████████████
3.73|                                                  █████████████████████████
3.59|                                                  █████████████████████████
3.45|                                                  █████████████████████████
3.32|                                                  █████████████████████████
3.18|                                                  █████████████████████████
3.05|                                                  █████████████████████████
2.91|                                                  █████████████████████████
2.77|                                                  █████████████████████████
2.64|                                                  █████████████████████████
2.50|                                                  █████████████████████████
2.36|                                                  █████████████████████████
2.23|                                                  █████████████████████████
2.09|                                                  █████████████████████████
1.95|                         ██████████████████████████████████████████████████
1.82|                         ██████████████████████████████████████████████████
1.68|                         ██████████████████████████████████████████████████
1.55|                         ██████████████████████████████████████████████████
1.41|                         ██████████████████████████████████████████████████
1.27|                         ██████████████████████████████████████████████████
1.14|                         ██████████████████████████████████████████████████
1.00|███████████████████████████████████████████████████████████████████████████
----+---------------------------------------------------------------------------
    |            a                        b                        c            

It includes the first column as the category label on the x-axis, and the second column as the value to be plotted on the y-axis. You can tweak all sorts of configuration, including width and height:

result.formatChart(new ChartFormat().dimensions(20, 6));

To get this much smaller chart:

4.00|          █████
3.00|          █████
2.00|     ██████████
1.00|███████████████
----+---------------
    |  a    b    c  

To include the values of the other value columns in a stacked chart (which is the default), write:

result.formatChart(new ChartFormat()
    .dimensions(40, 8)
    .values(1, 2, 3)
);

Producing:

6.00|▒▒▒▒▒▒▒▒▒▒▒▒                       
5.00|▒▒▒▒▒▒▒▒▒▒▒▒            ▒▒▒▒▒▒▒▒▒▒▒
4.00|▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒███████████
3.00|▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓███████████
2.00|▓▓▓▓▓▓▓▓▓▓▓▓███████████████████████
1.00|███████████████████████████████████
----+-----------------------------------
    |     a           b          c      

If those default ASCII characters from the sweet old 90s MS-DOS and BBS times don’t align well with your font (e.g. as on this blog), you can switch them like this:

result.formatChart(new ChartFormat()
    .dimensions(40, 8)
    .values(1, 2, 3)
    .shades('@', 'o', '.')
);

And now you’re getting:

6.00|............                       
5.00|............            ...........
4.00|........................@@@@@@@@@@@
3.00|oooooooooooooooooooooooo@@@@@@@@@@@
2.00|oooooooooooo@@@@@@@@@@@@@@@@@@@@@@@
1.00|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
----+-----------------------------------
    |     a           b          c      

Prefer displaying 100% charts? No problem:

result.formatChart(new ChartFormat()
    .dimensions(40, 8)
    .values(1, 2, 3)
    .shades('@', 'o', '.')
    .display(Display.HUNDRED_PERCENT_STACKED)
);

And now, the result is:

100.00%|................................
 80.00%|......................@@@@@@@@@@
 60.00%|...........ooooooooooo@@@@@@@@@@
 40.00%|ooooooooooo@@@@@@@@@@@@@@@@@@@@@
 20.00%|ooooooooooo@@@@@@@@@@@@@@@@@@@@@
  0.00%|@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-------+--------------------------------
       |     a          b         c     

Who needs MS Excel? No one, if you have jOOQ!

jOOQ 3.15’s New Multiset Operator Will Change How You Think About SQL

This is how SQL should have been used all along.

They called it The Third Manifesto, ORDBMS, or other things. Regrettably, it never really took off. Because most vendors didn’t adopt it. And those who did, didn’t agree on syntax.

But this is about to change. Thanks to the now ubiquitous SQL/JSON support (which jOOQ 3.14 has already covered), we can now emulate the most powerful ORDBMS feature that you will want to use everywhere: Nested collections!

How We Used to do Things: With Joins

We’re going to be using the Sakila database for this example. It’s a DVD rental store with things like ACTOR, FILM, CATEGORY (of films) and other nice relational things. Let’s write a query for this requirement:

Get me all the films with their actors and their categories

Classically, we’d go ahead and use jOOQ to write:

ctx.select(
      FILM.TITLE,
      ACTOR.FIRST_NAME,
      ACTOR.LAST_NAME,
      CATEGORY.NAME
   )
   .from(ACTOR)
   .join(FILM_ACTOR)
     .on(ACTOR.ACTOR_ID.eq(FILM_ACTOR.ACTOR_ID))
   .join(FILM)
     .on(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
   .join(FILM_CATEGORY)
     .on(FILM.FILM_ID.eq(FILM_CATEGORY.FILM_ID))
   .join(CATEGORY)
     .on(FILM_CATEGORY.CATEGORY_ID.eq(CATEGORY.CATEGORY_ID))
   .orderBy(1, 2, 3, 4)
   .fetch();

And the result? Not so nice. A denormalised, flat table containing tons of repetition:

+----------------+----------+---------+-----------+
|title           |first_name|last_name|name       |
+----------------+----------+---------+-----------+
|ACADEMY DINOSAUR|CHRISTIAN |GABLE    |Documentary|
|ACADEMY DINOSAUR|JOHNNY    |CAGE     |Documentary|
|ACADEMY DINOSAUR|LUCILLE   |TRACY    |Documentary|
|ACADEMY DINOSAUR|MARY      |KEITEL   |Documentary|
|ACADEMY DINOSAUR|MENA      |TEMPLE   |Documentary|
|ACADEMY DINOSAUR|OPRAH     |KILMER   |Documentary|
|ACADEMY DINOSAUR|PENELOPE  |GUINESS  |Documentary|
|ACADEMY DINOSAUR|ROCK      |DUKAKIS  |Documentary|
|ACADEMY DINOSAUR|SANDRA    |PECK     |Documentary|
|ACADEMY DINOSAUR|WARREN    |NOLTE    |Documentary|
|ACE GOLDFINGER  |BOB       |FAWCETT  |Horror     |
|ACE GOLDFINGER  |CHRIS     |DEPP     |Horror     |
|ACE GOLDFINGER  |MINNIE    |ZELLWEGER|Horror     |
 ...

If we don’t consume this result unmodified as it is (e.g. when displaying tabular data to a user), we’d then go and de-duplicate things, shoehorning them back into some nested data structures (e.g. for consumption by some JSON based UI), and spend hours trying to untangle the cartesian products between the 2 nested collections FILM -> ACTOR and FILM -> CATEGORY (because ACTOR and CATEGORY now form a cartesian product, which we didn’t want!)

In the worst case, we don’t even notice! This example database only has 1 category per film, but it is designed to support multiple categories.

jOOQ can help with that deduplication, but just look at the number of questions for jOOQ many to many on Stack Overflow! You’ll probably still have to write at least 2 queries to separate the nested collections.

ENTER the Stage: Multiset

The standard SQL <multiset value constructor> operator allows for collecting the data from a correlated subquery into a nested data structure, a MULTISET. Everything in SQL is a MULTISET, so the operator isn’t too surprising. But the nesting is where it shines. The previous query can now be re-written in jOOQ as follows:

var result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Note, it’s not relevant to this task, but I’m using jOOQ’s type safe implicit to-one join feature, which further helps tame the joins, syntactically. A matter of taste.

How to read this query? Easy:

  • Get all the films
  • For each FILM, get all the actors as a nested collection
  • For each FILM, get all the categories as a nested collection

You’re going to like Java 10’s var keyword even more after this :) Because what type is result? It’s of this type:

Result<Record3<
  String,                          // FILM.TITLE
  Result<Record2<String, String>>, // ACTOR.FIRST_NAME, ACTOR.LAST_NAME
  Result<Record1<String>>          // CATEGORY.NAME
>>

That’s quite something, but not too complex if you think about it. There’s a result with 3 columns:

  • TITLE
  • A first nested result that has 2 string columns: ACTOR.FIRST_NAME and ACTOR.LAST_NAME
  • A second nested result that has 1 string column: CATEGORY.NAME

Using var or other type inference mechanisms, you don’t have to denote this type. Or even better (stay tuned): We’ll type-safely map the structural type to our nominal DTO type hierarchy, with just a few additional lines of code. I’ll explain that later.

What Does the Result Look Like?

Calling toString() on the above Result type yields something like this:

+---------------------------+--------------------------------------------------+---------------+
|title                      |actors                                            |films          |
+---------------------------+--------------------------------------------------+---------------+
|ACADEMY DINOSAUR           |[(PENELOPE, GUINESS), (CHRISTIAN, GABLE), (LUCI...|[(Documentary)]|
|ACE GOLDFINGER             |[(BOB, FAWCETT), (MINNIE, ZELLWEGER), (SEAN, GU...|[(Horror)]     |
|ADAPTATION HOLES           |[(NICK, WAHLBERG), (BOB, FAWCETT), (CAMERON, ST...|[(Documentary)]|
|AFFAIR PREJUDICE           |[(JODIE, DEGENERES), (SCARLETT, DAMON), (KENNET...|[(Horror)]     |
|AFRICAN EGG                |[(GARY, PHOENIX), (DUSTIN, TAUTOU), (MATTHEW, L...|[(Family)]     |
|AGENT TRUMAN               |[(KIRSTEN, PALTROW), (SANDRA, KILMER), (JAYNE, ...|[(Foreign)]    |
 ...

Notice how we’re back to getting each FILM.TITLE entry only once (no duplication), and nested in each row are the subquery results. There’s no denormalisation happening!

When calling result.formatJSON() with the appropriate formatting options, we’ll get this representation:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      {
        "first_name": "SANDRA",
        "last_name": "PECK"
      },

      ...
    ],
    "films": [
      { "name": "Documentary" }
    ]
  },
  {
    "title": "ACE GOLDFINGER",
    "actors": [
      {
        "first_name": "BOB",
        "last_name": "FAWCETT"
      },
  ...

Calling result.formatXML() would produce this:

<result>
  <record>
    <title>ACADEMY DINOSAUR</title>
    <actors>
      <result>
        <record>
          <first_name>PENELOPE</first_name>
          <last_name>GUINESS</last_name>
        </record>
        <record>
          <first_name>CHRISTIAN</first_name>
          <last_name>GABLE</last_name>
        </record>
        <record>
          <first_name>LUCILLE</first_name>
          <last_name>TRACY</last_name>
        </record>
        <record>
          <first_name>SANDRA</first_name>
          <last_name>PECK</last_name>
        </record>
        ...
      </result>
    </actors>
    <films>
      <result>
        <record>
          <name>Documentary</name>
        </record>
      </result>
    </films>
  </record>
  <record>
    <title>ACE GOLDFINGER</title>
    <actors>
      <result>
        <record>
          <first_name>BOB</first_name>
          <last_name>FAWCETT</last_name>
        </record>
        ...

You get the idea!

What’s the Generated SQL?

Just turn on jOOQ’s DEBUG logging and observe a query like this one (in PostgreSQL):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select 
        alias_78509018.first_name, 
        alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as films
from film
order by film.title

The Db2, MySQL, Oracle, SQL Server versions would look similar. Just try it on your Sakila database installation. It runs fast, too.

Mapping the Results to DTOs

Now, I promised to get rid of that lengthy structural type with all the generics. Check this out!

We used to call them POJOs (Plain Old Java Objects). Then DTOs (Data Transfer Objects). Now records. Yes, let’s try some Java 16 records here. (Note, records aren’t required for these examples. Any POJOs with appropriate constructors would do).

Wouldn’t it be nice if result were of this type:

record Actor(String firstName, String lastName) {}
record Film(
  String title,
  List<Actor> actors,
  List<String> categories
) {}

List<Film> result = ...

Structural typing is essential to jOOQ and its type safe query system, but in your code, you probably don’t want to have those merge-conflict lookalike generics all the time, and even var won’t help you if your data needs to be returned from a method.

So, let’s transform our jOOQ query to one that produces List<Film>, step by step. We’re starting with the original query, untouched:

Result<Record3<
    String, 
    Result<Record2<String, String>>, 
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors"),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

Now, we’re going to map the Actor in the first MULTISET expression. This can be done as follows, using the new Field.convertFrom() convenience method, which allows turning a Field<T> into any read-only Field<U> for ad-hoc usage. A simple example would be this:

record Title(String title) {}

// Use this field in any query
Field<Title> title = FILM.TITLE.convertFrom(Title::new);

It’s just an easy, new way to attach a read-only Converter to a Field for single usage, instead of doing that with the code generator.
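
A quick usage sketch, continuing the Title example:

// Selecting the ad-hoc converted field yields Title values directly
List<Title> titles =
ctx.select(FILM.TITLE.convertFrom(Title::new))
   .from(FILM)
   .fetch(Record1::value1);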

Applied to the original query:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

What are we doing here?

  • The method convertFrom() takes a lambda Result<Record2<String, String>> -> Actor.
  • The Result type is the usual jOOQ Result, which has a Result.map(RecordMapper<R, E>) method.
  • The mapping() method is the new Records.mapping(), which isn’t doing much, just turning a constructor reference of type Function2<String, String, Actor> into a RecordMapper, which can then be used to turn a Result<Record2<String, String>> into a List<Actor>.

And it type checks! Try it yourself. If you add a column to the multiset, you’ll get a compilation error. If you add/remove an attribute from the Actor record, you’ll get a compilation error. No reflection here, just declarative mapping of jOOQ results/records to custom List<UserType>. If you prefer the “old” reflection approach using jOOQ’s ubiquitous into() methods, you can still do that, too:

Result<Record3<
    String, 
    List<Actor>,  // A bit nicer already
    Result<Record1<String>>
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))

      // Magic here: vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("actors").convertFrom(r -> r.into(Actor.class))),

      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films")
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

The result still type checks, but the conversion from Result<Record2<String, String>> to List<Actor> is no longer type checked itself; it uses reflection.

Let’s continue. Let’s remove the clumsy category Result<Record1<String>>. We could’ve added another record, but in this case, a List<String> will suffice:

Result<Record3<
    String, 
    List<Actor>,
    List<String>  // Begone, jOOQ structural type!
>> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))

      // Magic.     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
      ).as("films").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)
   .fetch();

And finally, the outer-most Result<Record3<...>> to List<Film> conversion:

List<Film> result =
dsl.select(
      FILM.TITLE,
      multiset(
        select(
          FILM_ACTOR.actor().FIRST_NAME, 
          FILM_ACTOR.actor().LAST_NAME)
        .from(FILM_ACTOR)
        .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
      ).as("actors").convertFrom(r -> r.map(mapping(Actor::new))),
      multiset(
        select(FILM_CATEGORY.category().NAME)
        .from(FILM_CATEGORY)
        .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
      ).as("films").convertFrom(r -> r.map(Record1::value1))
   )
   .from(FILM)
   .orderBy(FILM.TITLE)

   //     vvvvvvvvvvvvvvvvvv grande finale
   .fetch(mapping(Film::new));

This time, we don’t need the Field.convertFrom() method. Just using the Records.mapping() auxiliary will be sufficient.

A More Complex Example

The previous example showed nesting of two independent collections, which is quite hard with classic JOIN based SQL or ORMs. How about a much more complex example, where we nest things 2 levels deep, one of them even being an aggregation? The requirement is:

Give me all the films, the actors that played in the film, the categories that categorise the film, the customers that have rented the film, and all the payments per customer for that film

I won’t even show a JOIN based approach. Let’s dive directly into MULTISET and the also new, synthetic MULTISET_AGG aggregate function. Here’s how to do this with jOOQ. Now, check out that beautiful result type:

Result<Record4<
    String,                   // FILM.TITLE
    Result<Record2<
        String,               // ACTOR.FIRST_NAME
        String                // ACTOR.LAST_NAME
    >>,                       // "actors"
    Result<Record1<String>>,  // CATEGORY.NAME
    Result<Record4<
        String,               // CUSTOMER.FIRST_NAME
        String,               // CUSTOMER.LAST_NAME
        Result<Record2<
            LocalDateTime,    // PAYMENT.PAYMENT_DATE
            BigDecimal        // PAYMENT.AMOUNT
        >>, 
        BigDecimal            // "total"
    >>                        // "customers"
>> result = 
dsl.select(

        // Get the films
        FILM.TITLE,

        // ... and all actors that played in the film
        multiset(
            select(
                FILM_ACTOR.actor().FIRST_NAME,
                FILM_ACTOR.actor().LAST_NAME
            )
            .from(FILM_ACTOR)
            .where(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
        ).as("actors"),

        // ... and all categories that categorise the film
        multiset(
            select(FILM_CATEGORY.category().NAME)
            .from(FILM_CATEGORY)
            .where(FILM_CATEGORY.FILM_ID.eq(FILM.FILM_ID))
        ).as("categories"),

        // ... and all customers who rented the film, as well
        // as their payments
        multiset(
            select(
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME,
                multisetAgg(
                    PAYMENT.PAYMENT_DATE,
                    PAYMENT.AMOUNT
                ).as("payments"),
                sum(PAYMENT.AMOUNT).as("total"))
            .from(PAYMENT)
            .where(PAYMENT
                .rental().inventory().FILM_ID.eq(FILM.FILM_ID))
            .groupBy(
                PAYMENT.rental().customer().CUSTOMER_ID,
                PAYMENT.rental().customer().FIRST_NAME,
                PAYMENT.rental().customer().LAST_NAME)
        ).as("customers")
    )
    .from(FILM)
    .where(FILM.TITLE.like("A%"))
    .orderBy(FILM.TITLE)
    .limit(5)
    .fetch();

You’ll be using var, of course, rather than denoting this insane type, but I wanted to denote the type explicitly for the sake of the example.

Note again how helpful implicit joins are here. Since we wanted to aggregate all the payments per customer, we can just select from PAYMENT and group by the payment’s PAYMENT.rental().customer(), as well as correlate the subquery by PAYMENT.rental().inventory().FILM_ID, without any extra effort.

The executed SQL looks like this, where you can see the generated implicit joins (run it on your PostgreSQL Sakila database!):

select
  film.title,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name
      )),
      jsonb_build_array()
    )
    from (
      select alias_78509018.first_name, alias_78509018.last_name
      from (
        film_actor
          join actor as alias_78509018
            on film_actor.actor_id = alias_78509018.actor_id
        )
      where film_actor.film_id = film.film_id
    ) as t
  ) as actors,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object('name', t.name)),
      jsonb_build_array()
    )
    from (
      select alias_130639425.name
      from (
        film_category
          join category as alias_130639425
            on film_category.category_id = 
               alias_130639425.category_id
        )
      where film_category.film_id = film.film_id
    ) as t
  ) as categories,
  (
    select coalesce(
      jsonb_agg(jsonb_build_object(
        'first_name', t.first_name,
        'last_name', t.last_name,
        'payments', t.payments,
        'total', t.total
      )),
      jsonb_build_array()
    )
    from (
      select
        alias_63965917.first_name,
        alias_63965917.last_name,
        jsonb_agg(jsonb_build_object(
          'payment_date', payment.payment_date,
          'amount', payment.amount
        )) as payments,
        sum(payment.amount) as total
      from (
        payment
          join (
            rental as alias_102068213
              join customer as alias_63965917
                on alias_102068213.customer_id = 
                   alias_63965917.customer_id
              join inventory as alias_116526225
                on alias_102068213.inventory_id = 
                   alias_116526225.inventory_id
          )
            on payment.rental_id = alias_102068213.rental_id
        )
      where alias_116526225.film_id = film.film_id
      group by 
        alias_63965917.customer_id, 
        alias_63965917.first_name, 
        alias_63965917.last_name
    ) as t
  ) as customers
from film
where film.title like 'A%'
order by film.title
fetch next 5 rows only

The result, in JSON, now looks something like this:

[
  {
    "title": "ACADEMY DINOSAUR",
    "actors": [
      {
        "first_name": "PENELOPE",
        "last_name": "GUINESS"
      },
      {
        "first_name": "CHRISTIAN",
        "last_name": "GABLE"
      },
      {
        "first_name": "LUCILLE",
        "last_name": "TRACY"
      },
      ...
    ],
    "categories": [{ "name": "Documentary" }],
    "customers": [
      {
        "first_name": "SUSAN",
        "last_name": "WILSON",
        "payments": [
          {
            "payment_date": "2005-07-31T22:08:29",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      {
        "first_name": "REBECCA",
        "last_name": "SCOTT",
        "payments": [
          {
            "payment_date": "2005-08-18T18:36:16",
            "amount": 0.99
          }
        ],
        "total": 0.99
      },
      ...

That’s it. Nesting collections in arbitrary ways is completely effortless and intuitive. No N+1, no deduplication. Just declare the results in exactly the form you require in your client.

There is no other way to pull off this complexity with such ease than letting your RDBMS do the heavy lifting of planning and running such a query, and letting jOOQ do the mapping.

Conclusion

We had this kind of functionality all along. We just never used it, or not enough. Why? Because client APIs did not make it accessible enough. Because RDBMS didn’t agree on syntax enough.

That’s now over. jOOQ uses standard SQL MULTISET syntax in its DSL API, enhances it with the synthetic MULTISET_AGG aggregate function, the way all RDBMS should have implemented it (go Informix, Oracle). We can wait for another 40 years for the other RDBMS to implement this, or we just use jOOQ today.

And, I cannot stress this enough:

  • This is all type safe
  • There is no reflection
  • The nesting is done in the database using SQL (via SQL/XML or SQL/JSON for now)
  • … So, execution planners can optimise your entire query
  • … No extra columns or extra queries or other extra work is performed in the database

This works on all dialects that have either SQL/XML or SQL/JSON support (or both), including the major popular dialects:

  • MySQL
  • Oracle
  • PostgreSQL
  • SQL Server

And it is offered under the usual license terms of jOOQ. So, happy nesting of collections.

Addendum: Using SQL/XML or SQL/JSON directly

You may be tempted to use this everywhere. And you rightfully do so. But beware of this, if your SQL client is consuming XML or JSON directly, there’s no need to use MULTISET. Use jOOQ’s native SQL/XML or SQL/JSON support that was introduced in jOOQ 3.14. That way, you won’t convert from JSON to jOOQ results to JSON, but stream the JSON (or XML) to your frontend directly.

Use ResultQuery.collect() to Implement Powerful Mappings

In our opinion, any Iterable<T> should offer a <R> collect(Collector<T, ?, R>) method to allow for transforming the content to something else using standard JDK collectors, jOOλ collectors from org.jooq.lambda.Agg, or your own.

When using jOOQ, you don’t have to wait for the JDK to finally add these useful utilities to the Iterable API. jOOQ’s ResultQuery<R> already implements Iterable<R>, and offers additional convenience like collect() on top of it.

For example, using a Java 16 record type:

record Book (int id, String title) {}

List<Book> books =
ctx.select(BOOK.ID, BOOK.TITLE)
   .from(BOOK)
   .collect(Collectors.mapping(
        r -> r.into(Book.class),
        Collectors.toList()
   ));

There are other ways to map things, but why not use a Collector? The best thing about Collector types is that they compose, type safely and arbitrarily, almost like Stream pipelines.
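
For example, a sketch composing partitioningBy(), mapping(), and toList() (the even/odd split is made up, purely to show composition):

// Partition books by some predicate while mapping each record to a Book
Map<Boolean, List<Book>> booksByEvenId =
ctx.select(BOOK.ID, BOOK.TITLE)
   .from(BOOK)
   .collect(Collectors.partitioningBy(
        r -> r.get(BOOK.ID) % 2 == 0,
        Collectors.mapping(r -> r.into(Book.class), Collectors.toList())
   ));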

I found a very interesting use-case recently on Stack Overflow. The problem there was that fetchGroups() is quite simple and not left-join aware, meaning that when an AUTHOR (parent) has no BOOK (child), instead of an empty list, there will be a list with a single NULL item:

Map<AuthorRecord, List<BookRecord>> result =
ctx.select()
   .from(AUTHOR)
   .leftJoin(BOOK).onKey()
   .fetchGroups(AUTHOR, BOOK);

The above works well for inner joins, but it doesn’t really make sense for left joins. We should fix this in jOOQ, of course (https://github.com/jOOQ/jOOQ/issues/11888), but using Collectors, you can already work around this problem today.

Simply write:

Map<AuthorRecord, List<BookRecord>> result =
ctx.select()
   .from(AUTHOR)
   .leftJoin(BOOK).onKey()
   .collect(groupingBy(
        r -> r.into(AUTHOR), 
        filtering(
            r -> r.get(BOOK.ID) != null, 
            mapping(
                r -> r.into(BOOK), 
                toList()
            )
        )
    ));

// All assuming the usual static imports:
import static org.jooq.impl.DSL.*;
import static com.example.generated.Tables.*;
import static java.util.stream.Collectors.*;

Step by step:

  1. Group results by AUTHOR, mapping keys to AuthorRecord just like jOOQ’s fetchGroups()
  2. For each AUTHOR, filter out those BOOK records whose BOOK.ID is null. Given that BOOK.ID is the primary key, the only reason it could be null is the left join
  3. Map values to BookRecord, just like jOOQ’s fetchGroups()
  4. Collect the child records into a list.

And you’re done. Just like when you use the ResultQuery as an Iterable in a foreach loop, the collect() call executes your query automatically, managing all resources, bypassing the intermediate Result data structure, which isn’t needed here.
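
For completeness, a minimal sketch of that foreach usage, reusing the BOOK example from above:

// Iterating the ResultQuery directly executes the query and manages
// all resources automatically
for (var r : ctx.select(BOOK.ID, BOOK.TITLE).from(BOOK))
    System.out.println(r.get(BOOK.ID) + ": " + r.get(BOOK.TITLE));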

Could we Have a Language That Hides Collections From Us?

I just fixed a bug. The fix required me to initialise an Object[] array with the init values for each type, instead of just null, i.e. false for boolean, 0 for int, 0.0 for double, etc. So, instead of just doing:

Object[] converted = new Object[parameterTypes.length];

I needed:

Object[] converted = new Object[parameterTypes.length];

for (int i = 0; i < converted.length; i++)
    converted[i] = Reflect.initValue(parameterTypes[i]);

For the subjective 8E17th time, I wrote a loop. A loop that did nothing interesting other than call a method for each of the looped structure’s elements. And I felt the pain of our friend Murtaugh

Why do we distinguish between T and T[]?

What I really wanted to do is this. I have a method Reflect.initValue()

public static <T> T initValue(Class<T> type) { ... }

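For illustration, a sketch of what such a method might do (jOOR’s actual implementation may differ): return the JLS default value for the given type.

@SuppressWarnings("unchecked")
public static <T> T initValue(Class<T> type) {
    // Primitive types get their JLS default values, reference types get null
    if (type == boolean.class) return (T) Boolean.FALSE;
    if (type == byte.class)    return (T) Byte.valueOf((byte) 0);
    if (type == short.class)   return (T) Short.valueOf((short) 0);
    if (type == int.class)     return (T) Integer.valueOf(0);
    if (type == long.class)    return (T) Long.valueOf(0L);
    if (type == float.class)   return (T) Float.valueOf(0.0f);
    if (type == double.class)  return (T) Double.valueOf(0.0);
    if (type == char.class)    return (T) Character.valueOf('\0');
    return null;
}
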
What I really want to do is this, in one way or another:

converted = initValue(parameterTypes);

(Yes, there are subtleties that need to be thought about, such as whether this should initialise a new array or assign values to an existing one. Forget about them for now. Big picture first). The point is, no one enjoys writing loops. No one enjoys writing map/flatMap either:

converted = Stream
    .of(parameterTypes)
    .map(Reflect::initValue)
    .toArray(Object[]::new);

It’s so much useless, repetitive, infrastructural ceremony that I enjoy neither writing nor reading. My “business logic” here is simply

converted = initValue(parameterTypes);

I have 3 elements:
  • A source data structure parameterTypes
  • A target data structure converted
  • A mapping function initValue
That’s all I should be seeing in my code. All the infrastructure of how to iterate is completely meaningless and boring.

SQL joins

In fact, SQL joins are often the same. We use primary key / foreign key relationships, so the path between parent and child tables is very obvious in most cases. Joins are cool, relational algebra is cool, but in most cases, it just gets in the way of writing understandable business logic. In my opinion, this is one of Hibernate’s biggest innovations (probably others did this too, perhaps even before Hibernate): implicit joins, which jOOQ copied. There’s much ceremony in writing this:

SELECT
  cu.first_name,
  cu.last_name,
  co.country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)

Whereas this alternative, intuitive syntax would be so much more convenient:

SELECT
  cu.first_name,
  cu.last_name,
  cu.address.city.country.country
FROM customer AS cu

It is immediately clear what is meant by the implicit join syntax. The syntactic ceremony of writing the explicit joins is not necessary. Again, joins are really cool, and power users will be able to use them when needed. E.g. the occasional NATURAL FULL OUTER JOIN can still be done! But let’s admit it, 80% of all joins are boring, and could be replaced with the above syntax sugar.
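
In jOOQ, a sketch of the same query using implicit joins (assuming generated tables for this schema; the lower-case methods are jOOQ’s generated foreign key path accessors, and COUNTRY_ follows jOOQ’s convention for a column sharing its table’s name):

ctx.select(
       CUSTOMER.FIRST_NAME,
       CUSTOMER.LAST_NAME,
       // Navigating the to-one path lets jOOQ generate the joins
       CUSTOMER.address().city().country().COUNTRY_)
   .from(CUSTOMER)
   .fetch();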

Suggestion for Java

Of course, this suggestion will not be perfect, because it doesn’t deal with the gazillion edge cases of introducing such a significant feature to an old language. But again, if we allow ourselves to focus on the big picture, wouldn’t it be nice if we could:

class Author {
  String firstName;
  String lastName;
  Book[] books; // Or use any collection type here
}

class Book {
  String title;
}

And then:

Author[] authors = ...

// This...
String[] firstNames = authors.firstName;

// ...is sugar for this (oh how it hurts to type this):
String[] firstNames = new String[authors.length];
for (int i = 0; i < firstNames.length; i++)
    firstNames[i] = authors[i].firstName;

// And this...
int[] firstNameLengths = authors.firstName.length();

// ... is sugar for this:
int[] firstNameLengths = new int[authors.length];
for (int i = 0; i < firstNameLengths.length; i++)
    firstNameLengths[i] = authors[i].firstName.length();

// ... or even this, who cares (hurts to type even more):
int[] firstNameLengths = Stream
  .of(authors)
  .map(a -> a.firstName)
  .mapToInt(String::length)
  .toArray();

Ignore the usage of arrays; it could just as well be a List, Stream, Iterable, or whatever data structure or syntax allows getting from arity 1 to arity N. Or, to get a set of an author’s books:

Author[] authors = ...
Book[] books = authors.books;

Could it mean anything other than that:

Stream.of(authors)
      .flatMap(a -> Stream.of(a.books))
      .toArray(Book[]::new);

Why do we have to keep spelling these things out? They’re not business logic; they’re meaningless, boring infrastructure. Yes, there are surely many edge cases (and we could live with the occasional compiler error, if the compiler can’t figure out how to get from A to B), but there are also many “very obvious” cases, where the ceremonial mapping logic (imperative or functional, doesn’t matter) is just completely obvious and boring. Yet it gets in the way of writing and reading, and despite seeming obvious in many cases, it is still error prone! I think it’s time to revisit the ideas behind APL, where everything is an array, and by consequence, operations on arity 1 types can be applied to arity N types just the same, because the distinction is often not very useful.

Bonus: Null

While it is difficult to imagine retrofitting a language like Java with this, a new language could do away with nulls forever, because arity 0..1 is just a special case of arity N: an empty array. Looking forward to your thoughts.

A Quick Trick to Make a Java Stream Construction Lazy

One of the Stream API’s greatest features is its laziness. The whole pipeline is constructed lazily and stored as a set of instructions, akin to a SQL execution plan. Only when we invoke a terminal operation is the pipeline started. It is still lazy then, meaning that some operations may be short circuited. Some third party libraries produce streams that are not entirely lazy. For example, until version 3.12, jOOQ eagerly executed a SQL query when calling ResultQuery.stream(), regardless of whether the Stream was consumed afterwards:

try (var stream = ctx.select(T.A, T.B).from(T).stream()) {
    // Not consuming the stream here
}

While this is probably a bug in client code, not executing the statement in this case might still be a useful feature. The exception being, of course, if the query contains a FOR UPDATE clause, in which case the user probably uses Query.execute() instead, if they don’t care about the result. A more interesting example where laziness helps is that we might not want this query to be executed right away, as we are perhaps still on the wrong thread to execute it. Or we would like any possible exceptions to be thrown from wherever the result is consumed, i.e. where the terminal operation is called. For example:

try (var stream = ctx.select(T.A, T.B).from(T).stream()) {
    consumeElsewhere(stream);
}

And then:

public void consumeElsewhere(Stream<? extends Record> stream) {
    runOnSomeOtherThread(() -> {
        stream.map(r -> someMapping(r))
              .forEach(r -> someConsumer(r));
    });
}

While we’re fixing this in jOOQ 3.13 (https://github.com/jOOQ/jOOQ/issues/4934), you may be stuck on an older version of jOOQ, or have another library do the same thing. Luckily, there’s an easy trick to quickly make a third party stream “lazy”: flatMap it! Just write this instead:

try (var stream = Stream.of(1).flatMap(
    i -> ctx.select(T.A, T.B).from(T).stream()
)) {
    consumeElsewhere(stream);
}

The following small test illustrates that the stream() is now constructed lazily:

public class LazyStream {

    @Test(expected = RuntimeException.class)
    public void testEager() {
        Stream<String> stream = stream();
    }

    @Test
    public void testLazyNoTerminalOp() {
        Stream<String> stream = Stream.of(1).flatMap(i -> stream());
    }

    @Test(expected = RuntimeException.class)
    public void testLazyTerminalOp() {
        Optional<String> result = stream().findAny();
    }

    public Stream<String> stream() {
        String[] array = { "heavy", "array", "creation" };

        // Some Resource Problem that might occur
        if (true)
            throw new RuntimeException();

        return Stream.of(array);
    }
}

Caveat

Depending on the JDK version you’re using, the above approach has its own significant problems. For example, in older versions of JDK 8, flatMap() itself might not be lazy at all! More recent versions of the JDK have fixed that problem, including JDK 8u222: https://bugs.openjdk.java.net/browse/JDK-8225328
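
If you cannot rely on flatMap() being lazy, a possible alternative (a sketch, not from the original post) is the Supplier-based StreamSupport.stream() factory, which invokes its Supplier only once the terminal operation commences. Beware that closing the outer stream does not close the stream created inside the Supplier, so resource handling needs extra care here:

try (var stream = StreamSupport.stream(
    // Only invoked when a terminal operation starts
    () -> ctx.select(T.A, T.B).from(T).stream().spliterator(),
    Spliterator.ORDERED,
    false
)) {
    consumeElsewhere(stream);
}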

How to Write a Simple, yet Extensible API

How to write a simple API is already an art on its own.
I didn’t have time to write a short letter, so I wrote a long one instead. ― Mark Twain
Keeping an API simple for beginners and most users, while making it extensible for power users, seems even more of a challenge. But is it?

What does “extensible” mean?

Imagine an API like, oh say, jOOQ. In jOOQ, you can write SQL predicates like this:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(1)) // Predicate with bind value here
   .fetch();

By default (as this should always be the default), jOOQ will generate and execute this SQL statement on your JDBC driver, using a bind variable:

SELECT t.a, t.b
FROM t
WHERE t.c = ?

The API made the most common use case simple. Just pass your bind variable as if the statement was written in e.g. PL/SQL, and let the language / API do the rest. So we passed that test. The use case for power users is to occasionally not use bind variables, for whatever reason (e.g. skew in data and bad statistics, see also this post about bind variables). Will we pass that test as well? jOOQ mainly offers two ways to fix this:

On a per-query basis

You can turn your variable into an inline value explicitly for this single occasion:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(inline(1))) // Predicate without bind value here
   .fetch();

This is using the static imported DSL.inline() method. Works, but not very convenient if you have to do this for several queries, for several bind values, or worse, depending on some context. This is a necessary API enhancement, but it does not make the API extensible.

On a global basis

Notice that ctx object there? It is the DSLContext object, the “contextual DSL”, i.e. the DSL API that is in the context of a jOOQ Configuration. You can thus set:

DSLContext ctx2 = DSL.using(ctx
    .configuration()
    .derive()
    .set(new Settings()
        .withStatementType(StatementType.STATIC_STATEMENT)));

// And now use this new DSLContext instead of the old one
ctx2.select(T.A, T.B)
    .from(T)
    .where(T.C.eq(1)) // No longer a bind variable
    .fetch();

Different approaches to offering such extensibility

We have our clean and simple API. Now some user wants to extend it. So often, we’re tempted to resort to a hack, e.g. by using thread locals, because they work easily under the assumption of a thread-bound execution model, such as e.g. classic Java EE Servlets.
The price we’re paying for such a hack is high.
  1. It’s a hack, and as such it will break easily. If we offer this as functionality to a user, they will start depending on it, and we will have to support and maintain it
  2. It’s a hack, and it is based on assumptions, such as thread-boundness. It will not work in an async / reactive / parallel stream context, where our logic may jump back and forth between threads
  3. It’s a hack, and deep inside, we know it’s wrong. Obligatory XKCD: https://xkcd.com/292
This might obviously work, just like global (static) variables. You can set this variable globally (or “globally” for your own thread), and then the API’s internals will be able to read it. No need to pass around parameters, so no need to compromise on the API’s simplicity by adding optional and often ugly, distracting parameters.

What are better approaches to offering such extensibility?

Dependency Injection

One way is to use explicit Dependency Injection (DI). If you have a container like Spring, you can rely on Spring injecting arbitrary objects into your method call, or wherever you need access to them:
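
(A minimal sketch, assuming a Spring container; the service and the query are illustrative, not part of the original post.)

// Assuming: org.springframework.stereotype.Service,
// org.springframework.beans.factory.annotation.Autowired
@Service
class BookService {

    // Spring injects a centrally configured DSLContext here, instead of
    // the API having to read contextual state from a thread local
    @Autowired
    DSLContext ctx;

    List<String> titles() {
        return ctx.select(BOOK.TITLE).from(BOOK).fetch(BOOK.TITLE);
    }
}
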
This way, if you maintain several contextual objects of different lifecycle scopes, you can let the DI framework make appropriate decisions to figure out where to get that contextual information from. For example, when using JAX-RS, you can do this using an annotation based approach:


// These annotations bind the method to some HTTP address
@GET
@Produces("text/plain")
@Path("/api")
public String method(

    // This annotation fetches a request-scoped object
    // from the method call's context
    @Context HttpServletRequest request,

    // This annotation produces an argument from the
    // URL's query parameters
    @QueryParam("arg") String arg
) {
    ...
}

This approach works quite nicely for static environments (annotations being static), where you do not want to react to dynamic URLs or endpoints. It is declarative, and a bit magic, but well designed, so once you know all the options, you can choose the right one for your use case very easily. While @QueryParam is mere convenience (you could have gotten the argument also from the HttpServletRequest), the @Context is powerful. It can help inject values of arbitrary lifecycle scope into your method / class / etc. I personally favour explicit programming over annotation-based magic (e.g. using Guice for DI), but that’s probably a matter of taste. Both are a great way for implementors of APIs (e.g. HTTP APIs) to get access to framework objects. However, if you’re an API vendor and want to give users of your API a way to extend it, I personally favour jOOQ’s SPI approach.

SPIs

One of jOOQ’s strengths, IMO, is precisely this single, central place to register all SPI implementations that can be used for all sorts of purposes: the Configuration. For example, on such a Configuration you can specify a JSR-310 java.time.Clock. This clock will be used by jOOQ’s internals to produce client side timestamps, instead of e.g. using System.currentTimeMillis() (see the sketch after the following list). Definitely a use case for power users only, but once you have this use case, you really only want to tweak a single place in jOOQ’s API: the Configuration. All of jOOQ’s internals will always have a Configuration reference available, and it’s up to the user to decide what the scope of this object is; jOOQ doesn’t care. E.g.
  • per query
  • per thread
  • per request
  • per session
  • per application
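
For example, a sketch of the Clock use case mentioned above (hedged: assuming your jOOQ version’s Configuration accepts a Clock via set(); the fixed clock is illustrative):

// Derive a Configuration whose internals produce client side timestamps
// using this Clock, for whatever scope you choose to manage
Configuration derived = ctx.configuration()
    .derive()
    .set(Clock.fixed(Instant.EPOCH, ZoneOffset.UTC));

DSL.using(derived)
   .select(BOOK.TITLE)
   .from(BOOK)
   .fetch();
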
In other words, to jOOQ, it doesn’t matter at all if you’re implementing a thread-bound, blocking, classic servlet model, or if you’re running your code reactively, or in parallel, or whatever. Just manage your own Configuration lifecycle; jOOQ doesn’t care. In fact, you can have a global, singleton Configuration and implement thread-bound components of it, e.g. the ConnectionProvider SPI, which takes care of managing the JDBC Connection lifecycle for jOOQ. Typically, users will use e.g. a Spring DataSource, which manages JDBC Connections (and transactions) using a thread-bound model, internally using ThreadLocal. jOOQ does not care. The SPI merely specifies that jOOQ will call acquire() right before executing a statement and release() right after. It does not matter to jOOQ what the specific ConnectionProvider implementation does; you can implement it in any way you want if you’re a power user (see the sketch after the following list). By default, you’ll just pass jOOQ a DataSource, and it will wrap it in a default implementation called DataSourceConnectionProvider for you. The key here is again:
  • The API is simple by default, i.e. by default, you don’t have to know about this functionality, just pass jOOQ a DataSource as always when working with Java and SQL, and you’re ready to go
  • The SPI allows for easily extending the API without compromising on its simplicity, by providing a single, central access point to this kind of functionality
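
For illustration, a minimal sketch of a custom ConnectionProvider implementation (assuming a DataSource; error handling kept deliberately simple):

class MyConnectionProvider implements ConnectionProvider {
    private final DataSource ds;

    MyConnectionProvider(DataSource ds) {
        this.ds = ds;
    }

    // jOOQ calls this right before executing a statement...
    @Override
    public Connection acquire() {
        try {
            return ds.getConnection();
        }
        catch (SQLException e) {
            throw new DataAccessException("Error acquiring connection", e);
        }
    }

    // ... and this right after
    @Override
    public void release(Connection connection) {
        try {
            connection.close();
        }
        catch (SQLException e) {
            throw new DataAccessException("Error releasing connection", e);
        }
    }
}
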
Other SPIs in Configuration include:
  • ExecuteListener: An extremely useful and simple way to hook into the entire jOOQ query management lifecycle, from generating the SQL string to preparing the JDBC statement, to binding variables, to execution, to fetching result sets. A single SPI can accommodate various use cases like SQL logging, patching SQL strings, patching JDBC statements, listening to result set events, etc. (a minimal sketch follows this list)
  • ExecutorProvider: Whenever jOOQ runs something asynchronously, it will ask this SPI to provide a standard JDK Executor, which will be used to run the asynchronous code block. By default, this will be the JDK default (the default ForkJoinPool), as always. But you probably want to override this default, and you want to be in full control of this, and not think about it every single time you run a query.
  • MetaProvider: Whenever jOOQ needs to look up database meta information (schemas, tables, columns, types, etc.), it will ask this MetaProvider about the available meta information. By default, this will run queries on the JDBC DatabaseMetaData, which is good enough, but maybe you want to wire these calls to your jOOQ-generated classes, or something else.
  • RecordMapperProvider and RecordUnmapperProvider: jOOQ has a quite versatile default implementation of how to map between a jOOQ Record and an arbitrary Java class, supporting a variety of standard approaches including JavaBeans getter/setter naming conventions, JavaBeans @ConstructorProperties, and much more. These defaults apply e.g. when writing query.fetchInto(MyBean.class). But sometimes, the defaults are not good enough, and you want this particular mapping to work differently. Sure, you could write query.fetchInto(record -> mymapper(record)), but you may not want to remember this for every single query. Just override the mapper (and unmapper) at a single, central spot for your own chosen Configuration scope (e.g. per query, per request, per session, etc.) and you’re done.
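
For example, a minimal sketch of an ExecuteListener for SQL logging (extending jOOQ’s DefaultExecuteListener so that only the relevant callback needs overriding; the logger is illustrative):

class SqlLogger extends DefaultExecuteListener {

    // Called right before the statement is executed
    @Override
    public void executeStart(ExecuteContext ctx) {
        System.out.println("Executing: " + ctx.sql());
    }
}

// Registered once, at the single central spot:
Configuration conf = ctx.configuration()
    .derive()
    .set(new DefaultExecuteListenerProvider(new SqlLogger()));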

Conclusion

Writing a simple API is difficult. Making it extensible in a simple way, however, is not. If your API has achieved “simplicity”, then it is very easy to support injecting arbitrary SPIs for arbitrary purposes at a single, central location, such as jOOQ’s Configuration. In my most recent talk “10 Reasons Why we Love Some APIs and Why we Hate Some Others”, I’ve made the point that things like simplicity, discoverability, consistency, and convenience are among the most important aspects of a great API. How do you define a good API? The most underrated answer on this (obviously closed) Stack Overflow question is this one. Again, this is hard in terms of creating a simple API, but extremely easy when making this simple API extensible. Make your SPIs very easily discoverable. A jOOQ power user will always look for extension points in jOOQ’s Configuration. And because the extension points are explicit types which have to be implemented (as opposed to annotations and their magic), no documentation is needed to learn the SPI (of course, it is still beneficial as a reference). I’d love to hear your alternative approaches to this API design challenge in the comments.

How to Unit Test Your Annotation Processor using jOOR

Annotation processors can be useful as a hacky workaround to get some language feature into the Java language. jOOQ also has an annotation processor that helps validate SQL syntax for:
  • Plain SQL usage (SQL injection risk)
  • SQL dialect support (prevent using an Oracle only feature on MySQL)
You can read about it more in detail here.

Unit testing annotation processors

Unit testing annotation processors is a bit trickier than using them. Your processor hooks into the Java compiler and manipulates the compiled AST (or does other things). If you want to test your own processor, you need the test to run a Java compiler, which is difficult to do in a normal project setup, especially if the expected behaviour for a given test is a compilation error. Let’s assume we have the following two annotations:

@interface A {}
@interface B {}

And now, we would like to establish a rule that @A must always be accompanied by @B. For example:

// This must not compile
@A
class Bad {}

// This is fine
@A @B
class Good {}

We’ll enforce that with an annotation processor:

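// Assuming the usual imports from javax.annotation.processing and
// javax.lang.model, plus: import static javax.tools.Diagnostic.Kind.ERROR;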
class AProcessor implements Processor {
    boolean processed;
    private ProcessingEnvironment processingEnv;

    @Override
    public Set<String> getSupportedOptions() {
        return Collections.emptySet();
    }

    @Override
    public Set<String> getSupportedAnnotationTypes() {
        return Collections.singleton("*");
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.RELEASE_8;
    }

    @Override
    public void init(ProcessingEnvironment processingEnv) {
        this.processingEnv = processingEnv;
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement e1 : annotations)
            if (e1.getQualifiedName().contentEquals(A.class.getName()))
                for (Element e2 : roundEnv.getElementsAnnotatedWith(e1))
                    if (e2.getAnnotation(B.class) == null)
                        processingEnv.getMessager().printMessage(ERROR, "Annotation A must be accompanied by annotation B");

        this.processed = true;
        return false;
    }

    @Override
    public Iterable<? extends Completion> getCompletions(Element element, AnnotationMirror annotation, ExecutableElement member, String userText) {
        return Collections.emptyList();
    }
}

Now, this works. We can easily verify that manually by adding the annotation processor to some Maven compiler configuration and by annotating a few classes with A and B. But then, someone changes the code and we don’t notice the regression. How can we unit test this, rather than doing things manually?

jOOR 0.9.10 support for annotation processors

jOOR is our little open source reflection library that we’re using internally in jOOQ. jOOR has a convenient API to invoke the javax.tools.JavaCompiler API through Reflect.compile(). The most recent release 0.9.10 now takes an optional CompileOptions argument where annotation processors can be registered. This means we can now write a very simple unit test as follows (and if you’re using Java 15+, you can profit from text blocks! For a Java 11 compatible version without text blocks, see our unit tests on github):

@Test
public void testCompileWithAnnotationProcessors() {
    AProcessor p = new AProcessor();

    try {
        Reflect.compile(
            "org.joor.test.FailAnnotationProcessing",
            """
             package org.joor.test; 
             @A 
             public class FailAnnotationProcessing {
             }
            """,
            new CompileOptions().processors(p)
        ).create().get();
        Assert.fail();
    }
    catch (ReflectException expected) {
        assertTrue(p.processed);
    }

    Reflect.compile(
        "org.joor.test.SucceedAnnotationProcessing",
        """
         package org.joor.test; 
         @A @B 
         public class SucceedAnnotationProcessing {
         }
        """,
        new CompileOptions().processors(p)
    ).create().get();
    assertTrue(p.processed);
}

So easy! Never have regressions in your annotation processors again!