Java’s Checked Exceptions Are Just Weird Union Types

This fun fact has been on my mind for a while, and a recent reddit thread about “Smuggling Checked Exceptions with Sealed Interfaces” finally made me write this post. Namely, Java had union types before they were cool! (If you squint hard.)

What are union types?

Ceylon is an underrated JVM language that never really took off, which is too bad, because the concepts it introduced are very elegant (see e.g. how they implemented nullable types as syntax sugar on top of union types, which IMO is much better than anything monadic using Option types, or Kotlin’s ad-hoc type system extension).

So, one of those concepts is union types. One of the most popular languages that support them, currently, is TypeScript, though C++, PHP, and Python also have something similar. (Whether the union type is tagged or not isn’t relevant to this post.)

If you understand Java’s intersection types A & B (meaning that something is both a subtype of A and of B), then it’s easy to understand union types A | B (meaning that something is a subtype of either A or B). TypeScript shows a simple example of this:

function printId(id: number | string) {
  console.log("Your ID is: " + id);
}
// OK
printId(101);
// OK
printId("202");
// Error
printId({ myID: 22342 });
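
As a contrast, this is what Java’s intersection types look like, e.g. in a generic bound (a trivial sketch of my own, not taken from the TypeScript docs):

// T must be a subtype of both Comparable<T> and Serializable
static <T extends Comparable<T> & Serializable> T max(T a, T b) {
    return a.compareTo(b) >= 0 ? a : b;
}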

Structural vs nominal typing

Such a union type (or intersection type) is a structural type, as opposed to what we’ve been doing in Java via nominal types, where you have to declare a named type for this union every time you want to use it. E.g. in jOOQ, we have things like:

interface FieldOrRow {}
interface Field<T> extends FieldOrRow {}
interface Row extends FieldOrRow {}

Very soon, the Java 17 distribution of jOOQ will seal the above type hierarchy as follows (incomplete; subtypes of Field<T> and Row are omitted for brevity):

sealed interface FieldOrRow permits Field, Row {}
sealed interface Field<T> extends FieldOrRow permits ... {}
sealed interface Row extends FieldOrRow permits ... {}

There are pros and cons for doing these things structurally or nominally:

Structural typing:

  • Pro: You can create any ad-hoc union of any set of types anywhere
  • Pro: You don’t have to change the existing type hierarchy, which is essential when you don’t have access to it, e.g. when you want to do something like the above number | string type. This kinda works like JSR 308 type annotations that were introduced in Java 8.

Nominal typing:

  • Pro: You can attach documentation to the type, and reuse it formally (rather than structurally). TypeScript and many other languages offer type aliases for this kind of stuff, so you can have a bit of both worlds, though the alias is erased, meaning you keep the complex structural type leading to curious error messages.
  • Pro: You can seal the type hierarchy to allow for exhaustiveness checking among subtypes (e.g. above, there can only be Field<T> or Row subtypes of FieldOrRow). A structurally typed union type is implicitly “sealed” ad-hoc by the union type description (not sure if that’s the right term), but with nominal types, you can make sure no one else can extend the type hierarchy, except where you permit it explicitly using the non-sealed keyword (see the sketch after this list).
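
To illustrate that last point, here’s a minimal sketch of such an exhaustiveness check over the sealed hierarchy above, using JEP 406 style pattern matching for switch (a preview feature at the time of writing; the hierarchy is the simplified one from this post, not jOOQ’s real one):

static String describe(FieldOrRow fieldOrRow) {
    return switch (fieldOrRow) {
        // No default branch needed: the compiler knows FieldOrRow is sealed
        // and permits only Field and Row
        case Field<?> field -> "Field: " + field;
        case Row row -> "Row: " + row;
    };
}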

Ultimately, as ever so often, things like structural and nominal typing are two sides of the same coin, pros and cons mostly depending on taste and on how much you control a code base.

So, how are checked exceptions union types?

When you declare a method that throws checked exceptions, the return type of the method is really such a union type. Look at this example in Java:

public String getTitle(int id) throws SQLException;

The call-site now has to “check” the result of this method call using try-catch, or declare re-throwing the checked exception(s):

try {
    String title = getTitle(1);
    doSomethingWith(title);
}
catch (SQLException e) {
    handle(e);
}

If early Java had union types rather than checked exceptions, we might have declared this as follows, instead:

public String|SQLException getTitle(int id);

Likewise, a caller of this method will have to “check” the result of this method call. There’s no simple way of re-throwing it, so if we do want to re-throw, we’d need some syntax sugar, or repeat the same code all the time, Go-style:

// Hypothetical Java syntax:
String|SQLException result = getTitle(1);

switch (result) {
    case String title -> doSomethingWith(title);
    case SQLException e -> handle(e);
}

It would be obvious how such a JEP 406 style switch pattern matching statement or expression could implement an exhaustiveness check, just like with the existing JEP 409 sealed classes approach, the only difference, again, being that everything is now structurally typed, rather than nominally typed.

In fact, if you declare multiple checked exceptions, such as the JDK’s reflection API:

public Object invoke(Object obj, Object... args)
throws 
    IllegalAccessException, 
    IllegalArgumentException,
    InvocationTargetException

With union types, this would just be this, instead:

// Hypothetical Java syntax:
public Object
    | IllegalAccessException
    | IllegalArgumentException
    | InvocationTargetException invoke(Object obj, Object... args)

And the union type syntax from the catch block, which checks for exhaustiveness (yes, we have union types in catch!)…

try {
    Object returnValue = method.invoke(obj);
    doSomethingWith(returnValue);
}
catch (IllegalAccessException | IllegalArgumentException e) {
    handle1(e);
}
catch (InvocationTargetException e) {
    handle2(e);
}

… could still check for exhaustiveness with the switch pattern matching approach:

// Hypothetical Java syntax:
Object
    | IllegalAccessException
    | IllegalArgumentException
    | InvocationTargetException result = method.invoke(obj);

switch (result) {
    case IllegalAccessException, 
         IllegalArgumentException e -> handle1(e);
    case InvocationTargetException e -> handle2(e);
    case Object returnValue -> doSomethingWith(returnValue);
}

A subtle caveat here is that exceptions are subtypes of Object, so we must put that case at the end, as it “dominates” the others (see JEP 406 for a discussion about dominance). Again, we can prove exhaustiveness, because all types that are involved in the union type have a switch case.

Can we emulate union types with checked exceptions?

You know what Jeff Goldblum would say…

But this blog is known to do it anyway. Assuming that for every possible type, we had a synthetic (code generated?) checked exception that wraps it (because in Java, exceptions are not allowed to be generic):

// Use some "protective" base class, so no one can introduce 
// RuntimeExceptions to the type hierarchy
class E extends Exception {

    // Just in case you're doing this in performance sensitive code...
    @Override
    public Throwable fillInStackTrace() {
        return this;
    }
}

// Now create a wrapper exception for every type you want to represent
class EString extends E {
    String s;
    EString(String s) {
        this.s = s;
    }
}
class Eint extends E {
    int i;
    Eint(int i) {
        this.i = i;
    }
}

The benefit of this is that we don’t have to wait for Valhalla to support primitive types in generics, nor to reify them. We’ve already emulated that, as you can see above.

Next, we need a switch emulation for arbitrary degrees (22 will probably be enough?). Here’s one for degree 2:

// Create an arbitrary number of switch utilities for each arity up 
// to, say 22 as is best practice
class Switch2<E1 extends E, E2 extends E> {
    E1 e1;
    E2 e2;

    private Switch2(E1 e1, E2 e2) {
        this.e1 = e1;
        this.e2 = e2;
    }

    static <E1 extends E, E2 extends E> Switch2<E1, E2> of1(E1 e1) {
        return new Switch2<>(e1, null);
    }

    static <E1 extends E, E2 extends E> Switch2<E1, E2> of2(E2 e2) {
        return new Switch2<>(null, e2);
    }

    void check() throws E1, E2 {
        if (e1 != null)
            throw e1;
        else
            throw e2;
    }
}

And finally, here’s how we can emulate our exhaustiveness checking switch with catch blocks!

// "Union type" emulating String|int
Switch2<EString, Eint> s = Switch2.of1(new EString("hello"));

// Doesn't compile, Eint isn't caught (catches aren't exhaustive)
try {
    s.check();
}
catch (EString e) {}

// Compiles fine
try {
    s.check();
}
catch (EString e) {}
catch (Eint e) {}

// Also compiles fine
try {
    s.check();
}
catch (EString | Eint e) {}

// Doesn't compile because Eint "partially dominates" EString | Eint
try {
    s.check();
}
catch (Eint e) {}
catch (EString | Eint e) {}

“Neat”, huh? We could even imagine destructuring within the catch block, such that we can automatically unwrap the value from the auxiliary “E” type.
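
Until then, the unwrapping stays manual, e.g. like this (building on the Switch2, EString, and Eint types from above; doSomethingWith() and doSomethingWithInt() are placeholders):

try {
    s.check();
}
catch (EString e) {
    // Unwrap the String payload from the auxiliary exception type
    doSomethingWith(e.s);
}
catch (Eint e) {
    // Unwrap the int payload
    doSomethingWithInt(e.i);
}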

Since we already have “union types” in Java (in catch blocks), and since checked exception declarations could be retrofitted to form a union type with the method’s actual return type, my hopes are still that in some distant future, a more powerful Java will be available where these “union types” (and also intersection types) will be made first class. APIs like jOOQ would greatly profit from this!

Write C-Style Local Static Variables in Java 16

Java 16 includes an improvement that makes the language a bit more regular via JEP 395. The JEP says:

Static members of inner classes

It is currently specified to be a compile-time error if an inner class declares a member that is explicitly or implicitly static, unless the member is a constant variable. This means that, for example, an inner class cannot declare a record class member, since nested record classes are implicitly static.

We relax this restriction in order to allow an inner class to declare members that are either explicitly or implicitly static. In particular, this allows an inner class to declare a static member that is a record class.

What sounds like a minor necessary evil to make a new feature (record classes) more versatile actually has a life of its own. We can use it to emulate C-style local static variables, i.e. local variables that are:

  • Initialised only once (and lazily at that)
  • Shared among multiple executions of a method

This sounds like a rather hairy feature, global variables that are visible only locally. But in fact, it’s something I’ve wanted for a long time, especially when I wanted to cache regular expression patterns without polluting the class namespace.

Consider this code:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {
        // Re-compiling the pattern every time: Bad
        // Keeping the pattern local to the method: Good
        System.out.println("check(" + string + "): " 
            + compile("a").matcher(string).find());
    }
}

It prints:

compile(a)
check(a): true
compile(a)
check(b): false

Compiling a pattern can be costly if done frequently, so we better cache it. We used to do that like this:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    // Compiling the pattern only once: Good
    // Placing the pattern in a class namespace: Bad
    static final Pattern P_CHECK = compile("a");

    static void check(String string) {
        System.out.println("check(" + string + "): " 
            + P_CHECK.matcher(string).find());
    }
}

This now prints a more optimal output:

compile(a)
check(a): true
check(b): false

I.e. the regular expression pattern is compiled only once. But unfortunately, we had to pollute the entire class’s namespace, which can quickly become cumbersome if we have dozens of such regular expressions. Can we scope the P_CHECK variable to the check() method only? We now can!

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {

        // Compiling the pattern only once: Good
        // Keeping the pattern local to the method: Good
        // Capturing scope: Egh...
        var patterns = new Object() { 
            static final Pattern P_CHECK = compile("a");
        };
        
        System.out.println("check(" + string + "): " 
            + patterns.P_CHECK.matcher(string).find());
    }
}

This again prints the desired, optimal output:

compile(a)
check(a): true
check(b): false

The combination of using var to use a non-denotable type (whose members we can dereference) along with the ability of putting static members in inner classes effectively emulates local static variables, just like in C!

Since the inner class is unlikely to escape its scope, the fact that it may be capturing scope isn’t that big of a risk, as illustrated previously in a criticism of the double curly brace anti-pattern. You’re still creating an extra class and a useless object that escape analysis hopefully prevents from allocating, so it’s not exactly a very clean solution, but it’s nice to know this is now possible.

Could we Have a Language That Hides Collections From Us?

I just fixed a bug. The fix required me to initialise an Object[] array with the init values for each type, instead of just null, i.e. false for boolean, 0 for int, 0.0 for double, etc. So, instead of just doing:

Object[] converted = new Object[parameterTypes.length];

I needed:

Object[] converted = new Object[parameterTypes.length];

for (int i = 0; i < converted.length; i++)
    converted[i] = Reflect.initValue(parameterTypes[i]);

For the subjective 8E17th time, I wrote a loop. A loop that did nothing interesting other than call a method for each of the looped structure’s elements. And I felt the pain of our friend Murtaugh.

Why do we distinguish between T and T[]?

What I really wanted to do is something much simpler. I have a method Reflect.initValue():

public static <T> T initValue(Class<T> type) {}
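
For context, such a method might look roughly like this (a sketch of the idea, not necessarily the actual implementation):

@SuppressWarnings("unchecked")
public static <T> T initValue(Class<T> type) {
    // Return the implicit initialisation value for each primitive type,
    // and null for all reference types
    if (type == boolean.class) return (T) Boolean.FALSE;
    if (type == byte.class)    return (T) Byte.valueOf((byte) 0);
    if (type == short.class)   return (T) Short.valueOf((short) 0);
    if (type == int.class)     return (T) Integer.valueOf(0);
    if (type == long.class)    return (T) Long.valueOf(0L);
    if (type == float.class)   return (T) Float.valueOf(0.0f);
    if (type == double.class)  return (T) Double.valueOf(0.0);
    if (type == char.class)    return (T) Character.valueOf((char) 0);
    return null;
}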

What I really want to do is this, in one way or another:

converted = initValue(parameterTypes);

(Yes, there are subtleties that need to be thought about, such as should this init an array or assign values to an array. Forget about them for now. Big picture first). The point is, no one enjoys writing loops. No one enjoys writing map/flatMap either:

converted = Stream.of(parameterTypes)
                  .map(Reflect::initValue)
                  .toArray(Object[]::new);

It’s so much useless, repetitive, infrastructural ceremony that I enjoy neither writing nor reading. My “business logic” here is simply

converted = initValue(parameterTypes);

I have 3 elements:
  • A source data structure parameterTypes
  • A target data structure converted
  • A mapping function initValue
That’s all I should be seeing in my code. All the infrastructure of how to iterate is completely meaningless and boring.

SQL joins

In fact, SQL joins are often the same. We use primary key / foreign key relationships, so the path between parent and child tables is very obvious in most cases. Joins are cool, relational algebra is cool, but in most cases, it just gets in the way of writing understandable business logic. In my opinion, this is one of Hibernate’s biggest innovations (probably others did this too, perhaps even before Hibernate): implicit joins, which jOOQ copied. There’s much ceremony in writing this:

SELECT
  cu.first_name,
  cu.last_name,
  co.country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)

When this alternative, intuitive syntax would be much more convenient:

SELECT
  cu.first_name,
  cu.last_name,
  cu.address.city.country.country
FROM customer AS cu

It is immediately clear what is meant by the implicit join syntax. The syntactic ceremony of writing the explicit joins is not necessary. Again, joins are really cool, and power users will be able to use them when needed. E.g. the occasional NATURAL FULL OUTER JOIN can still be done! But let’s admit it, 80% of all joins are boring, and could be replaced with the above syntax sugar.
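
For what it’s worth, jOOQ’s flavour of this looks roughly as follows (a sketch assuming generated to-one path navigation methods; the exact names depend on your generated code):

ctx.select(
       CUSTOMER.FIRST_NAME,
       CUSTOMER.LAST_NAME,
       // Navigating the foreign key path produces the joins implicitly
       CUSTOMER.address().city().country().COUNTRY)
   .from(CUSTOMER)
   .fetch();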

Suggestion for Java

Of course, this suggestion will not be perfect, because it doesn’t deal with the gazillion edge cases of introducing such a significant feature to an old language. But again, if we allow ourselves to focus on the big picture, wouldn’t it be nice if we could:

class Author {
  String firstName;
  String lastName;
  Book[] books; // Or use any collection type here
}

class Book {
  String title;
}

And then:

Author[] authors = ...

// This...
String[] firstNames = authors.firstName;

// ...is sugar for this (oh how it hurts to type this):
String[] firstNames = new String[authors.length];
for (int i = 0; i < firstNames.length; i++)
    firstNames[i] = authors[i].firstName;

// And this...
int[] firstNameLengths = authors.firstName.length();

// ... is sugar for this:
int[] firstNameLengths = new int[authors.length];
for (int i = 0; i < firstNameLengths.length; i++)
    firstNameLengths[i] = authors[i].firstName.length();

// ... or even this, who cares (hurts to type even more):
int[] firstNameLengths = Stream
  .of(authors)
  .map(a -> a.firstName)
  .mapToInt(String::length)
  .toArray();

Ignore the usage of arrays, it could just as well be a List, Stream, Iterable, whatever data structure or syntax that allows us to get from arity 1 to arity N. Or to get a set of the authors’ books:

Author[] authors = ...
Book[] books = authors.books;

Could it mean anything other than that:

Stream.of(authors)
      .flatMap(a -> Stream.of(a.books))
      .toArray(Book[]::new);

Why do we have to keep spelling these things out? They’re not business logic, they’re meaningless, boring infrastructure. While yes, there are surely many edge cases (and we could live with the occasional compiler errors, if the compiler can’t figure out how to get from A to B), there are also many “very obvious” cases, where the ceremonial mapping logic (imperative or functional, doesn’t matter) is just completely obvious and boring. But it gets in the way of writing and reading, and despite the fact that it seems obvious in many cases, it is still error prone! I think it’s time to revisit the ideas behind APL, where everything is an array, and by consequence, operations on arity 1 types can be applied to arity N types just the same, because the distinction is often not very useful.

Bonus: Null

While difficult to imagine retrofitting a language like Java with this, a new language could do away with nulls forever, because the arity 0-1 is just a special case of the arity N: An empty array. Looking forward to your thoughts.

A Quick Trick to Make a Java Stream Construction Lazy

One of the Stream API’s greatest features is its laziness. The whole pipeline is constructed lazily and stored as a set of instructions, akin to a SQL execution plan. Only when we invoke a terminal operation is the pipeline started. It is still lazy, meaning that some operations may be short-circuited. Some third party libraries produce streams that are not entirely lazy. For example, jOOQ until version 3.12 eagerly executed a SQL query when calling ResultQuery.stream(), regardless of whether the Stream is consumed afterwards:

try (var stream = ctx.select(T.A, T.B).from(T).stream()) {
    // Not consuming the stream here
}

While this is probably a bug in client code, not executing the statement in this case might still be a useful feature. The exception being, of course, a query that contains a FOR UPDATE clause, in which case the user probably uses Query.execute() instead, if they don’t care about the result. A more interesting example where laziness helps is the fact that we might not want this query to be executed right away, as we are perhaps still on the wrong thread to execute it. Or we would like any possible exceptions to be thrown from wherever the result is consumed, i.e. where the terminal operation is called. For example:

try (var stream = ctx.select(T.A, T.B).from(T).stream()) {
    consumeElsewhere(stream);
}

And then:

public void consumeElsewhere(Stream<? extends Record> stream) {
    runOnSomeOtherThread(() -> {
        stream.map(r -> someMapping(r))
              .forEach(r -> someConsumer(r));
    });
}

While we’re fixing this in jOOQ 3.13 (https://github.com/jOOQ/jOOQ/issues/4934), you may be stuck with an older version of jOOQ, or have another library do the same thing. Luckily, there’s an easy trick to quickly make a third party stream “lazy”. Flatmap it! Just write this instead:

try (var stream = Stream.of(1).flatMap(
    i -> ctx.select(T.A, T.B).from(T).stream()
)) {
    consumeElsewhere(stream);
}
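
If you need this trick in several places, you could wrap it in a tiny utility method (a sketch, not part of jOOQ or the JDK):

import java.util.function.Supplier;
import java.util.stream.Stream;

class Streams {

    // Defers the Stream construction until a terminal operation is invoked
    static <T> Stream<T> lazy(Supplier<? extends Stream<? extends T>> supplier) {
        return Stream.of(supplier).flatMap(s -> s.get());
    }
}

// Usage:
// try (var stream = Streams.lazy(() -> ctx.select(T.A, T.B).from(T).stream())) {
//     consumeElsewhere(stream);
// }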

The following small test illustrates that the stream() is now being constructed lazily:

public class LazyStream {

    @Test(expected = RuntimeException.class)
    public void testEager() {
        Stream<String> stream = stream();
    }

    @Test
    public void testLazyNoTerminalOp() {
        Stream<String> stream = Stream.of(1).flatMap(i -> stream());
    }

    @Test(expected = RuntimeException.class)
    public void testLazyTerminalOp() {
        Optional<String> result = Stream.of(1).flatMap(i -> stream()).findAny();
    }

    public Stream<String> stream() {
        String[] array = { "heavy", "array", "creation" };

        // Some Resource Problem that might occur
        if (true)
            throw new RuntimeException();

        return Stream.of(array);
    }
}

Caveat

Depending on the JDK version you’re using, the above approach has its own significant problems. For example, in older versions of JDK 8, flatMap() itself might not be lazy at all! More recent versions of the JDK have fixed that problem, including JDK 8u222: https://bugs.openjdk.java.net/browse/JDK-8225328

How to Write a Simple, yet Extensible API

How to write a simple API is already an art on its own.
I didn’t have time to write a short letter, so I wrote a long one instead. ― Mark Twain
Keeping an API simple for beginners and most users, while making it extensible for power users, seems even more of a challenge. But is it?

What does “extensible” mean?

Imagine an API like, oh say, jOOQ. In jOOQ, you can write SQL predicates like this:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(1)) // Predicate with bind value here
   .fetch();

By default (as this should always be the default), jOOQ will generate and execute this SQL statement on your JDBC driver, using a bind variable:

SELECT t.a, t.b
FROM t
WHERE t.c = ?

The API made the most common use case simple. Just pass your bind variable as if the statement was written in e.g. PL/SQL, and let the language / API do the rest. So we passed that test. The use case for power users is to occasionally not use bind variables, for whatever reason (e.g. skew in data and bad statistics, see also this post about bind variables). Will we pass that test as well?

jOOQ mainly offers two ways to fix this:

On a per-query basis

You can turn your variable into an inline value explicitly for this single occasion:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(inline(1))) // Predicate without bind value here
   .fetch();

This is using the statically imported DSL.inline() method. It works, but it’s not very convenient if you have to do this for several queries, for several bind values, or worse, depending on some context. This is a necessary API enhancement, but it does not make the API extensible.

On a global basis

Notice that ctx object there? It is the DSLContext object, the “contextual DSL”, i.e. the DSL API that is in the context of a jOOQ Configuration. You can thus set:

ctx2 = DSL.using(ctx
    .configuration()
    .derive()
    .set(new Settings()
        .withStatementType(StatementType.STATIC_STATEMENT)));

// And now use this new DSLContext instead of the old one
ctx2.select(T.A, T.B)
    .from(T)
    .where(T.C.eq(1)) // No longer a bind variable
    .fetch();

Different approaches to offering such extensibility

We have our clean and simple API. Now some user wants to extend it. So often, we’re tempted to resort to a hack, e.g. by using thread locals, because they would work easily under the assumption of a thread-bound execution model – such as e.g. classic Java EE Servlets.
The price we’re paying for such a hack is high.
  1. It’s a hack, and as such it will break easily. If we offer this as functionality to a user, they will start depending on it, and we will have to support and maintain it
  2. It’s a hack, and it is based on assumptions, such as thread-boundedness. It will not work in an async / reactive / parallel stream context, where our logic may jump back and forth between threads.
  3. It’s a hack, and deep inside, we know it’s wrong. Obligatory XKCD: https://xkcd.com/292
This might obviously work, just like global (static) variables. You can set this variable globally (or “globally” for your own thread), and then the API’s internals will be able to read it. No need to pass around parameters, so no need to compromise on the API’s simplicity by adding optional and often ugly, distracting parameters.

What are better approaches to offering such extensibility?

Dependency Injection

One way is to use explicit Dependency Injection (DI). If you have a container like Spring, you can rely on Spring injecting arbitrary objects into your method call / whatever, where you need access to it:
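
For example, something along these lines (a sketch using Spring-style constructor injection; the service, package, and table names are made up):

import static com.example.generated.Tables.BOOK; // hypothetical generated code

import java.util.List;

import org.jooq.DSLContext;
import org.springframework.stereotype.Component;

@Component
class BookService {

    // Spring injects a container-managed DSLContext, whose Configuration
    // is controlled in a single, central place
    private final DSLContext ctx;

    BookService(DSLContext ctx) {
        this.ctx = ctx;
    }

    List<String> titles() {
        return ctx.select(BOOK.TITLE).from(BOOK).fetch(BOOK.TITLE);
    }
}
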
This way, if you maintain several contextual objects of different lifecycle scopes, you can let the DI framework make appropriate decisions to figure out where to get that contextual information from. For example, when using JAX-RS, you can do this using an annotation based approach:


// These annotations bind the method to some HTTP address
@GET
@Produces("text/plain")
@Path("/api")
public String method(

    // This annotation fetches a request-scoped object
    // from the method call's context
    @Context HttpServletRequest request,

    // This annotation produces an argument from the
    // URL's query parameters
    @QueryParam("arg") String arg
) {
    ...
}

This approach works quite nicely for static environments (annotations being static), where you do not want to react to dynamic URLs or endpoints. It is declarative, and a bit magic, but well designed, so once you know all the options, you can choose the right one for your use case very easily. While @QueryParam is mere convenience (you could have gotten the argument also from the HttpServletRequest), the @Context is powerful. It can help inject values of arbitrary lifecycle scope into your method / class / etc.

I personally favour explicit programming over annotation-based magic (e.g. using Guice for DI), but that’s probably a matter of taste. Both are a great way for implementors of APIs (e.g. HTTP APIs) to help get access to framework objects. However, if you’re an API vendor and want to give users of your API a way to extend it, I personally favour jOOQ’s SPI approach.

SPIs

One of jOOQ’s strengths, IMO, is precisely this single, central place to register all SPI implementations that can be used for all sorts of purposes: the Configuration. For example, on such a Configuration you can specify a JSR-310 java.time.Clock. This clock will be used by jOOQ’s internals to produce client side timestamps, instead of e.g. using System.currentTimeMillis(). Definitely a use case for power users only, but once you have this use case, you really only want to tweak a single place in jOOQ’s API: the Configuration.

All of jOOQ’s internals will always have a Configuration reference available, and it’s up to the user to decide what the scope of this object is, jOOQ doesn’t care. E.g. a Configuration could be scoped:
  • per query
  • per thread
  • per request
  • per session
  • per application
In other words, to jOOQ, it doesn’t matter at all if you’re implementing a thread-bound, blocking, classic servlet model, or if you’re running your code reactively, or in parallel, or whatever. Just manage your own Configuration lifecycle, jOOQ doesn’t care.

In fact, you can have a global, singleton Configuration and implement thread-bound components of it, e.g. the ConnectionProvider SPI, which takes care of managing the JDBC Connection lifecycle for jOOQ. Typically, users will use e.g. a Spring DataSource, which manages JDBC Connections (and transactions) using a thread-bound model, internally using ThreadLocal. jOOQ does not care. The ConnectionProvider SPI merely specifies how jOOQ acquires and releases connections; it does not matter to jOOQ what the specific ConnectionProvider implementation does (a sketch of such an implementation follows after the list below). You can implement it in any way you want if you’re a power user. By default, you’ll just pass jOOQ a DataSource, and it will wrap it in a default implementation called DataSourceConnectionProvider for you. The key here is again:
  • The API is simple by default, i.e. by default, you don’t have to know about this functionality, just pass jOOQ a DataSource as always when working with Java and SQL, and you’re ready to go
  • The SPI allows for easily extending the API without compromising on its simplicity, by providing a single, central access point to this kind of functionality
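
For illustration, a custom ConnectionProvider is just a small class along these lines (a sketch; by default you would simply pass a DataSource and let jOOQ use its DataSourceConnectionProvider):

import java.sql.Connection;
import java.sql.SQLException;

import javax.sql.DataSource;

import org.jooq.ConnectionProvider;
import org.jooq.exception.DataAccessException;

class MyConnectionProvider implements ConnectionProvider {

    private final DataSource ds;

    MyConnectionProvider(DataSource ds) {
        this.ds = ds;
    }

    @Override
    public Connection acquire() {
        try {
            // Could just as well return a thread-bound or request-scoped connection
            return ds.getConnection();
        }
        catch (SQLException e) {
            throw new DataAccessException("Error acquiring connection", e);
        }
    }

    @Override
    public void release(Connection connection) {
        try {
            connection.close();
        }
        catch (SQLException e) {
            throw new DataAccessException("Error releasing connection", e);
        }
    }
}
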
Other SPIs in Configuration include:
  • ExecuteListener: An extremely useful and simple way to hook into the entire jOOQ query management lifecycle, from generating the SQL string to preparing the JDBC statement, to binding variables, to execution, to fetching result sets. A single SPI can accommodate various use cases like SQL logging, patching SQL strings, patching JDBC statements, listening to result set events, etc. (a minimal sketch follows after this list).
  • ExecutorProvider: Whenever jOOQ runs something asynchronously, it will ask this SPI to provide a standard JDK Executor, which will be used to run the asynchronous code block. By default, this will be the JDK default (the default ForkJoinPool), as always. But you probably want to override this default, and you want to be in full control of this, and not think about it every single time you run a query.
  • MetaProvider: Whenever jOOQ needs to look up database meta information (schemas, tables, columns, types, etc.), it will ask this MetaProvider about the available meta information. By default, this will run queries on the JDBC DatabaseMetaData, which is good enough, but maybe you want to wire these calls to your jOOQ-generated classes, or something else.
  • RecordMapperProvider and RecordUnmapperProvider: jOOQ has a quite versatile default implementation of how to map between a jOOQ Record and an arbitrary Java class, supporting a variety of standard approaches including JavaBeans getter/setter naming conventions, JavaBeans @ConstructorProperties, and much more. These defaults apply e.g. when writing query.fetchInto(MyBean.class). But sometimes, the defaults are not good enough, and you want this particular mapping to work differently. Sure, you could write query.fetchInto(record -> mymapper(record)), but you may not want to remember this for every single query. Just override the mapper (and unmapper) at a single, central spot for your own chosen Configuration scope (e.g. per query, per request, per session, etc.) and you’re done
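
As a minimal sketch of the first item in this list, a SQL logging listener could look like this (use proper logging in real code; which hook methods to override depends on your use case):

import org.jooq.ExecuteContext;
import org.jooq.impl.DefaultExecuteListener;

class SQLLoggingListener extends DefaultExecuteListener {

    @Override
    public void renderEnd(ExecuteContext ctx) {
        // Called after jOOQ has rendered the SQL string for a query
        System.out.println("Executing: " + ctx.sql());
    }
}

// Registered e.g. via:
// configuration.set(new DefaultExecuteListenerProvider(new SQLLoggingListener()));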

Conclusion

Writing a simple API is difficult. Making it extensible in a simple way, however, is not. If your API has achieved “simplicity”, then it is very easy to support injecting arbitrary SPIs for arbitrary purposes at a single, central location, such as jOOQ’s Configuration.

In my most recent talk “10 Reasons Why we Love Some APIs and Why we Hate Some Others” (the full talk is available online), I’ve made the point that things like simplicity, discoverability, consistency, and convenience are among the most important aspects of a great API. How do you define a good API? The most underrated answer on that (obviously closed) Stack Overflow question is worth looking up.

Again, this is hard in terms of creating a simple API. But it is extremely easy when making this simple API extensible. Make your SPIs very easily discoverable. A jOOQ power user will always look for extension points in jOOQ’s Configuration. And because the extension points are explicit types which have to be implemented (as opposed to annotations and their magic), no documentation is needed to learn the SPI (of course it is still beneficial as a reference).

I’d love to hear your alternative approaches to this API design challenge in the comments.

How to Unit Test Your Annotation Processor using jOOR

Annotation processors can be useful as a hacky workaround to get some language feature into the Java language. jOOQ also has an annotation processor that helps validate SQL syntax for:
  • Plain SQL usage (SQL injection risk)
  • SQL dialect support (prevent using an Oracle only feature on MySQL)
You can read about it more in detail here.

Unit testing annotation processors

Unit testing annotation processors is a bit more tricky than using them. Your processor hooks into the Java compiler and manipulates the compiled AST (or does other things). If you want to test your own processor, you need the test to run a Java compiler, but that is difficult to do in a normal project setup, especially if the expected behaviour for a given test is a compilation error. Let’s assume we have the following two annotations:

@interface A {}
@interface B {}

And now, we would like to establish a rule that @A must always be accompanied by @B. For example:

// This must not compile
@A
class Bad {}

// This is fine
@A @B
class Good {}

We’ll enforce that with an annotation processor:

import static javax.tools.Diagnostic.Kind.ERROR;

import java.util.Collections;
import java.util.Set;

import javax.annotation.processing.*;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.*;

class AProcessor implements Processor {
    boolean processed;
    private ProcessingEnvironment processingEnv;

    @Override
    public Set<String> getSupportedOptions() {
        return Collections.emptySet();
    }

    @Override
    public Set<String> getSupportedAnnotationTypes() {
        return Collections.singleton("*");
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.RELEASE_8;
    }

    @Override
    public void init(ProcessingEnvironment processingEnv) {
        this.processingEnv = processingEnv;
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement e1 : annotations)
            if (e1.getQualifiedName().contentEquals(A.class.getName()))
                for (Element e2 : roundEnv.getElementsAnnotatedWith(e1))
                    if (e2.getAnnotation(B.class) == null)
                        processingEnv.getMessager().printMessage(ERROR, "Annotation A must be accompanied by annotation B");

        this.processed = true;
        return false;
    }

    @Override
    public Iterable<? extends Completion> getCompletions(Element element, AnnotationMirror annotation, ExecutableElement member, String userText) {
        return Collections.emptyList();
    }
}

Now, this works. We can easily verify that manually by adding the annotation processor to some Maven compiler configuration and by annotating a few classes with A and B. But then, someone changes the code and we don’t notice the regression. How can we unit test this, rather than doing things manually?

jOOR 0.9.10 support for annotation processors

jOOR is our little open source reflection library that we’re using internally in jOOQ. jOOR has a convenient API to invoke the javax.tools.JavaCompiler API through Reflect.compile(). The most recent release 0.9.10 now takes an optional CompileOptions argument where annotation processors can be registered. This means we can now write a very simple unit test as follows (and if you’re using Java 15+, you can profit from text blocks! For a Java 11 compatible version without text blocks, see our unit tests on github):

@Test
public void testCompileWithAnnotationProcessors() {
    AProcessor p = new AProcessor();

    try {
        Reflect.compile(
            "org.joor.test.FailAnnotationProcessing",
            """
             package org.joor.test; 
             @A 
             public class FailAnnotationProcessing {
             }
            """,
            new CompileOptions().processors(p)
        ).create().get();
        Assert.fail();
    }
    catch (ReflectException expected) {
        assertTrue(p.processed);
    }

    Reflect.compile(
        "org.joor.test.SucceedAnnotationProcessing",
        """
         package org.joor.test; 
         @A @B 
         public class SucceedAnnotationProcessing {
         }
        """,
        new CompileOptions().processors(p)
    ).create().get();
    assertTrue(p.processed);
}

So easy! Never have regressions in your annotation processors again!

How to Create a Good MCVE (Minimal Complete Verifiable Example)

Reporting a bug takes time, and trust me, every vendor appreciates your reporting of a bug! Your voice counts as many voices, because the other customers of a product who do not want to, or cannot, take the time to report the same bug are numerous.

So, first off, thanks for taking that time and reaching out to us vendors. We really appreciate your help!

Having said so, reporting a bug can be a tedious exercise. For both parties, the one reporting the bug and the one receiving it. There are extremely simple bugs, such as typos in documentation. They can be easily pointed to and just as easily be fixed. There are much trickier bugs, such as concurrency issues in complicated project setups. They take time to reproduce. This is why an MCVE (Minimal Complete Verifiable Example) is so useful. The linked stack overflow page explains why it is so useful to answer questions. But the same arguments apply when reporting a bug.

And that’s where the tricky part starts. It isn’t easy to create an example that is:

  • Minimal: Your real world application code is huge. You cannot dump the entirety of it to the vendor for various reasons. And the vendor cannot look through it all to try to reproduce it. So, the problem has to be isolated into an example of minimal scope, with no unnecessary additional functionality. That’s hard too, because your project has been set up months or years ago. You don’t want to spend too much time setting up a new project
  • Complete: When reducing the problem to a minimal one, we’re tempted to just describe it in prose. But that can be difficult as well, because prose is hardly complete. It’s difficult to describe a problem when it would be quite easy to show the code. But that brings us back to the minimal part. We want to show only the relevant code, not all of it.
  • Verifiable: Ultimately, the ideal example can be used by the vendor to reproduce the problem, because once that’s possible, the vendor can start debugging it and finding the right spots to fix quite easily. Otherwise, it’s just guessing and going back and forth with the reporter, just to write more prose. That’s tiring on both sides.

This is why we now have an example project on GitHub to help you create that MCVE:

https://github.com/jOOQ/jOOQ-mcve

It is a minimal example project.

This example can be forked on GitHub and modified by you directly, in order to show how to reproduce your issue. In the future, we’ll add more example setups that may be helpful to reproduce your specific issue.

Thanks again for taking the time to report issues. We vendors really appreciate your work!

Imperative Loop or Functional Stream Pipeline? Beware of the Performance Impact!

I like weird, yet concise language constructs and API usages
Yes. I am guilty. Evil? Don’t know. But guilty. I heavily use and abuse the java.lang.Boolean type to implement three-valued logic in Java (see the little sketch after this list):
  • Boolean.TRUE means true (duh)
  • Boolean.FALSE means false
  • null can mean anything like “unknown” or “uninitialised”, etc.
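
A little sketch of what that looks like in practice (my own example, not from the twitter thread):

Boolean confirmed = null;                    // "unknown" / "uninitialised"

if (Boolean.TRUE.equals(confirmed))          // null-safe check for true
    System.out.println("Confirmed");
else if (Boolean.FALSE.equals(confirmed))    // null-safe check for false
    System.out.println("Rejected");
else
    System.out.println("We don't know yet");
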
I know – a lot of enterprise developers will bikeshed and cargo cult the old saying:
Code is read more often than it is written
But as with everything, there is a tradeoff. For instance, in algorithm-heavy, micro-optimised library code, it is usually more important to have code that really performs well, rather than code that apparently doesn’t need comments because the author has written it in such a clear and beautiful way. I don’t think it matters much in the case of the Boolean type (where I’m just too lazy to encode every three-valued situation in an enum). But here’s a more interesting example from that same twitter thread. The code is simple:

woot:
if (something) {
  for (Object o : list) 
    if (something(o))
      break woot;

  throw new E();
}

Yes. You can break out of “labeled ifs”. Because in Java, any statement can be labeled, and if the statement is a compound statement (observe the curly braces following the if), then it may make sense to break out of it. Even if you’ve never seen that idiom, I think it’s quite immediately clear what it does. Gasp! If Java were a bit more classic, it might have supported this syntax:

if (something) {
  for (Object o : list) 
    if (something(o))
      goto woot;

  throw new E();
}
woot:

Nicolai suggested that the main reason I hadn’t written the following, equivalent, and arguably more elegant logic, is because jOOQ still supports Java 6:

if (something && list.stream().noneMatch(this::something))
  throw new E();

It’s more concise! So, it’s better, right? Everything new is always better. A third option would have been the less concise solution that essentially just replaces break by return:

if (something && noneMatchSomething(list))
  throw new E();

// And then:
private boolean noneMatchSomething(List<?> list) {
  for (Object o : list)
    if (something(o))
      return false;
  return true;
}

There’s an otherwise useless method that has been extracted. The main benefit is that people are not used to breaking out of labeled statements (other than loops, and even then it’s rare), so this is again about some subjective “readability”. I personally find this particular example less readable, because the extracted method is no longer local. I have to jump around in the class and interrupt my train of thoughts. But of course, YMMV with respect to the two imperative alternatives.

Back to objectivity: Performance

When I tweet about Java these days, I’m mostly tweeting about my experience writing jOOQ. A library. A library that has been tuned so much over the past years, that the big client side bottleneck (apart from the obvious database call) is the internal StringBuilder that is used to generate dynamic SQL. And compared to most database queries, you will not even notice that. But sometimes you do. E.g. if you’re using an in-memory H2 database and run some rather trivial queries, then jOOQ’s overhead can become measurable again. Yes. There are some use-cases, which I do want to take seriously as well, where the difference between an imperative loop and a stream pipeline is measurable. In the above examples, let’s remove the throw statement and replace it by something simpler (because exceptions have their own significant overhead). I’ve created this JMH benchmark, which compares the 3 approaches:
  • Imperative with break
  • Imperative with return
  • Stream
Here’s the benchmark:

package org.jooq.test.benchmark;

import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.*;

@Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
@Warmup(iterations = 5, time = 3)
@Measurement(iterations = 7, time = 3)
public class ImperativeVsStream {

    @State(Scope.Benchmark)
    public static class BenchmarkState {

        boolean something = true;

        @Param({ "2", "8" })
        int listSize;

        List<Integer> list = new ArrayList<>();

        boolean something() {
            return something;
        }

        boolean something(Integer o) {
            return o > 2;
        }

        @Setup(Level.Trial)
        public void setup() throws Exception {
            for (int i = 0; i < listSize; i++)
                list.add(i);
        }

        @TearDown(Level.Trial)
        public void teardown() throws Exception {
            list = null;
        }
    }

    @Benchmark
    public Object testImperativeWithBreak(BenchmarkState state) {
        woot:
        if (state.something()) {
            for (Integer o : state.list)
                if (state.something(o))
                    break woot;

            return 1;
        }

        return 0;
    }

    @Benchmark
    public Object testImperativeWithReturn(BenchmarkState state) {
        if (state.something() && woot(state))
            return 1;

        return 0;
    }

    private boolean woot(BenchmarkState state) {
        for (Integer o : state.list)
            if (state.something(o))
                return false;

        return true;
    }

    @Benchmark
    public Object testStreamNoneMatch(BenchmarkState state) {
        if (state.something() && state.list.stream().noneMatch(state::something))
            return 1;

        return 0;
    }

    @Benchmark
    public Object testStreamAnyMatch(BenchmarkState state) {
        if (state.something() && !state.list.stream().anyMatch(state::something))
            return 1;

        return 0;
    }

    @Benchmark
    public Object testStreamAllMatch(BenchmarkState state) {
        if (state.something() && state.list.stream().allMatch(s -> !state.something(s)))
            return 1;

        return 0;
    }
}

The results are pretty clear:
Benchmark                                    (listSize)   Mode  Cnt         Score          Error  Units
ImperativeVsStream.testImperativeWithBreak            2  thrpt   14  86513288.062 ± 11950020.875  ops/s
ImperativeVsStream.testImperativeWithBreak            8  thrpt   14  74147172.906 ± 10089521.354  ops/s
ImperativeVsStream.testImperativeWithReturn           2  thrpt   14  97740974.281 ± 14593214.683  ops/s
ImperativeVsStream.testImperativeWithReturn           8  thrpt   14  81457864.875 ±  7376337.062  ops/s
ImperativeVsStream.testStreamAllMatch                 2  thrpt   14  14924513.929 ±  5446744.593  ops/s
ImperativeVsStream.testStreamAllMatch                 8  thrpt   14  12325486.891 ±  1365682.871  ops/s
ImperativeVsStream.testStreamAnyMatch                 2  thrpt   14  15729363.399 ±  2295020.470  ops/s
ImperativeVsStream.testStreamAnyMatch                 8  thrpt   14  13696297.091 ±   829121.255  ops/s
ImperativeVsStream.testStreamNoneMatch                2  thrpt   14  18991796.562 ±   147748.129  ops/s
ImperativeVsStream.testStreamNoneMatch                8  thrpt   14  15131005.381 ±   389830.419  ops/s
With this simple example, break or return don’t matter. At some point, adding additional methods might start getting in the way of inlining (because of stacks getting too deep), but not creating additional methods might be getting in the way of inlining as well (because of method bodies getting too large). I don’t want to bet on either approach here at this level, nor is jOOQ tuned that much. Like most similar libraries, the traversal of the jOOQ expression tree generates stacks that are too deep to completely inline anyway. But the very obvious loser here is the Stream approach, which is roughly 6.5x slower in this benchmark than the imperative approaches. This isn’t surprising. The stream pipeline has to be set up every single time to represent something as trivial as the above imperative loop. I’ve already blogged about this in the past, where I compared replacing simple for loops by Stream.forEach().

Meh, does it matter?

In your business logic? Probably not. Your business logic is I/O bound, mostly because of the database. Wasting a few CPU cycles on a client side loop is not the main issue. Even if it is, the waste probably happens because your loop shouldn’t even be at the client side in the first place, but moved into the database as well. I’m currently touring conferences with a talk about that topic.
In your infrastructure logic? Maybe! If you’re writing a library, or if you’re using a library like jOOQ, then yes. Chances are that a lot of your logic is CPU bound. You should occasionally profile your application and spot such bottlenecks, both in your code and in third party libraries. E.g. in most of jOOQ’s internals, using a stream pipeline might be a very bad choice, because ultimately, jOOQ is something that might be invoked from within your loops, thus adding significant overhead to your application, if your queries are not heavy (e.g. again when run against an H2 in-memory database). So, given that you’re clearly “micro-losing” on the performance side by using the Stream API, you may need to evaluate the readability tradeoff more carefully. When business logic is complex, readability is very important compared to micro optimisations. With infrastructure logic, it is much less likely so, in my opinion. And I’m not alone.
Note: there’s that other cargo cult of premature optimisation going around. Yes, you shouldn’t worry about these details too early in your application implementation. But you should still know when to worry about them, and be aware of the tradeoffs. And while you’re still debating what name to give to that extracted method, I’ve written 5 new labeled if statements! ;-)

How to Patch Your IDE to Fix an Urgent Bug

Clock’s ticking. JDK 11 will remove a bunch of deprecated modules through JEP 320, which includes the Java EE modules, which again includes JAXB, a dependency of many libraries, including jOOQ. Thus far, few people have upgraded to Java 9 or 10, as these aren’t LTS releases. Unlike in the old days, however, people will be forced much earlier to upgrade to Java 11, because Java 8 (the free version) will reach end of life soon after Java 11 is released:
End of Public Updates for Oracle JDK 8 As outlined in the Oracle JDK Support Roadmap below, Oracle will not post further updates of Java SE 8 to its public download sites for commercial use after January 2019
So, we library developers must act and finally modularise our libraries. Which is, quite frankly, a pain. Not because of the module system itself, which works surprisingly well. But because of the toolchain, which is far from being production ready. This mostly concerns the IDEs: it’s still almost not possible to maintain a modularised project in an IDE (I’ve tried Eclipse and IntelliJ, not NetBeans so far), as there are still tons of bugs. Some of these are showstoppers, halting compilation in the IDE (despite compilation working in Maven). But rather than just complaining, let’s complain and fix it.

Let’s fix our own IDE by patching it

Disclaimer: The following procedure assumes that you have the right to modify your IDE’s source and binaries. To my understanding, this is the case with the EPL licensed Eclipse. It may not be the case for other IDEs.
Disclaimer 2: Note, as reddit user fubarbazqux so eloquently put it, there are cleaner ways to apply patches (and contribute them) to the Eclipse community, if you have more time. This article just displays a very easy way to do things without spending too much time figuring out how the Eclipse development processes work internally. It shows a QUICK FIX recipe.
The first bug was already discovered and fixed for Eclipse 4.8, but its RC4 version seems to have tons of other problems, so let’s not upgrade to that yet. Instead, let’s apply the fix that can be seen here to our own distribution: https://github.com/eclipse/eclipse.jdt.core/commit/e60c4f1f36f7efd5fbc1bbc661872b78c6939230#diff-e517e5944661053f0fcff49d9432b74e It’s just a single line. How do we do this?

First off, go to the Eclipse Packages Download page: http://www.eclipse.org/downloads/eclipse-packages and download the “Eclipse IDE for Eclipse Committers” distribution. It will contain all the Eclipse source code, which we’ll need to compile the above class.

In the new workspace, create a new empty plugin project. Specify the correct execution environment (in our case Java 10) and add all the Java Development Tools (JDT) dependencies, or just add all the available dependencies, it doesn’t really matter. You can now open the type that you want to edit.

Now, simply copy the source code from the editor and paste it in a new class inside of your project, which you put in the same package as the original (split packages are still possible in this case, yay). Inside of your copy, apply the desired patch and build the project. Since you already included all the dependencies, it will be easy to compile your copy of the class, and you don’t have to build the entirety of Eclipse.

Now, go to your Windows Explorer or Mac OS X Finder, or Linux shell or whatever and find the compiled class. This class can now be copied into the Eclipse plugin. How do you find the appropriate Eclipse plugin? Just go to your plugin dependencies and check out the location of the class you’ve opened earlier.

Open that plugin from your Eclipse distribution’s /plugins folder using 7zip or whatever zipping tool you prefer, and overwrite the original class file(s). You may need to close Eclipse first, before you can write to the plugin zip file. And it’s always a good idea to make backup copies of the original plugin(s). Be careful that if your class has any nested classes, you will need to copy them all, e.g.
MyClass.class
MyClass$1.class // Anonymous class
MyClass$Nested.class // Named, nested class
Restart Eclipse, and your bug should be fixed!

How to fix my own bugs?

You may not always be lucky to find a bug with an existing fix in the bug tracker as in the second case: https://bugs.eclipse.org/bugs/show_bug.cgi?id=535927 No problemo, we can hack our way around that as well. Launch your normal Eclipse instance (not the “Eclipse IDE for Eclipse Committers” one) with a debug agent running, by adding the following lines to your eclipse.ini file:
-Xdebug 
-Xnoagent 
-Djava.compiler=NONE 
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005
Launch Eclipse again, then connect to that Eclipse instance from your other “Eclipse IDE for Eclipse Committers” instance by attaching a debugger, and start setting breakpoints wherever you need them. E.g. here, in my case:
java.lang.NullPointerException
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:145)
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:226)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.handle(ProblemReporter.java:2513)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1831)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1808)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.checkAndRecordImportBinding(CompilationUnitScope.java:960)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInImports(CompilationUnitScope.java:471)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInTypes(CompilationUnitScope.java:501)
	at org.eclipse.jdt.internal.compiler.Compiler.process(Compiler.java:878)
	at org.eclipse.jdt.internal.compiler.ProcessTaskManager.run(ProcessTaskManager.java:141)
	at java.lang.Thread.run(Unknown Source)
And start analysing the problem like you would analyse your own bugs. The nice thing is, you don’t have to fix the problem, just find it, and possibly comment out some lines of code if you think they’re not really needed. In my case, luckily, the regression was introduced by a new method that is applied to JDK 9+ projects only:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    // ...
}

The method will check for the new @Deprecated(since="9") attribute on the @Deprecated annotation. Not an essential feature, so let’s just turn it off by adding this line to the source file:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    if (true) return null;
    // ...
}

This will effectively prevent the faulty logic from ever running. Not a fix, but a workaround. For more details about this specific issue, see the report. Of course, never forget to actually report the issue to Eclipse (or whatever your IDE is), so it can be fixed thoroughly for everyone else as well. Compile. Patch. Restart. Done!

Conclusion

Java is a cool platform. It has always been a very dynamic language at runtime, where compiled class files can be replaced by new versions at any moment, and re-loaded by the class loaders. This makes patching code by other vendors very easy, just:
  • Create a project containing the vendors’ code (or if you don’t have the code, the binaries)
  • Apply a fix / workaround to the Java class that is faulty (or if you don’t have the code, decompile the binaries if you are allowed to)
  • Compile your own version
  • Replace the vendor’s version of the class file with yours
  • Restart
This works with all software, including IDEs. In the case of jOOQ, all our customers have the right to modification, and they get the sources as well. We know how useful it is to be able to patch someone else’s code. This article shows it. Now, I can continue modularising jOOQ, and as a side product, improve the tool chain for everybody else as well. Again, this article showed a QUICK FIX approach (some call it a “hack”). There are more thorough ways to apply patches / fixes and contribute them back to the vendor. Another, very interesting option would be to instrument your runtime and apply the fix to the byte code only, as described in the article “Fixing Bugs in Running Java Code with Dynamic Attach”.
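To give a rough idea of what such byte code level patching can look like, here is a minimal sketch of a plain java.lang.instrument agent that swaps in a patched class at load time. This is not the technique from the linked article, just an illustration – the class name and the path to the patched .class file are made up:

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.ProtectionDomain;

public class PatchAgent {

    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                    Class<?> classBeingRedefined, ProtectionDomain protectionDomain,
                    byte[] classfileBuffer) {

                // Replace this one class's byte code with a previously compiled patch
                if ("org/eclipse/jdt/internal/compiler/problem/ProblemReporter".equals(className)) {
                    try {
                        return Files.readAllBytes(Paths.get("patches/ProblemReporter.class"));
                    }
                    catch (Exception e) {
                        e.printStackTrace();
                    }
                }

                // Returning null leaves all other classes unchanged
                return null;
            }
        });
    }
}

The agent jar needs a Premain-Class manifest entry, and it would be passed to Eclipse via a -javaagent entry in eclipse.ini.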

A note on IntelliJ and NetBeans

Again, I haven’t tried NetBeans yet (although I’ve heard its Java 9 support has been working very well for quite a while). While IntelliJ’s Jigsaw support seems more advanced than Eclipse’s (still with a few flaws as well), it currently has a couple of performance issues when compiling projects like jOOQ or jOOλ. In a future blog post, I will show how to “fix” those by using a profiler, like:
  • Java Mission Control (can be used as a profiler, too)
  • YourKit
  • JProfiler
Profilers can be used to track down the main source of a performance problem very easily. I’ve reported a ton of them to Eclipse already. For instance, this one: https://bugs.eclipse.org/bugs/show_bug.cgi?id=474686, where a lot of time was being spent in the processing of Task Tags, like:
  • TODO
  • FIXME
  • XXX
The great thing about profiling this is:
  • You can report a precise bug to the vendor
  • You can find the flawed feature and turn it off as a workaround. Turning off the above task tag feature was a no-brainer. I’m not even using the feature.
So, stay tuned for another blog post, soon.

Truth First, or Why You Should Mostly Implement Database First Designs

In this much overdue article, I will explain why I think that in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is), the latter approach leading to a long road of pain and suffering once your project grows. This article is inspired by a recent Stack Overflow question and by interesting reddit discussions on /r/java and /r/programming.

Code generation

To my surprise, a small group of first-time jOOQ users seem to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want, and you don’t have to use code generation, but the default way to use jOOQ according to the manual is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type-safe queries against those tables:

for (Record2<String, String> record : DSL.using(configuration)
//   ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the 
//   generated code referenced from the below SELECT clause

       .select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
//           vvvvv ^^^^^^^^^^^^  ^^^^^^^^^^^^^^^ Generated names
       .from(ACTOR)
       .orderBy(1, 2)) {
    // ...
}

The code is generated either manually outside of the build, or automatically with every build. For instance, such a re-generation could follow immediately after a Flyway database migration, which can also be run either manually or automatically.

Source code generation

There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”), either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason. There are many such code generators out there. For instance, XJC can generate Java code from XSD or WSDL files. The principle is always the same:
  • There is some truth (internal or external), like a specification, data model, etc.
  • We need a local representation of that truth in our programming language
And it almost always makes sense to generate the latter, to avoid redundancy.

Type providers and annotation processing

Noteworthy: another, more modern approach to jOOQ’s particular code generation use-case would be Type Providers, as implemented by F#, in which case the code is generated by the compiler while compiling. It never really exists in its source form. A similar (but less sophisticated) tool in Java are annotation processors, e.g. Lombok. In a way, this does the same thing, except:
  • You don’t see the generated code (perhaps that’s less appalling to some?)
  • You must ensure the types can be provided, i.e. the “truth” must always be available. Easy in the case of Lombok, which annotates the “truth”. A bit more difficult with database models, which rely on an always available live connection.
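To make the annotation processing variant a bit more concrete, this is roughly what it looks like with Lombok – a small sketch, and the Author class is just an example:

import lombok.Data;

// Lombok's annotation processor generates the getters, setters, equals(),
// hashCode() and toString() at compile time - they never appear in the source.
@Data
public class Author {
    private int id;
    private String name;
}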

What’s the problem with code generation?

Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it. If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth. Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say:
[Image: “Too Busy To Improve” – Licensed CC by Alan O’Rourke / Audience Stack]

“But Hibernate / JPA makes coding Java first easy”

Yes, that’s true. And it’s both a blessing and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:

@Entity
class Book {
  @Id
  int id;
  String title;
}

And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:

CREATE TABLE book (
  id INTEGER GENERATED ALWAYS AS IDENTITY,
  title VARCHAR(50),

  CONSTRAINT pk_book PRIMARY KEY (id)
);

CREATE INDEX i_book_title ON book (title);

… and start running the application. That’s really cool to get started quickly and to try out things. But, huh, wait. I cheated.
  • Will Hibernate really apply that named primary key definition?
  • Will it create the index on TITLE, which I know we’ll need?
  • Will it add an identity specification?
Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:

@Entity
@Table(name = "book", indexes = {
  @Index(name = "i_book_title", columnList = "title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;
  String title;
}

Cool. Re-generate. Again, this makes it really easy to get started.
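Incidentally, if you want to check what your JPA provider would actually emit for those annotations, the standard JPA 2.1 schema generation API can export the DDL to a script for inspection. A minimal sketch – the persistence unit name and target file are made up:

import java.util.HashMap;
import java.util.Map;
import javax.persistence.Persistence;

public class ExportSchema {
    public static void main(String[] args) {
        Map<String, Object> properties = new HashMap<>();

        // Write the generated DDL to a script instead of executing it against a database
        properties.put("javax.persistence.schema-generation.scripts.action", "create");
        properties.put("javax.persistence.schema-generation.scripts.create-target", "target/create.sql");

        // "bookPU" is a hypothetical persistence unit containing the Book entity
        Persistence.generateSchema("bookPU", properties);
    }
}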

But you’ll pay the price later on

At some point, you go to production. And that’s when this model no longer works. Because
Once you go live, you can no longer throw away your database, as your database has become legacy.
From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?) You can only lose. Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault… Instead, you could have done things entirely differently from the beginning. Like using those round wheels.

Go Database First

The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is. This is old wisdom, nothing new. Primary and unique keys are good. Foreign keys are good. Check constraints are good. Assertions (when they’re finally implemented) are good. And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:
  • In what tablespace your table resides
  • What PCTFREE value it has
  • What the cache size of your sequence (behind the identity) is
Maybe, all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations such as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs offer some tools to help you write DDL. But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.

What about the client model?

As mentioned before, you will need a copy of your database schema in your client, a client representation. Needless to say that this client representation needs to be in-sync with the real model. How to best do that? By using a code generator. All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:

-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables

-- DB2
SELECT tabschema, tabname
FROM syscat.tables

-- Oracle
SELECT owner, table_name
FROM all_tables

-- SQLite
SELECT name
FROM sqlite_master

-- Teradata
SELECT databasename, tablename
FROM dbc.tables

These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module. From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless of what your client technology is (a minimal JDBC sketch follows the list below).
  • If you’re using JDBC or Spring, you can create a bunch of String constants
  • If you’re using JPA, you can generate the entities themselves
  • If you’re using jOOQ, you can generate the jOOQ meta model
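As an illustration of the DatabaseMetaData.getTables() call mentioned above, here is a minimal JDBC sketch that lists all tables of a database – the H2 connection URL and credentials are just placeholders:

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ListTables {
    public static void main(String[] args) throws Exception {

        // Placeholder connection details
        try (Connection connection = DriverManager.getConnection("jdbc:h2:~/bookstore", "sa", "")) {
            DatabaseMetaData meta = connection.getMetaData();

            // null catalog / schema patterns and "%" match everything; restrict to TABLE objects
            try (ResultSet rs = meta.getTables(null, null, "%", new String[] { "TABLE" })) {
                while (rs.next())
                    System.out.println(rs.getString("TABLE_SCHEM") + "." + rs.getString("TABLE_NAME"));
            }
        }
    }
}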
Depending on the number of features your client API offers (e.g. jOOQ or JPA), the generated meta model can be really rich and complete. Consider, for instance, jOOQ 3.11’s implicit join feature, which relies on generated meta information about the foreign key relationships between your tables. Now, any database increment will automatically lead to updated client code. For instance, imagine:

ALTER TABLE book RENAME COLUMN title TO book_title;

Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:

@Entity
@Table(name = "book", indexes = {

  // Would you have thought of this?
  @Index(name = "i_book_title", columnList = "book_title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;

  @Column("book_title")
  String bookTitle;
}

Or an updated jOOQ class. Plus: your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So, it’s great to be able to see, simply by re-compiling your client code, what code is (or may be) affected by your database increment.

A single truth

Regardless what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Like jOOQ’s INFORMATION_SCHEMA meta model in XML form: https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd
  • XSD is well understood
  • XSD specifies XML content very well, and allows for validation in all client languages
  • XSD can be versioned easily, and evolved backwards compatibly
  • XSD can be translated to Java code using XJC
The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach, where we design our messages as Java objects, could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed. Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…

Databases: Same thing

When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth. Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Even other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve. So, don’t do it. Don’t do it from the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll get years of improved productivity for the rest of your project. Thank me later.
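To make the “pipeline” idea a little more concrete, here is a minimal sketch of a programmatic “migrate, then regenerate” step using Flyway’s Java API (assuming a recent Flyway version; the JDBC URL and credentials are placeholders, and the regeneration step is only hinted at in a comment):

import org.flywaydb.core.Flyway;

public class MigrateAndRegenerate {
    public static void main(String[] args) {

        // 1. Apply all pending DDL migration scripts to the database (the single truth)
        Flyway.configure()
              .dataSource("jdbc:h2:~/bookstore", "sa", "")
              .load()
              .migrate();

        // 2. Re-run your code generator against the migrated schema, e.g. jOOQ's
        //    code generator, your JPA entity generator, or a home-grown one, so the
        //    client copies of the model are updated as well.
    }
}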

Clarification

Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client. In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.

Exceptions

There are always exceptions, and I promised that the database first and code generation approach may not always be the right choice. These exceptions are (probably not exhaustive):
  • When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
  • When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even sure if an RDBMS will be the right choice.
The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.