Writing Custom Aggregate Functions in SQL Just Like a Java 8 Stream Collector

All SQL databases support the standard aggregate functions COUNT(), SUM(), AVG(), MIN(), MAX().

Some databases support other aggregate functions, like:

  • EVERY()
  • VAR_POP()
  • VAR_SAMP()

But what if you want to roll your own?

Java 8 Stream Collector

When using Java 8 streams, we can easily roll our own aggregate function (i.e. a Collector). Let’s assume we want to find the second highest value in a stream. The highest value can be obtained like this:

    Stream.of(1, 2, 3, 4)
) ;



Now, what about the second highest value? We can write the following collector:

    Stream.of(1, 6, 2, 3, 4, 4, 5).parallel()
              () -> new int[] { 
              (a, i) -> {
                  if (a[0] < i) {
                      a[1] = a[0];
                      a[0] = i;
                  else if (a[1] < i)
                      a[1] = i;
              (a1, a2) -> {
                  if (a2[0] > a1[0]) {
                      a1[1] = a1[0];
                      a1[0] = a2[0];

                      if (a2[1] > a1[1])
                          a1[1] = a2[1];
                  else if (a2[0] > a1[1])
                      a1[1] = a2[0];

                  return a1;
              a -> a[1]
) ;

It doesn’t do anything fancy. It has these 4 functions:

  • Supplier<int[]>: A supplier that provides an intermediary int[] of length 2, initialised with Integer.MIN_VALUE, each. This array will remember the MAX() value in the stream at position 0 and the SECOND_MAX() value in the stream at position 1
  • BiConsumer<int[], Integer>: A accumulator that accumulates new values from the stream into our intermediary data structure.
  • BinaryOperator<int[]>: A combiner that combines two intermediary data structures. This is used for parallel streams only.
  • Function<int[], Integer>: The finisher function that extracts the SECOND_MAX() function from the second position in our intermediary array.

The output is now:


How to do the same thing with SQL?

Many SQL databases offer a very similar way of calculating custom aggregate functions. Here’s how to do the exact same thing with…


With the usual syntactic ceremony…

CREATE TYPE u_second_max AS OBJECT (

  -- Intermediary data structure

  -- Corresponds to the Collector.supplier() function
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT u_second_max) RETURN NUMBER,

  -- Corresponds to the Collector.accumulate() function
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT u_second_max, value IN NUMBER) RETURN NUMBER,

  -- Corresponds to the Collector.combineer() function
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT u_second_max, ctx2 IN u_second_max) RETURN NUMBER,

  -- Correspodns to the Collector.finisher() function
  MEMBER FUNCTION ODCIAggregateTerminate(self IN u_second_max, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER

-- This is our "colletor" implementation
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT u_second_max)
    SCTX := U_SECOND_MAX(0, 0);
    RETURN ODCIConst.Success;

  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT u_second_max, value IN NUMBER) RETURN NUMBER IS
      SELF.MAX := VALUE;
    END IF;
    RETURN ODCIConst.Success;

  MEMBER FUNCTION ODCIAggregateTerminate(self IN u_second_max, returnValue OUT NUMBER, flags IN NUMBER) RETURN NUMBER IS
    RETURN ODCIConst.Success;

  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT u_second_max, ctx2 IN u_second_max) RETURN NUMBER IS
      SELF.MAX := CTX2.MAX;
      END IF;
    END IF;
    RETURN ODCIConst.Success;

-- Finally, we have to give this aggregate function a name

We can now run the above on the Sakila database:

FROM film;

To get:

1000    999

And what’s even better, we can use the aggregate function as a window function for free!

  max(film_id) OVER (PARTITION BY length), 
  second_max(film_id) OVER (PARTITION BY length)
FROM film
ORDER BY length, film_id;

The above yields:

15       46      730   505
469      46      730   505
504      46      730   505
505      46      730   505
730      46      730   505
237      47      869   784
247      47      869   784
393      47      869   784
398      47      869   784
407      47      869   784
784      47      869   784
869      47      869   784
2        48      931   866
410      48      931   866
575      48      931   866
630      48      931   866
634      48      931   866
657      48      931   866
670      48      931   866
753      48      931   866
845      48      931   866
866      48      931   866
931      48      931   866

Beautiful, right?


PostgreSQL supports a slightly more concise syntax in the CREATE AGGREGATE statement. If we don’t allow for parallelism, we can write this minimal implementation:

CREATE FUNCTION second_max_sfunc (
  state INTEGER[], data INTEGER
      WHEN state[1] > data
      THEN CASE 
        WHEN state[2] > data
        THEN state
        ELSE ARRAY[state[1], data]
      ELSE ARRAY[data, state[1]]
$$ LANGUAGE plpgsql;

CREATE FUNCTION second_max_ffunc (
  state INTEGER[]
  RETURN state[2];
$$ LANGUAGE plpgsql;

  SFUNC     = second_max_sfunc,
  STYPE     = INTEGER[],
  FINALFUNC = second_max_ffunc

Here, we use the STYPE (Collector.supplier()), the SFUNC (Collector.accumulator()), and the FINALFUNC (Collector.finisher()) specifications.

Other databases

Many other databases allow for specifying user defined aggregate functions. Look up your database manual’s details to learn more. They always work in the same way as a Java 8 Collector.

How to Compile a Class at Runtime with Java 8 and 9

In some cases, it’s really useful to be able to compile a class at runtime using the java.compiler module. You can e.g. load a Java source file from the database, compile it on the fly, and execute its code as if it were part of your application.

In the upcoming jOOR 0.9.8, this will be made possible through https://github.com/jOOQ/jOOR/issues/51. As always with jOOR (and our other projects), we’re wrapping existing JDK API, simplifying the little details that you often don’t want to worry about. Using jOOR API, you can now write:

// Run this code from within the com.example package

Supplier<String> supplier = Reflect.compile(
    "package com.example;\n" +
    "class CompileTest\n" +
    "implements java.util.function.Supplier<String> {\n" +
    "  public String get() {\n" +
    "    return \"Hello World!\";\n" +
    "  }\n" +


And the result is, of course:

Hello World!

If we already had JEP-326, this would be even cooler!

Supplier<String> supplier = Reflect.compile(
    `package org.joor.test;
     class CompileTest
     implements java.util.function.Supplier<String> {
       public String get() {
         return "Hello World!"


What happens behind the scenes?

Again, as in our previous blog post, we need to ship two different versions of our code. One that works in Java 8 (where reflecting and accessing JDK internal API was possible), and one that works in Java 9+ (where this is forbidden). The full annotated API is here:

package org.joor;

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import javax.tools.*;

import static java.lang.StackWalker.Option.RETAIN_CLASS_REFERENCE;

class Compile {

    static Class<?> compile(String className, String content) 
    throws Exception {
        Lookup lookup = MethodHandles.lookup();

        // If we have already compiled our class, simply load it
        try {
            return lookup.lookupClass()

        // Otherwise, let's try to compile it
        catch (ClassNotFoundException ignore) {
            return compile0(className, content, lookup);

    static Class<?> compile0(
        String className, String content, Lookup lookup)
    throws Exception {
        JavaCompiler compiler = 

        ClassFileManager manager = new ClassFileManager(
            compiler.getStandardFileManager(null, null, null));

        List<CharSequenceJavaFileObject> files = new ArrayList<>();
        files.add(new CharSequenceJavaFileObject(
            className, content));

        compiler.getTask(null, manager, null, null, null, files)
        Class<?> result = null;

        // Implement a check whether we're on JDK 8. If so, use
        // protected ClassLoader API, reflectively
        if (onJava8()) {
            ClassLoader cl = lookup.lookupClass().getClassLoader();
            byte[] b = manager.o.getBytes();
            result = Reflect.on(cl).call("defineClass", 
                className, b, 0, b.length).get();

        // Lookup.defineClass() has only been introduced in Java 9.
        // It is required to get private-access to interfaces in
        // the class hierarchy
        else {

            // This method is called by client code from two levels
            // up the current stack frame. We need a private-access
            // lookup from the class in that stack frame in order
            // to get private-access to any local interfaces at
            // that location.
            Class<?> caller = StackWalker
                .walk(s -> s

            // If the compiled class is in the same package as the
            // caller class, then we can use the private-access 
            // Lookup of the caller class
            if (className.startsWith(caller.getPackageName() )) {
                result = MethodHandles
                    .privateLookupIn(caller, lookup)

            // Otherwise, use an arbitrary class loader. This
            // approach doesn't allow for loading private-access 
            // interfaces in the compiled class's type hierarchy
            else {
                result = new ClassLoader() {
                    protected Class<?> findClass(String name) 
                    throws ClassNotFoundException {
                        byte[] b = fileManager.o.getBytes();
                        int len = b.length;
                        return defineClass(className, b, 0, len);

        return result;

    // These are some utility classes needed for the JavaCompiler
    // ----------------------------------------------------------

    static final class JavaFileObject 
    extends SimpleJavaFileObject {
        final ByteArrayOutputStream os = 
            new ByteArrayOutputStream();

        JavaFileObject(String name, JavaFileObject.Kind kind) {
              + name.replace('.', '/') 
              + kind.extension), 

        byte[] getBytes() {
            return os.toByteArray();

        public OutputStream openOutputStream() {
            return os;

    static final class ClassFileManager 
    extends ForwardingJavaFileManager<StandardJavaFileManager> {
        JavaFileObject o;

        ClassFileManager(StandardJavaFileManager m) {

        public JavaFileObject getJavaFileForOutput(
            JavaFileManager.Location location,
            String className,
            JavaFileObject.Kind kind,
            FileObject sibling
        ) {
            return o = new JavaFileObject(className, kind);

    static final class CharSequenceJavaFileObject 
    extends SimpleJavaFileObject {
        final CharSequence content;

        public CharSequenceJavaFileObject(
            String className, 
            CharSequence content
        ) {
              + className.replace('.', '/') 
              + JavaFileObject.Kind.SOURCE.extension), 
            this.content = content;

        public CharSequence getCharContent(
            boolean ignoreEncodingErrors
        ) {
            return content;

Notice how the JDK 9 version is a bit more complicated, as we have to:

  • Find the caller class of our method
  • Get a private method handle lookup for that class if the class being compiled is in the same package as the class calling the compilation
  • Otherwise, use an arbitrary class loader to define the class

Reflection definitely hasn’t become simpler with Java 9!

How to Ensure Your Code Works With Older JDKs

jOOQ is a very backwards compatible product. This doesn’t only mean that we keep our own API backwards compatible as well as possible, but we also still support Java 6 in our commercial distributions.

In a previous blog post, I’ve shown how we manage to support Java 6 while at the same time not missing out on cool Java 8 language and API features, such as Stream and Optional support. For instance, you can do this with jOOQ’s ordinary distribution:

// Fetching 0 or 1 actors
Optional<Record2<String, String>> actor =

// Fetching a stream of actors
try (Stream<Record2<String, String>> actor = ctx
       .fetchStream()) {

This API is present in jOOQ’s ordinary distribution and it is stripped from that distribution prior to building the Java 6 distribution.

But what about the JDK’s more subtle APIs?

It is relatively easy to remember not to use Streams, Optionals, lambdas, method references, default methods lightheartedly in your library’s code. After all, those were all major changes to Java 8 and we can easily add our API removal markers around those parts. And even if we forgot, building the Java 6 distribution would quite probably fail, because Streams are very often used with lambdas, in case of which a compiler that is configured for Java version 1.6 will not compile the code.

But recently, we’ve had a more subtle bug, #6860. jOOQ API was calling java.lang.reflect.Method.getParameterCount(). Since we compile jOOQ’s Java 6 distribution with Java 8, this didn’t fail. The sources were kept Java 6 language compatible, but not JDK 6 API compatible, and unfortunately, there’s no option in javac, nor in the Maven compiler plugin to do such a check.

Why not use Java 6 to compile the Java 6 distribution?

The reason why we’re using Java 8 to build jOOQ’s Java 6 distribution is the fact that Java 8 “fixed” a lot (and I mean a lot) of very old and weird edge cases related to generics,
overloading, varargs, and all that stuff. While this might be irrelevant for ordinary APIs, for jOOQ it is not. We really push the limits of what’s possible with the Java language.

So, we’re paying a price for building jOOQ’s Java 6 distribution with Java 8. We’re flying in “stealth mode”, not 100% sure whether our JDK API usage is compliant.

Luckily, the JDK doesn’t change much between releases, so a lot of stuff from JDK 8 was already there in JDK 6. Also, our integration tests would fail, if we did accidentally use a method like the above. Unfortunately, that particular method call simply slipped by the integration tests (there will never be enough tests for every scenario).

The solution

Apart from fixing the trivial bug and avoiding that particular method, we’ve now added the cool “animal sniffer” Maven plugin to our Java 6 build, whose usage you can see here:

All we needed to add to our Java 6 distribution profile was this little snippet:


This will then produce a validation error like the following:

[INFO] --- animal-sniffer-maven-plugin:1.16:check (default) @ jooq-codegen ---
[INFO] Checking unresolved references to org.codehaus.mojo.signature:java16:1.0
[ERROR] C:\..\JavaGenerator.java:232: Undefined reference: int java.lang.reflect.Method.getParameterCount()
[ERROR] C:\..\JavaGenerator.java:239: Undefined reference: int java.lang.reflect.Method.getParameterCount()


Squeezing Another 10% Speed Increase out of jOOQ using JMC and JMH

In this post, we’re going to discuss a couple of recent efforts to squeeze roughly 10% in terms of speed out of jOOQ by iterating on hotspots that were detected using JMC (Java Mission Control) and then validated using JMH (Java Microbenchmark Harness). This post shows how to apply micro optimisations to algorithms where the smallest improvement can have a significant effect.

While JMH is probably without competition, JMC could easily be replaced by JProfiler, YourKit, or even your own manual jstack sampling. I’ll just use JMC because it ships with the JDK and is free for use for development as of JDK 8 and 9 (if you’re unsure whether you’re “developing”, better ask Oracle). Rumours have it that JMC might be contributed to the OpenJDK in the near future.

Micro optimisations

Micro optimisations are a cool technique to squeeze a very small improvement out of a local algorithm (e.g. a loop) that has a significant effect on the entire application / library, because of the fact that the local algorithm is called many times. This is absolutely the case in jOOQ, which is essentially a library that always runs 4 nested loops:

  1. S: A “loop” over all possible SQL statements
  2. E: A “loop” over all executions of such a statement
  3. R: A loop over all rows in the result
  4. C: A loop over all columns in a row

Such four level nested loops result in what we could call a polynomial complexity of our algorithms, even if we cannot call the complexity O(N4) (as the 4 “N” are not all the same), it is certainly of O(S x E x R x C) (I’ll call this “S-E-R-C loops” further down). Even to the untrained eye, it becomes evident that anything that happens in the inner-most “C-loop” can have devastating effects. We better not be opening any files here, that could be opened outside of, e.g. the “S-loop”

In a previous blog post, we’ve discussed common techniques of optimising such situations. In this blog post, we’ll look into a couple of concrete examples.

How to discover flaws in these loops?

We’re looking for the problems that affect all users, the kind of problem that, once fixed, will improve jOOQ’s performance for everyone by e.g. 10%. This is similar to what the JIT does, by performing things like stack allocation, inlining, which don’t drastically improve things locally, but do so globally, and for everyone. Here’s an interesting guest post by Tagir Valeev on JIT optimisation, and how good it is.

Getting a large “S-loop”

The first option is to run profiling sessions on benchmarks. We could, for example, run the entire “S-E-R-C loops” in a JMC profiling session, where the “S-loop” is a loop over all our statements, or in other words, over all our integration tests. Unfortunately, with this approach, our “E-loop” (in the case of jOOQ’s integration tests) is a single execution per statement. We’d have to run the integration tests many, many times in order to get meaningful results.

Also, while the jOOQ integration tests run thousands of distinct queries, most queries are still rather simple, each one focusing on an individual SQL feature (e.g. lateral join). In a end user application, queries might use less specific features, but are much more complex, i.e. they have a lot of ordinary joins.

This technique is useful to find problems that appear in all queries, deep down inside of jOOQ – e.g. at the JDBC interface. But we cannot use this approach to test individual features.

Getting a large “E-loop”

Another option is to write a single test that runs a few statements (small “S-loop”) many times in an explicit loop (large “E-loop”). This has the advantage that a specific bottleneck can be found with a high confidence, but the drawback is: It’s specific. For instance, if we find a small bottleneck in the string concatenation function, well, that is certainly worth fixing, but doesn’t affect most users.

This approach is useful to test individual features. It can also be useful for finding issues that affect all queries, but with a lower confidence than the previous case, where the “S-loop” is maximised.

Getting large “R-loops” and “C-loops”

Creating large result sets is easy and should definitely be part of such benchmarks, because in the case of a large result set, any flaw will multiply drastically, so fixing these things is worthwhile. However, these problems only affect actual result sets, not the query building process or the execution process. Sure, most statements are probably queries, not insertions / updates, etc. But this needs to be kept in mind.

Optimising for problems in large “E-loops”

All of the above scenarios are different optimisation sessions and deserve their own blog posts. In this post, I’m describing what has been discovered and fixed when running a single query 3 million times on an H2 database. The H2 database is chosen here, because it can run in memory of the same process and thus has the least extra overhead compared to jOOQ – so jOOQ’s overhead contributions become significant in a profiling session / benchmark. In fact, it can be shown that in such a benchmark, jOOQ (or Hibernate, etc.) appears to perform quite poorly compared to a JDBC only solution, as many have done before.

This is an important moment to remind ourselves:

Benchmarks do not reflect real-world use cases! You will never run the exact same query 3 million times on a production system, and your production system doesn’t run on H2.

A benchmark profits from so much caching, buffering, you would never perform as fast as in a benchmark.

Always be careful not to draw any wrong conclusions from a benchmark!

This needs to be said, so take every benchmark you find on the web with a grain of salt. This includes our own!

The query being profiled is:


The trivial query returns a ridiculous 4 rows and 4 columns, so the “R-loop” and “C-loops” are negligible. This benchmark is really testing the overhead of jOOQ query execution in a case where the database does not contribute much to the execution time. Again, in a real world scenario, you will get much more overhead from your database.

In the following sections, I’ll show a few minor bottlenecks that could be found when drilling down into these such execution scenarios. As I’ve switched between JMC versions, the screenshots will not always be the same, I’m afraid.

1. Instance allocation of constant values

A very silly mistake was easily discovered right away:

The mistake didn’t contribute a whole lot of overhead, only 1.1% to the sampled time spent, but it made me curious. In version 3.10 of jOOQ, the SelectQueryImpl‘s Limit class, which encodes the jOOQ OFFSET / LIMIT behaviour kept allocating this DSL.val() thingy, which is a bind variable. Sure, limits do work with bind variables, but this happened when SelectQueryImpl was initialised, not when the LIMIT clause is added by the jOOQ API user.

As can be seen in the sources, the following logic was there:

private static final Field<Integer> ZERO              = zero();
private static final Field<Integer> ONE               = one();
private Field<Integer>              numberOfRowsOrMax = 

While the “special limits” ZERO and ONE were static members, the numberOfRowsOrMax value wasn’t. That’s the instantiation we were measuring in JMC. The member is not a constant, but the default value is. It is always initialised with Integer.MAX_VALUE wrapped in an DSL.inline() call. The solution is really simple:

private static final Param<Integer> MAX               = 
private Field<Integer>              numberOfRowsOrMax = MAX;

This is obviously better! Not only does it avoid the allocation of the bind variable, it also avoids the boxing of Integer.MAX_VALUE (which can also be seen in the sampling screenshot).

Note, a similar optimisation is available in the JDK’s ArrayList. When you look at the sources, you’ll see:

 * Shared empty array instance used for empty instances.
private static final Object[] EMPTY_ELEMENTDATA = {};

When you initialise an ArrayList without initial capacity, it will reference this shared instance, instead of creating a new, empty (or even non-empty) array. This delays the allocation of such an array until we actually add things to the ArrayList, just in case it stays empty.

jOOQ’s LIMIT is the same. Most queries might not have a LIMIT, so better not allocate that MAX_VALUE afresh!

This is done once per “E-loop” iteration

One issue down: https://github.com/jOOQ/jOOQ/issues/6635

2. Copying lists in internals

This is really a micro optimisation that you probably shouldn’t do in ordinary business logic. But it might be worthwhile in infrastructure logic, e.g. when you’re also in an “S-E-R-C loop”:

jOOQ (unfortunately) occasionally copies data around between arrays, e.g. wrapping Strings in jOOQ wrapper types, transforming numbers to strings, etc. These loops aren’t bad per se, but remember, we’re inside some level of the “S-E-R-C loop”, so these copying operations might be run hundreds of millions of times when we run a statement 3 million times.

The above loop didn’t contribute a lot of overhead, and possible the cloned object was stack allocated or the clone call eliminated by the JIT. But maybe it wasn’t. The QualifiedName class cloned its argument prior to returning it to make sure that no accidental modifications will have any side effect:

private static final String[] nonEmpty(String[] qualifiedName) {
    String[] result;
    if (nulls > 0) {
        result = new String[qualifiedName.length - nulls];
    else {
        result = qualifiedName.clone();
    return result;

So, the implementation of the method guaranteed a new array as a result.

After a bit of analysis, it could be seen that there is only a single consumer of this method, and it doesn’t leave that consumer. So, it’s safe to remove the clone call. Probably, the utility was refactored from a more general purpose method into this local usage.

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6640

3. Running checks in loops

This one is too silly to be true:

There’s a costly overhead in the CombinedCondition constructor (<init> method). Notice, how the samples drop from 0.47% to 0.32% between the constructor and the next method init(), that’s the time spent inside the constructor.

A tiny amount of time, but this time is spent every time someone combines two conditions / predicates with AND and OR. Every time. We can probably save this time. The problem is this:

CombinedCondition(Operator operator, Collection<? extends Condition> conditions) {
    for (Condition condition : conditions)
        if (condition == null)
            throw new IllegalArgumentException("The argument 'conditions' must not contain null");

    init(operator, conditions);

There’s a loop over the arguments to give some meaningful error messages. That’s a bit too defensive, I suspect. How about we simply live with the NPE when it arises, as this should be rather unexpected (for the context, jOOQ hardly ever checks on parameters like this, so this should also be removed for consistency reasons).

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6666 (nice number)

4. Lazy initialisation of lists

The nature of the JDBC API forces us to work with ThreadLocal variables, very unfortunately, as it is not possible to pass arguments from parent SQLData objects to children, especially when we combine nesting of Oracle TABLE/VARRAY and OBJECT types.

In this analysis, we’re combining the profiler’s CPU sampling with its memory sampling:

In the CPU sampling view above, we can see some overhead in the DefaultExecuteContext, which is instantiated once per “E-loop” iteration. Again, not a huge overhead, but let’s look at what this constructor does. It contributes to the overall allocations of ArrayList:

When we select the type in JMC, the other view will then display all the stack traces where ArrayList instances were allocated, among which, again, our dear DefaultExecuteContext constructor:

Where are those ArrayLists allocated? Right here:

BLOBS.set(new ArrayList<Blob>());
CLOBS.set(new ArrayList<Clob>());
SQLXMLS.set(new ArrayList<SQLXML>());
ARRAYS.set(new ArrayList<Array>());

Every time we start executing a query, we initialise a list for each ones of these types. All of our variable binding logic will then register any possibly allocated BLOB or CLOB, etc. such that we can clean these up at the end of the execution (a JDBC 4.0 feature that not everyone knows of!):

static final void register(Blob blob) {
static final void clean() {
    List<Blob> blobs = BLOBS.get();

    if (blobs != null) {
        for (Blob blob : blobs)


Don’t forget calling Blob.free() et al, if you’re working with JDBC directly!

But the truth is, in most cases, we don’t really need these things. We need them only in Oracle, and only if we’re using TABLE / VARRAY or OBJECT types, due to some JDBC restrictions. Why punish all the users of other databases with this overhead? Instead of a sophisticated refactoring, which risks introducing regressions (https://github.com/jOOQ/jOOQ/issues/4205), we can simply initialise these lists lazily. We leave the clean() method as it is, remove the initialisation in the constructor, and replace the register() logic by this:

static final void register(Blob blob) {
    List<Blob> list = BLOBS.get();

    if (list == null) {
        list = new ArrayList<Blob>();


That was easy. And significant. Check out the new allocation measurements:

Note that every allocation, apart from the overhead of allocating things, also incurs additional overhead when the object is garbage collected. That’s a bit trickier to measure and correlate. In general, less allocations is almost always a good thing, except if the allocation is super short lived, in case of which stack allocation can happen, or the logic can even be eliminated by the JIT.

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6669

6. Using String.replace()

This is mostly a problem in JDK 8 only, JDK 9 fixed string replacing by no longer relying on regular expressions internally. In JDK 8, however (and jOOQ still supports Java 6, so this is relevant), string replacement works through regular expressions as can be seen here:

The Pattern implementation allocates quite a few int[] instances, even if that’s probably not strictly needed for non-regex patterns as those of String.replace():

I’ve already analysed this in a previous blog post, which can be seen here:


This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6672

7. Registering an SPI that is going to be inactive

This one was a bit more tricky to solve as it relies on a deeper analysis. Unfortunately, I have no profiling screenshots available anymore, but it is easy to explain with code. There’s an internal ExecuteListeners utility, which abstracts over the ExecuteListener SPIs. Users can register such a listener and listen to query rendering, variable binding, query execution, and other lifecycle events. By default, there is no such ExecuteListener by the users, but there’s always one internal ExecuteListener:

private static ExecuteListener[] listeners(ExecuteContext ctx) {
    List<ExecuteListener> result = new ArrayList<ExecuteListener>();

    for (ExecuteListenerProvider provider : ctx.configuration()
        if (provider != null)

    if (!FALSE.equals(ctx.settings().isExecuteLogging()))
        result.add(new LoggerListener());

    return result.toArray(EMPTY_EXECUTE_LISTENER);

The LoggerListener is added by default, unless users turn off that feature. Which means:

  • We’ll pretty much always get this ArrayList
  • We’ll pretty much always loop over this list
  • We’ll pretty much always clal this LoggerListener

But what does it do? It logs stuff on DEBUG and TRACE level. For instance:

public void executeEnd(ExecuteContext ctx) {
    if (ctx.rows() >= 0)
        if (log.isDebugEnabled())
            log.debug("Affected row(s)", ctx.rows());

That’s what it does by definition. It’s a debug logger. So, the improved logic for initialising this thing is the following:

private static final ExecuteListener[] listeners(ExecuteContext ctx) {
    List<ExecuteListener> result = null;

    for (ExecuteListenerProvider provider : ctx.configuration()
        if (provider != null)
            (result = init(result)).add(provider.provide());

    if (!FALSE.equals(ctx.settings().isExecuteLogging())) {
        if (LOGGER_LISTENER_LOGGER.isDebugEnabled())
            (result = init(result)).add(new LoggerListener());

    return result == null ? null : result.toArray(EMPTY_EXECUTE_LISTENER);

We’re no longer allocating the ArrayList (that might be premature, the JIT might have rewritten this allocation to not happen, but OK), and we’re only adding the LoggerListener if it DEBUG or TRACE logging is enabled for it, i.e. if it would do any work at all.

That’s just a couple of CPU cycles we can save on every execution. Again, I don’t have the profiling measurements anymore, but trust me. It helped.

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6747

8. Eager allocation where lazy allocation works

Sometimes, we need two different representations of the same information. The “raw” representation, and a more useful, pre-processed representation for some purposes. This was done, for instance, in QualifiedField:

private final Name          name;
private final Table<Record> table;

QualifiedField(Name name, DataType<T> type) {
    super(name, type);

    this.name = name;
    this.table = name.qualified()
        ? DSL.table(name.qualifier())
        : null;

public final void accept(Context<?> ctx) {

public final Table<Record> getTable() {
    return table;

As can be seen, the name is really the beef of this class. It’s a qualified name that generates itself on the SQL string. The Table representation is useful when navigating the meta model, but this is hardly ever done by jOOQ’s internals and/or user facing code.

However, this eager initialisation it is costly:

Quite a few UnqualifiedName[] arrays are allocated by the call to Name.qualifier(). We can easily make that table reference non-final and calculate it lazily:

private final Name              name;
private Table<Record>           table;

QualifiedField(Name name, DataType<T> type) {
    super(name, type);

    this.name = name;

public final Table<Record> getTable() {
    if (table == null)
        table = name.qualified() ? DSL.table(name.qualifier()) : null;

    return table;

Because name is final, we could call table “effectively final” (in a different meaning than the Java language’s) – we won’t have any thread safety issues because these particular types are immutable inside of jOOQ.

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6755


Now, thus far, we’ve “improved” many low hanging fruit based on a profiler session (that was run, akhem, from outside of Eclipse on a rather busy machine). This wasn’t very scientific. Just tracking down “bottlenecks” which triggered my interest by having high enough numbers to even notice. This is called “micro optimisation”, and it is only worth the trouble if you’re in a “S-E-R-C loop”, meaning that the code you’re optimising is executed many many times. For me, developing jOOQ, this is almost always the case, because jOOQ is a library used by a lot of people who all profit from these optimisations. In many other cases, this might be called “premature optimisation”

But once we’ve optimised, we shouldn’t stop. I’ve done a couple of individual JMH benchmarks for many of the above problems, to see if they were really an improvement. But sometimes, in a JMH benchmark, something that doesn’t look like an improvement will still be an improvement in the bigger picture. The JVM doesn’t inline all methods 100 levels deep. If your algorithm is complex, perhaps a micro optimisation will still have an effect that would not have any effect on a JMH benchmark.

Unfortunately this isn’t very exact science, but with enough intuition, you’ll find the right spots to optimise.

In my case, I verified progress over two patch releases: 3.10.0 -> 3.10.1 -> 3.10.2 (not yet released) by running a JMH benchmark over the entire query execution (including H2’s part). The results of applying roughly 15 of the above and similar optimisations (~2 days’ worth of effort) is:

JDK 9 (9+181)

jOOQ 3.10.0 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  101891.108 ± 7283.832  ops/s

jOOQ 3.10.2 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  110982.940 ± 2374.504  ops/s

JDK 8 (1.8.0_145)

jOOQ 3.10.0 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  110178.873 ± 2134.894  ops/s

jOOQ 3.10.2 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  118795.922 ± 2661.653  ops/s

As can be seen, in both JDK versions, we’ve gotten roughly a 10% speed increase. What’s interesting is also that JDK 8 seemed to have been also 10% faster than JDK 9 in this benchmark, although this can be due to a variety of things that I haven’t considered yet, and which are out of scope for this discussion.


This iterative approach to tackling performance is definitely worth it for library authors:

  • run a representative benchmark (repeat a task millions of times)
  • profile it
  • track down “bottlenecks”
  • if they’re easy to fix without regression risk, do it
  • repeat
  • after a while, verify with JMH

Individual improvements are quite hard to measure, or measure correctly. But when you do 10-15 of them, they start adding up and become significant. 10% can make a difference.

Looking forward to your comments, alternative techniques, alternative tools, etc.!

If you liked this article, you will also like Top 10 Easy Performance Optimisations in Java

jOOQ Tuesdays: Nicolai Parlog Talks About Java 9

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Nicolai Parlog, author of The Java Module System

Nicolai, your blog is an “archeological” treasure trove for everyone who wants to learn about why Java expert group decisions were made. What made you dig out all these interesting discussions on the mailing lists?

Ha, thank you, didn’t know I was sitting on a treasure.

It all started with everyone’s favorite bikeshed: Optional. After using it for a few months, I was curious to learn more about the reason behind its introduction to Java and why it was designed the way it was, so I started digging and learned a few things:

  • Piperman, the JDK mailing list archive, is a horrible place to peruse and search.
  • Mailing list discussions are often lengthy, fragmented, and thus hard to revisit.
  • Brian Goetz was absolutely right: Everything related to Optional seems to take 300 messages.

Consequently, researching that post about Optional’s design took a week or so. But as you say, it’s interesting to peek behind the curtain and once a discussion is condensed to its most relevant positions and peppered with some context it really appeals to the wider Java community.

I actually think there’s a niche to be filled, here. Imagine there were a site that did regularly (at least once a week) what I did with a few selected topics: Follow the JDK mailing list, summarize ongoing discussions, and make them accessible to a wide audience. That would be a great service to the Java community as it would make it much easier to follow what is going on and to chime in with an informed opinion when you feel you have something to contribute. Now we just need to find someone with a lot of free time on their hands.

By the way, I think it’s awesome that the comparitively open development of the JDK makes that possible.

I had followed your blog after Java 8 came out, where you explained expert group decisions in retrospect. Now, you’re mostly covering what’s new in Java 9. What are your favourite “hidden” (i.e. non-Jigsaw) Java 9 features and why?

From the few language changes, it’s easy pickings: definitely private interface methods. I’ve been in the situation more than once that I wanted to share code between default methods but found no good place to put it without making it part of the public API. With private mehods in interfaces, that’s a thing of the past.

When it comes to API changes, the decision is much harder as there is more to choose from. People definitely like collection factory methods and I do, too, but I think I’ll go with the changes to Stream and Optional. I really enjoy using those Java 8 features and think it’s great that they’ve been improved in 9.

A JVM feature I really like are multi-release JARs. The ability to ship a JAR that uses the newest APIs, but degrades gracefully on older JVMs will come in very handy. Some projects, Spring for example, already do this, but without JVM support it’s not exactly pleasant.

Can I go on? Because there’s so much more! Just two: Unified logging makes it much easier to tease out JVM log messages without having to configure logging for different subsystems and compact strings and indified string concatenation make working with strings faster, reduce garbage and conserve heap space (on average, 10% to 15% less memory!). Ok, that were three, but there you go.

You’re writing a book on the Java 9 module system that can already be pre-ordered on Manning. What will readers get out of your book?

All they need to become module system experts. Of course it explains all the basics (delcaring, compiling, packaging, and running modular applications) and advanced features (services, implied readability, optional dependencies, etc), but it goes far beyond that. More than how to use a feature it also explains when and why to use it, which nuances to consider, and what are good defaults if you’re not sure which way to go.

It’s also full of practical advice. I migrated two large applications to Java 9 (compiling and running on the new release, not turning them into modules) and that experience as well as the many discussions on the mailing list informed a big chapter on migration. If readers are interested in a preview, I condensed it into a post on the most common Java 9 migration challenges. I also show how to debug modules and the module system with various tools (JDeps for example) and logging (that’s when I started using uniform logging), Last but not least, I plan to include a chapter that simply lists error messages and what to do about them.

In your opinion, what are the good parts and the bad parts about  Jigsaw? Do you think Jigsaw will be adopted quickly?

The good, the bad, and the ugly, eh? My favorite feature (of all of Java 9 actually) is strong encapsulation. The ability to have types that are public only within a module is incredibly valuable! This adds another option to the private-to-public-axis and once people internalize that feature we will wonder how we ever lived without it. Can you imagine giving up private? We will think the same about exported.

I hope the worst aspect of the module system will be the compatibility challenges. That’s a weird way to phrase it, but let me explain. These challenges definitely exist and they will require a non-neglectable investmement from the Java community as a whole to get everything working on Java 9, in the long run as modules. (As an aside: This is well invested time – much of it pays back technical debt.)

My hope is that no other aspect of the module system turns out to be worse. One thing I’m a little concerned about is the strictness of reliable configuration. I like the general principle and I’m definitely one for enforcing good practices, but just think about all those POMs that busily exclude transitive dependencies. Once all those JARs are modules, that won’t work – the module system will not let you launch without all dependencies present.

Generally speaking, the module system makes it harder to go against the maintainers’ decisions. Making internal APIs available via reflection or altering dependencies now goes against the grain of a mechanism that is built deeply into the compiler and JVM. There are of course a number of command line flags to affect the module system but they don’t cover everything. To come back to exclusing dependencies, maybe–ignore-missing-modules ${modules} would be a good idea…

Regarding adoption rate, I expect it to be slower than Java 8. But leaving those projects aside that see every new version as insurmountable and are still on Java 6, I’m sure the vast majority will migrate eventually. If not for Java 9’s features than surely for future ones. As a friend and colleague once said: “I’ll do everything to get to value types.”

Now that Java 9 is out and “legacy”, what Java projects will you cover next in your blog and your work?

Oh boy, I’m still busy with Java 9. First I have to finish the book (November hopefully) and then I want to do a few more migrations because I actually like doing that for some weird and maybe not entirely healthy reason (the things you see…). FYI, I’m for hire, so if readers are stuck with their migration they should reach out.

Beyond that, I’m already looking forward to primitive specialization, e.g. ArrayList<int>, and value types (both from Project Valhalla) as well as the changes Project Amber will bring to Java. I’m sure I’ll start discussing those in 2018.

Another thing I’ll keep myself busy with and which I would love your readers to check out is my YouTube channel. It’s still very young and until the book’s done I won’t do a lot of videos (hope to record one next week), but I’m really thrilled about the whole endavour!

Are Java 8 Streams Truly Lazy? Not Completely!

In a recent article, I’ve shown that programmers should always apply a filter first, map later strategy with streams. The example I made there was this one:

    .map(e -> superExpensiveMapping(e))

In this case, the limit() operation implements the filtering, which should take place before the mapping.

Several readers correctly mentioned that in this case, it doesn’t matter what order we’re putting the limit() and map() operations, because most operations are evaluated lazily in the Java 8 Stream API.

Or rather: The collect() terminal operation pulls values from the stream lazily, and as the limit(5) operation reaches the end, it will no longer produce new values, regardless whether map() came before or after. This can be proven easily as follows:

import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});


        Stream.iterate(0, i -> i + 1)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});

The output of the above is:

Map: 1
Map: 2
Map: 3
Map: 4
Map: 5

Map: 1
Map: 2
Map: 3
Map: 4
Map: 5

But this isn’t always the case!

This optimisation is an implementation detail, and in general, it is not unwise to really apply the filter first, map later rule thoroughly, not relying on such an optimisation. In particular, the Java 8 implementation of flatMap() is not lazy. Consider the following logic, where we put a flatMap() operation in the middle of the stream:

import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});


        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});

The result is now:

Map: 1
Map: 1
Map: 1
Map: 1
Map: 2
Map: 2
Map: 2
Map: 2

Map: 1
Map: 1
Map: 1
Map: 1
Map: 2

So, the first Stream pipeline will map all the 8 flatmapped values prior to applying the limit, whereas the second Stream pipeline really limits the stream to 5 elements first, and then maps only those.

The reason for this is in the flatMap() implementation:

// In ReferencePipeline.flatMap()
try (Stream<? extends R> result = mapper.apply(u)) {
    if (result != null)

As you can see, the result of the flatMap() operation is consumed eagerly with a terminal forEach() operation, which will always produce all the four values in our case and send them to the next operation. So, flatMap() isn’t lazy, and thus the next operation after it will get all of its results. This is true for Java 8. Future Java versions might improve this, of course.

We better filter them first. And map later.

Update: flatMap() gets fixed in JDK 10

Thanks, Tagir Valeev, for pointing out that there’s a fix coming up:

Relevant links:


A Nice API Design Gem: Strategy Pattern With Lambdas

With Java 8 lambdas being available to us as a programming tool, there is a “new” and elegant way of constructing objects. I put “new” in quotes, because it’s not new. It used to be called the strategy pattern, but as I’ve written on this blog before, many GoF patterns will no longer be implemented in their classic OO way, now that we have lambdas.

A simple example from jOOQ

jOOQ knows a simple type called Converter. It’s a simple SPI, which allows users to implement custom data types and inject data type conversion into jOOQ’s type system. The interface looks like this:

public interface Converter<T, U> {
    U from(T databaseObject);
    T to(U userObject);
    Class<T> fromType();
    Class<U> toType();

Users will have to implement 4 methods:

  • Conversion from a database (JDBC) type T to the user type U
  • Conversion from the user type U to the database (JDBC) type T
  • Two methods providing a Class reference, to work around generic type erasure

Now, an implementation that converts hex strings (database) to integers (user type):

public class HexConverter implements Converter<String, Integer> {

    public Integer from(String hexString) {
        return hexString == null 
            ? null 
            : Integer.parseInt(hexString, 16);

    public String to(Integer number) {
        return number == null 
            ? null 
            : Integer.toHexString(number);

    public Class<String> fromType() {
        return String.class;

    public Class<Integer> toType() {
        return Integer.class;

That wasn’t difficult to write, but it’s quite boring to write this much boilerplate:

  • Why do we need to give this class a name?
  • Why do we need to override methods?
  • Why do we need to handle nulls ourselves?

Now, we could write some object oriented libraries, e.g. abstract base classes that take care at least of the fromType() and toType() methods, but much better: The API designer can provide a “constructor API”, which allows users to provide “strategies”, which is just a fancy name for “function”. One function (i.e. lambda) for each of the four methods. For example:

public interface Converter<T, U> {

    static <T, U> Converter<T, U> of(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return new Converter<T, U>() { ... boring code here ... }

    static <T, U> Converter<T, U> ofNullable(
        Class<T> fromType,
        Class<U> toType,
        Function<? super T, ? extends U> from,
        Function<? super U, ? extends T> to
    ) {
        return of(

            // Boring null handling code here
            t -> t == null ? null : from.apply(t),
            u -> u == null ? null : to.apply(u)

From now on, we can easily write converters in a functional way. For example, our HexConverter would become:

Converter<String, Integer> converter =
    s -> Integer.parseInt(s, 16),

Wow! This is really nice, isn’t it? This is the pure essence of what it means to write a Converter. No more overriding, null handling, type juggling, just the bidirectional conversion logic.

Other examples

A more famous example is the JDK 8 Collector.of() constructor, without which it would be much more tedious to implement a collector. For example, if we want to find the second largest element in a stream… easy!

for (int i : Stream.of(1, 8, 3, 5, 6, 2, 4, 7)
    () -> new int[] { Integer.MIN_VALUE, Integer.MIN_VALUE },
    (a, t) -> {
        if (a[0] < t) {
            a[1] = a[0];
            a[0] = t;
        else if (a[1] < t)
            a[1] = t;
    (a1, a2) -> {
        throw new UnsupportedOperationException(
            "Say no to parallel streams");

Run this, and you get:


Bonus exercise: Make the collector parallel capable by implementing the combiner correctly. In a sequential-only scenario, we don’t need it (until we do, of course…).


The concrete examples are nice examples of API usage, but the key message is this:

If you have an interface of the form:

interface MyInterface {
    void myMethod1();
    String myMethod2();
    void myMethod3(String value);
    String myMethod4(String value);

Then, just add a convenience constructor to the interface, accepting Java 8 functional interfaces like this:

// You write this boring stuff
interface MyInterface {
    static MyInterface of(
        Runnable function1,
        Supplier<String> function2,
        Consumer<String> function3,
        Function<String, String> function4
    ) {
        return new MyInterface() {
            public void myMethod1() {

            public String myMethod2() {
                return function2.get();

            public void myMethod3(String value) {

            public String myMethod4(String value) {
                return function4.apply(value);

As an API designer, you write this boilerplate only once. And your users can then easily write things like these:

// Your users write this awesome stuff
    () -> { ... },
    () -> "hello",
    v -> { ... },
    v -> "world"

Easy! And your users will love you forever for this.