Correct Reflective Access to Interface Default Methods in Java 8, 9, 10

When performing reflective access to default methods in Java, Google seems to fail us. The solutions presented on Stack Overflow, for instance, seem to work only in a certain set of cases, and not on all Java versions.

This article will illustrate different approaches to calling interface default methods through reflection, as may be required by a proxy, for instance.

TL;DR If you’re impatient, all the access methods exposed in this blog are available in this gist, and the problem is also fixed in our library jOOR.

Proxying interfaces with default methods

The useful java.lang.reflect.Proxy API has been around for a while. We can do cool things like:

import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Duck {
        void quack();
    }

    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                System.out.println("Quack");
                return null;
            }
        );

        duck.quack();
    }
}

This just yields:

Quack

In this example, we create a proxy instance that implements the Duck API through an InvocationHandler, which is essentially just a lambda that gets called for each method call on Duck.

The interesting bit is when we want to have a default method on Duck and delegate the call to that default method:

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

We might be inclined to write this:

import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Duck {
        default void quack() {
            System.out.println("Quack");
        }
    }

    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                method.invoke(proxy);
                return null;
            }
        );

        duck.quack();
    }
}

But this will just generate a long long stack trace of nested exceptions (this isn’t specific to the method being a default method. You simply cannot do this):

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	at ProxyDemo.main(ProxyDemo.java:20)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at ProxyDemo.lambda$0(ProxyDemo.java:15)
	... 2 more
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	... 7 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at ProxyDemo.lambda$0(ProxyDemo.java:15)
	... 8 more
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	... 13 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at ProxyDemo.lambda$0(ProxyDemo.java:15)
	... 14 more
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	... 19 more
...
...
... goes on forever

Not very helpful.

Using method handles

So, the original Google search turned up results that indicate we need to use the MethodHandles API. Let’s try that, then!

import java.lang.invoke.MethodHandles;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Duck {
        default void quack() {
            System.out.println("Quack");
        }
    }

    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                MethodHandles
                    .lookup()
                    .in(Duck.class)
                    .unreflectSpecial(method, Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}

That seems to work, cool!

Quack

… until it doesn’t.

Calling a default method on a non-private-accessible interface

The interface in the above example was carefully chosen to be “private-accessible” by the caller, i.e. the interface is nested in the caller’s class. What if we had a top-level interface?

import java.lang.invoke.MethodHandles;
import java.lang.reflect.Proxy;

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

public class ProxyDemo {
    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                MethodHandles
                    .lookup()
                    .in(Duck.class)
                    .unreflectSpecial(method, Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}

The almost same code snippet no longer works. We get the following IllegalAccessException:

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	at ProxyDemo.main(ProxyDemo.java:26)
Caused by: java.lang.IllegalAccessException: no private access for invokespecial: interface Duck, from Duck/package
	at java.lang.invoke.MemberName.makeAccessException(MemberName.java:850)
	at java.lang.invoke.MethodHandles$Lookup.checkSpecialCaller(MethodHandles.java:1572)
	at java.lang.invoke.MethodHandles$Lookup.unreflectSpecial(MethodHandles.java:1231)
	at ProxyDemo.lambda$0(ProxyDemo.java:19)
	... 2 more

Bummer. When googling further, we might find the following solution, which accesses MethodHandles.Lookup‘s internals through reflection:

import java.lang.invoke.MethodHandles.Lookup;
import java.lang.reflect.Constructor;
import java.lang.reflect.Proxy;

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

public class ProxyDemo {
    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                Constructor<Lookup> constructor = Lookup.class
                    .getDeclaredConstructor(Class.class);
                constructor.setAccessible(true);
                constructor.newInstance(Duck.class)
                    .in(Duck.class)
                    .unreflectSpecial(method, Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}

And yay, we get:

Quack

We get that on JDK 8. What about JDK 9 or 10?

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by ProxyDemo (file:/C:/Users/lukas/workspace/playground/target/classes/) to constructor java.lang.invoke.MethodHandles$Lookup(java.lang.Class)
WARNING: Please consider reporting this to the maintainers of ProxyDemo
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Quack

Oops. That’s what happens by default. If we run the program with the --illegal-access=deny flag:

java --illegal-access=deny ProxyDemo

Then, we’re getting (and rightfully so):

Exception in thread "main" java.lang.reflect.InaccessibleObjectException: Unable to make java.lang.invoke.MethodHandles$Lookup(java.lang.Class) accessible: module java.base does not "opens java.lang.invoke" to unnamed module @357246de
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:337)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:281)
        at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:192)
        at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:185)
        at ProxyDemo.lambda$0(ProxyDemo.java:18)
        at $Proxy0.quack(Unknown Source)
        at ProxyDemo.main(ProxyDemo.java:28)

One of the Jigsaw project’s goals is to precisely not allow such hacks to persist. So, what’s a better solution? This?

import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Proxy;

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

public class ProxyDemo {
    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                MethodHandles.lookup()
                    .findSpecial( 
                         Duck.class, 
                         "quack",  
                         MethodType.methodType( 
                             void.class, 
                             new Class[0]),  
                         Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}
Quack

Great, it works in Java 9 and 10, what about Java 8?

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	at ProxyDemo.main(ProxyDemo.java:25)
Caused by: java.lang.IllegalAccessException: no private access for invokespecial: interface Duck, from ProxyDemo
	at java.lang.invoke.MemberName.makeAccessException(MemberName.java:850)
	at java.lang.invoke.MethodHandles$Lookup.checkSpecialCaller(MethodHandles.java:1572)
	at java.lang.invoke.MethodHandles$Lookup.findSpecial(MethodHandles.java:1002)
	at ProxyDemo.lambda$0(ProxyDemo.java:18)
	... 2 more

You’re kidding, right?

So, there’s a solution (hack) that works on Java 8 but not on 9 or 10, and there’s a solution that works on Java 9 and 10, but not on Java 8.

A more thorough examination

So far, I’ve just been trying to run different things on different JDKs. The following class tries all combinations. It’s also available in this gist here.

Compile it with JDK 9 or 10 (because it also tries using JDK 9+ API: MethodHandles.privateLookupIn()), but compile it using this command, so you can also run the class on JDK 8:

javac -source 1.8 -target 1.8 CallDefaultMethodThroughReflection.java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;


interface PrivateInaccessible {
    default void quack() {
        System.out.println(" -> PrivateInaccessible.quack()");
    }
}

public class CallDefaultMethodThroughReflection {
    interface PrivateAccessible {
        default void quack() {
            System.out.println(" -> PrivateAccessible.quack()");
        }
    }

    public static void main(String[] args) {
        System.out.println("PrivateAccessible");
        System.out.println("-----------------");
        System.out.println();
        proxy(PrivateAccessible.class).quack();

        System.out.println();
        System.out.println("PrivateInaccessible");
        System.out.println("-------------------");
        System.out.println();
        proxy(PrivateInaccessible.class).quack();
    }

    private static void quack(Lookup lookup, Class<?> type, Object proxy) {
        System.out.println("Lookup.in(type).unreflectSpecial(...)");

        try {
            lookup.in(type)
                  .unreflectSpecial(type.getMethod("quack"), type)
                  .bindTo(proxy)
                  .invokeWithArguments();
        }
        catch (Throwable e) {
            System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
        }

        System.out.println("Lookup.findSpecial(...)");
        try {
            lookup.findSpecial(type, "quack", MethodType.methodType(void.class, new Class[0]), type)
                  .bindTo(proxy)
                  .invokeWithArguments();
        }
        catch (Throwable e) {
            System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
        }
    }

    @SuppressWarnings("unchecked")
    private static <T> T proxy(Class<T> type) {
        return (T) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { type },
            (Object proxy, Method method, Object[] arguments) -> {
                System.out.println("MethodHandles.lookup()");
                quack(MethodHandles.lookup(), type, proxy);

                try {
                    System.out.println();
                    System.out.println("Lookup(Class)");
                    Constructor<Lookup> constructor = Lookup.class.getDeclaredConstructor(Class.class);
                    constructor.setAccessible(true);
                    constructor.newInstance(type);
                    quack(constructor.newInstance(type), type, proxy);
                }
                catch (Exception e) {
                    System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
                }

                try {
                    System.out.println();
                    System.out.println("MethodHandles.privateLookupIn()");
                    quack(MethodHandles.privateLookupIn(type, MethodHandles.lookup()), type, proxy);
                }
                catch (Error e) {
                    System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
                }

                return null;
            }
        );
    }
}

The output of the above program is:

Java 8

$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

$ java CallDefaultMethodThroughReflection
PrivateAccessible
-----------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface CallDefaultMethodThroughReflection$PrivateAccessible, from CallDefaultMethodThroughReflection

Lookup(Class)
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> PrivateAccessible.quack()

MethodHandles.privateLookupIn()
 -> class java.lang.NoSuchMethodError: java.lang.invoke.MethodHandles.privateLookupIn(Ljava/lang/Class;Ljava/lang/invoke/MethodHandles$Lookup;)Ljava/lang/invoke/MethodHandles$Lookup;

PrivateInaccessible
-------------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface PrivateInaccessible, from PrivateInaccessible/package
Lookup.findSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface PrivateInaccessible, from CallDefaultMethodThroughReflection

Lookup(Class)
Lookup.in(type).unreflectSpecial(...)
 -> PrivateInaccessible.quack()
Lookup.findSpecial(...)
 -> PrivateInaccessible.quack()

MethodHandles.privateLookupIn()
 -> class java.lang.NoSuchMethodError: java.lang.invoke.MethodHandles.privateLookupIn(Ljava/lang/Class;Ljava/lang/invoke/MethodHandles$Lookup;)Ljava/lang/invoke/MethodHandles$Lookup;

Java 9

$ java -version
java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

$ java --illegal-access=deny CallDefaultMethodThroughReflection
PrivateAccessible
-----------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> PrivateAccessible.quack()

Lookup(Class)
 -> class java.lang.reflect.InaccessibleObjectException: Unable to make java.lang.invoke.MethodHandles$Lookup(java.lang.Class) accessible: module java.base does not "opens java.lang.invoke" to unnamed module @30c7da1e

MethodHandles.privateLookupIn()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> PrivateAccessible.quack()

PrivateInaccessible
-------------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface PrivateInaccessible, from PrivateInaccessible/package (unnamed module @30c7da1e)
Lookup.findSpecial(...)
 -> PrivateInaccessible.quack()

Lookup(Class)
 -> class java.lang.reflect.InaccessibleObjectException: Unable to make java.lang.invoke.MethodHandles$Lookup(java.lang.Class) accessible: module java.base does not "opens java.lang.invoke" to unnamed module @30c7da1e

MethodHandles.privateLookupIn()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateInaccessible.quack()
Lookup.findSpecial(...)
 -> PrivateInaccessible.quack()

Java 10

$ java -version
java version "10" 2018-03-20
Java(TM) SE Runtime Environment 18.3 (build 10+46)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10+46, mixed mode)

$ java --illegal-access=deny CallDefaultMethodThroughReflection
... same result as in Java 9

Conclusion

Getting this right is a bit tricky.

  • In Java 8, the best working approach is the hack that opens up the JDK’s internals by accessing a package-private Lookup constructor. This is the only way to consistently call default methods on both private-accessible and private-inaccessible interfaces from any location.
  • In Java 9 and 10, the best working approaches are Lookup.findSpecial() (didn’t work in Java 8) or the new MethodHandles.privateLookupIn() (didn’t exist in in Java 8). The latter is required in case the interfaced is located in another module. That module will still need to open the interface’s package to the caller.

It’s fair to say that this is a bit of a mess. The appropriate meme here is:

According to Rafael Winterhalter (author of ByteBuddy), the “real” fix should go into a revised Proxy API:

I’m not sure if that would solve all the problems, but it should definitely be the case that an implementor shouldn’t worry about all of the above.

Also, clearly, this article didn’t do the complete work, e.g. of testing whether the approaches still work if Duck is imported from another module:

… which will be a topic of another blog post.

Using jOOR

If you’re using jOOR (our reflection library, check it out here), the upcoming version 0.9.8 will include a fix for this:
https://github.com/jOOQ/jOOR/issues/49

The fix simply uses the unsafe reflection approach in Java 8, or the MethodHandles.privateLookupIn() approach in Java 9+. You can then write:

Reflect.on(new Object()).as(PrivateAccessible.class).quack();
Reflect.on(new Object()).as(PrivateInaccessible.class).quack();

Top 10 SQL Dialect Emulations Implemented in jOOQ

The SQL standard is a nice thing. But it’s a standard. And as such, while it may provide good guidelines on how to implement some SQL features, most dialects deviate from it in one way or another (sometimes drastically, cheers, MySQL).

But that doesn’t have to be a bad thing. Innovation is not driven by standards, but by individual vendors’ attempts to tackle a problem from a different perspective. And then, sometimes, the innovation becomes the standard. One example for this is Oracle’s very very interesting MATCH_RECOGNIZE feature, on which Markus Winand did an excellent presentation.

Other functionality is not standardised, such as Oracle/SQL Server’s PIVOT and UNPIVOT.

In many cases, vendor-specific functionality can be translated to equivalent standard SQL, or to other vendor-specific SQL. That’s one of jOOQ’s core features: The “standardisation” of currently 21 SQL dialects into a single Java API. Since jOOQ 3.9, the Java API can also be hidden behind a parser, which makes visualising the differences much simpler. If you want to play around with the following examples, do visit https://www.jooq.org/translate to see our online SQL translator in action!

Here are Top 10 SQL Dialect Emulations Implemented in jOOQ:

1. Empty FROM clause

The SQL standard doesn’t allow this, but many databases do. You have to specify a FROM clause in at least these databases

  • Access
  • CUBRID
  • DB2
  • Derby
  • Firebird
  • HANA
  • HSQLDB
  • Informix
  • Ingres
  • MariaDB
  • MySQL (not always)
  • Oracle
  • Sybase SQL Anywhere

These ones don’t really need a FROM clause:

  • H2
  • PostgreSQL
  • Redshift
  • SQL Server
  • SQLite
  • Sybase ASE
  • Vertica

An example of such a query could be the following query that retrieves the server timestamp:

SELECT current_timestamp

In Oracle, you’d have to add the DUAL table:

SELECT current_timestamp FROM dual

There are other possibilities to emulate this in other databases. If you want to see how jOOQ does it, again, either visit our online SQL translator, or run this little code snippet yourself, locally (be sure to report any issues you may find, greatly appreciated!):

import org.jooq.Query;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class Translate {
    public static void main(String[] args) {
        Query query = DSL.using(SQLDialect.DEFAULT)
            .parser()
            .parseQuery("SELECT current_timestamp");

        for (SQLDialect family : SQLDialect.families()) {
            System.out.println(String.format(
                "%1$-13s: %2$s", family, 
                DSL.using(family).render(query)
            ));
    }
}

So, running the above snippet will yield (and observe, free current_timestamp translation:

ACCESS    : select now() from (select count(*) dual from MSysResources) as dual
ASE       : select current_bigdatetime()
CUBRID    : select current_timestamp() from "db_root"
DB2       : select current_timestamp from "SYSIBM"."DUAL"
DERBY     : select current_timestamp from "SYSIBM"."SYSDUMMY1"
FIREBIRD  : select current_timestamp from "RDB$DATABASE"
H2        : select current_timestamp()
HANA      : select current_timestamp from "SYS"."DUMMY"
HSQLDB    : select current_timestamp from (
              select 1 as dual from information_schema.system_users limit 1
            ) as dual
INFORMIX  : select current from (
              select 1 as dual from systables where tabid = 1
            ) as dual
INGRES    : select current_timestamp from (select 1 as dual) as dual
MARIADB   : select current_timestamp() from dual
MYSQL     : select current_timestamp() from dual
ORACLE    : select current_timestamp from dual
POSTGRES  : select current_timestamp
REDSHIFT  : select current_timestamp
SQLITE    : select current_timestamp
SQLSERVER : select current_timestamp
SYBASE    : select current timestamp from [SYS].[DUMMY]
VERTICA   : select current_timestamp

See also the jOOQ manual’s section about the dual table.

2. LIMIT .. OFFSET

First off, don’t use OFFSET.

Since you didn’t listen and you’re still using OFFSET, let’s discuss how to emulate it (and the much more useful LIMIT in many database dialects.

The SQL:2016 standard syntax is:

<result offset clause> ::=
  OFFSET <offset row count> { ROW | ROWS }

<fetch first clause> ::=
  FETCH { FIRST | NEXT } [ <fetch first quantity> ]
    { ROW | ROWS } { ONLY | WITH TIES }

<fetch first quantity> ::=
    <fetch first row count>
  | <fetch first percentage>

<offset row count> ::=
  <simple value specification>

<fetch first row count> ::=
  <simple value specification>

<fetch first percentage> ::=
  <simple value specification> PERCENT

So, there are a few interesting features:

  • The OFFSET (which is the least interesting)
  • The number of rows to FETCH
  • Whether tied rows should be fetched, too (TIES). This will be covered in the next section
  • Whether the number of rows is really a PERCENTage

Oracle currently is the only database (I’m aware of) that does it all and with standard syntax.

FETCH without OFFSET

For instance, when querying the Sakila database, we can get the TOP 3 longest films:

SELECT film_id, title, length
FROM film 
ORDER BY length DESC 
FETCH NEXT 3 ROWS ONLY

Yielding:

FILM_ID  TITLE           LENGTH
-------------------------------
212      DARN FORRESTER  185
182      CONTROL ANTHEM  185
141      CHICAGO NORTH   185

(In the next section we’ll look at the WITH TIES clause to find the other films of length 185)

But what do these queries look like in other databases? Here’s the translation of the ROWS ONLY query, according to jOOQ:

ACCESS    : select top 3 film_id, title, length from film order by length desc
ASE       : select top 3 film_id, title, length from film order by length desc
CUBRID    : select film_id, title, length from film 
              order by length desc limit 0, 3
DB2       : select film_id, title, length from film 
              order by length desc fetch first 3 rows only
DERBY     : select film_id, title, length from film 
              order by length desc offset 0 rows fetch next 3 rows only
FIREBIRD  : select film_id, title, length from film 
              order by length desc rows (0 + 1) to (0 + 3)
H2        : select film_id, title, length from film order by length desc limit 3
HANA      : select film_id, title, length from film order by length desc limit 3
HSQLDB    : select film_id, title, length from film order by length desc limit 3
INFORMIX  : select first 3 film_id, title, length from film order by length desc
INGRES    : select film_id, title, length from film 
              order by length desc offset 0 fetch first 3 rows only
MARIADB   : select film_id, title, length from film order by length desc limit 3
MYSQL     : select film_id, title, length from film order by length desc limit 3
ORACLE    : select film_id, title, length from film 
              order by length desc fetch next 3 rows only
POSTGRES  : select film_id, title, length from film order by length desc limit 3
REDSHIFT  : select film_id, title, length from film order by length desc limit 3
SQLITE    : select film_id, title, length from film order by length desc limit 3
SQLSERVER : select top 3 film_id, title, length from film order by length desc
SYBASE    : select top 3 film_id, title, length from film order by length desc
VERTICA   : select film_id, title, length from film order by length desc limit 3

So, there are essentially 3 families:

  • The standard family using FETCH, including DB2 (doesn’t support OFFSET), Derby, Ingres (although missing a keyword), Oracle
  • The MySQL family using LIMIT, including CUBRID, H2, HANA, HSQLDB, MariaDB, MySQL, PostgreSQL, Redshift, SQLite, Vertica
  • The T-SQL family using TOP, inculding Access, ASE, SQL Server, Sybase

There are also exotic syntaxes implemented by Firebird and Informix.

FETCH with OFFSET

You’ll find tons of blog posts on the web on how to emulate OFFSET .. LIMIT, including jOOQ’s manual. Things do get a bit more tricky when adding an offset, as can be seen here:

CUBRID    : select film_id, title, length from film
              order by length desc limit 3, 3
DB2       : select "v0" film_id, "v1" title, "v2" length from (
              select 
                film_id "v0", title "v1", length "v2", 
                row_number() over (order by length desc) "rn" 
              from film order by "v2" desc
            ) "x" 
            where "rn" > 3 and "rn" <= (3 + 3) 
            order by "rn"
DERBY     : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
FIREBIRD  : select film_id, title, length from film 
              order by length desc rows (3 + 1) to (3 + 3)
H2        : select film_id, title, length from film 
              order by length desc limit 3 offset 3
HANA      : select film_id, title, length from film 
              order by length desc limit 3 offset 3
HSQLDB    : select film_id, title, length from film 
              order by length desc limit 3 offset 3
INFORMIX  : select skip 3 first 3 film_id, title, length from film 
              order by length desc
INGRES    : select film_id, title, length from film 
              order by length desc offset 3 fetch first 3 rows only
MARIADB   : select film_id, title, length from film 
              order by length desc limit 3 offset 3
MYSQL     : select film_id, title, length from film 
              order by length desc limit 3 offset 3
ORACLE    : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
POSTGRES  : select film_id, title, length from film 
              order by length desc limit 3 offset 3
REDSHIFT  : select film_id, title, length from film 
              order by length desc limit 3 offset 3
SQLITE    : select film_id, title, length from film 
              order by length desc limit 3 offset 3
SQLSERVER : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
SYBASE    : select top 3 start at 4 film_id, title, length from film 
              order by length desc
VERTICA   : select film_id, title, length from film 
              order by length desc limit 3 offset 3

Interesting to note:

  • MS Access, and Sybase ASE do not support offsets at all (maybe a good thing).
  • The more recent versions of SQL Server support the SQL standard OFFSET .. FETCH clause (although OFFSET, unfortunately, is mandatory), which is great. Older versions can emulate OFFSET just like DB2 below
  • Sybase SQL Anywhere enhanced the T-SQL TOP syntax to something intuitive: TOP .. START AT. Why not?
  • DB2 doesn’t support the syntax, and we have to emulate it using ROW_NUMBER() window functions:
    select "v0" film_id, "v1" title, "v2" length from (
      select 
        film_id "v0", title "v1", length "v2", 
        row_number() over (order by length desc) "rn" 
      from film order by "v2" desc
    ) "x" 
    where "rn" > 3 and "rn" &lt;= (3 + 3) 
    order by "rn"
    

    Notice how, over the years, we’ve learned to do it right and prevent all sorts of side-effects from wrong emulations:

    • In the nested query, all columns have to be renamed to some enumerated column names to prevent problems from possibly duplicate column names in the user SQL query. It is perfectly OK for top-level SELECT statements to have duplicate / ambiguous column names, but not for subqueries
    • The top level SELECT statement should not project the auxiliary ROW_NUMBER() value. While this might not be causing trouble in ordinary queries, it is certainly causing trouble in subqueries. Imagine emulating something like:
      WHERE x IN (
        SELECT id
        FROM table
        OFFSET 1 ROW
        FETCH NEXT ROW ONLY
      )
      

      In this case, we must be very careful that the subquery continues to project only exactly one column.

3. WITH TIES

The previous approach to getting TOP 3 films is dangerous, because the ranking is rather random. There are other films of length 185, and they didn’t make it into the TOP 3. We could add another ordering criteria to make the ordering deterministic (e.g. FILM_ID), or we can use WITH TIES to get 3 films and all the tied films, too.

The query is now:

SELECT film_id, title, length
FROM film 
ORDER BY length DESC 
FETCH NEXT 3 ROWS WITH TIES

And we’re getting:

FILM_ID  TITLE               LENGTH
-----------------------------------
212      DARN FORRESTER	     185
872      SWEET BROTHERHOOD   185
817      SOLDIERS EVOLUTION  185
991      WORST BANGER        185
690      POND SEATTLE        185
609      MUSCLE BRIGHT       185
349      GANGS PRIDE         185
426      HOME PITY           185
182      CONTROL ANTHEM      185
141      CHICAGO NORTH       185

There are no more films of length 185 than the above. For more information about doing TOP N SQL queries, see this blog post.

For the sake of simplicity, let’s remove again the OFFSET clause (because it behaves inconsistently when combined with WITH TIES, at least in Oracle). Let’s look at WITH TIES only. jOOQ currently doesn’t emulate this clause for all dialects as it is hard to get right without window functions.

DB2       : select "v0" film_id, "v1" title, "v2" length from (
              select 
                film_id "v0", title "v1", length "v2", 
                rank() over (order by length desc) "rn"
              from film
            ) "x" 
            where "rn" > 0 and "rn" <= (0 + 3) 
            order by "rn"
HANA      : ... see DB2
MYSQL     : ... see DB2
ORACLE    : select film_id, title, length from film 
              order by length desc fetch next 3 rows with ties
POSTGRES  : ... see DB2
REDSHIFT  : ... see DB2
SQLSERVER : select top 3 with ties film_id, title, length from film 
              order by length desc
SYBASE    : ... see DB2

There are 3 ways to implement WITH TIES:

  • Oracle implements the SQL standard
  • SQL Server has a vendor-specific TOP N WITH TIES clause
  • All the others can emulate this feature using window functions

4. Nested set operations

Granted, this isn’t something you might run into every day, but when you need it, it’s really a PITA if your database doesn’t support it. Nested set operations. There are three set operations in SQL and relational algebra:

  • UNION
  • INTERSECT
  • EXCEPT (or MINUS, in Oracle)

All of the above come in two flavours:

  • OP or OP DISTINCT (standard syntax that isn’t implemented in any database)
  • OP ALL (most databases support this only for UNION)

Where ALL turns the set operation into a multiset operation, allowing duplicate results. ALL is fully supported (including on INTERSECT and EXCEPT) in:

  • CUBRID
  • DB2
  • Derby
  • HSQLDB
  • PostgreSQL

Now, the query. What if you want to find all actor names and all customer names, but you don’t want e.g. ADAM GRANT: In PostgreSQL, you could write:

SELECT first_name, last_name
FROM actor
UNION
SELECT first_name, last_name
FROM customer
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2

In this case, we can simply hope that all these operators are left-associative, which means we’ll add customers to actors, and then remove ADAM GRANT. In fact, according to the standard, this is the case. But perhaps, not all databases implement things this way, and as soon as you mix in INTERSECT, things change, as INTERSECT has higher operator precedence.

Want to be sure? Put parentheses around the expressions, e.g.

(
  SELECT first_name, last_name
  FROM actor
  UNION
  SELECT first_name, last_name
  FROM customer
)
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2

Still valid in PostgreSQL (and if you add FROM dual, and replace EXCEPT by MINUS, then also in Oracle), but won’t work e.g. in MySQL. How can we get this to work in all the databases?

Here’s how:

ASE       : ... like MySQL
CUBRID    : ... like PostgreSQL (plus, add the dual table)
DB2       : ... like PostgreSQL (plus, add the dual table)
DERBY     : select first_name, last_name from (
              select first_name, last_name from (
                select first_name, last_name from actor
              ) x 
              union 
              select first_name, last_name from (
                select first_name, last_name from customer
              ) x
            ) x 
            except 
            select "ADAM", "GRANT" from (
              select 'ADAM', 'GRANT' from "SYSIBM"."SYSDUMMY1"
            )
            x order by 1, 2
H2        : ... like PostgreSQL
HANA      : ... like PostgreSQL (plus, add the dual table)
HSQLDB    : ... like PostgreSQL (plus, add the dual table)
INFORMIX  : ... like PostgreSQL (plus, add the dual table)
INGRES    : ... like PostgreSQL (plus, add the dual table)
MARIADB   : ... like MySQL
MYSQL     : select * from (
              select * from (
                select first_name, last_name from actor
              ) x 
              union 
              select * from (
                select first_name, last_name from customer
              ) x
            ) x
            except 
            select * from (
              select 'ADAM', 'GRANT' from dual
            ) 
            x order by 1, 2
ORACLE    : ... like PostgreSQL (add dual and replace EXCEPT by MINUS)
POSTGRES  : (
              (select first_name, last_name from actor) 
                union 
              (select first_name, last_name from customer)
            ) 
            except (select 'ADAM', 'GRANT') 
            order by 1, 2
REDSHIFT  : 
SQLITE    : ... like MySQL
SQLSERVER : ... like PostgreSQL
SYBASE    : ... like PostgreSQL (plus, add the dual table)
VERTICA   : ... like PostgreSQL

Some observations:

  • Access doesn’t support EXCEPT
  • Firebird has a bit of trouble with these operators – I simply haven’t figured out how to work around them yet. Probably due to some parser bugs
  • PostgreSQL (and many others) get it right
  • MySQL (and a few others) get it wrong, and we have to wrap the various set operation subqueries in derived tables, when suddenly things work well again. This really seems to be just a parser problem, not actually missing functionality. But it’s really a pain if you have to rewrite your SQL manually to the MySQL version
  • Derby is like MySQL, but in addition to lacking parser support for standard SQL nested set operations, it also suffers from these nasty bugs: https://issues.apache.org/jira/browse/DERBY-6983 and https://issues.apache.org/jira/browse/DERBY-6984. Luckily, you have jOOQ to work around all these hassles for you!

5. Derived column lists

A really cool standard feature is called “derived column lists”. It allows for renaming a table AND its columns in one go, for instance in PostgreSQL:

SELECT a, b
FROM (
  SELECT first_name, last_name
  FROM actor
) t(a, b) -- Interesting feature here
WHERE a LIKE 'Z%'

Yielding

A     B
----------
ZERO  CAGE

The utility of this functionality is most important when:

  • You generate SQL dynamically, and perhaps you’re not entirely sure what your derived table’s column names are – just rename them and be sure again
  • You call a table-valued function, i.e. a function that returns a table, and again, you’re not really sure what it’s columns are
  • You simply don’t like the column names of a table. This might not be the most important use-case, as with the above syntax, you have to rename ALL (except in PostgreSQL) the columns, in the right order, and we don’t like to depend on such ordering

Again, not all databases support this feature. So, what to do if they don’t? Use this one weird trick with a UNION ALL subquery to emulate it!

ACCESS    : ... like PostgreSQL
ASE       : ... like PostgreSQL
CUBRID    : ... like PostgreSQL
DB2       : ... like PostgreSQL
DERBY     : ... like PostgreSQL
FIREBIRD  : ... like PostgreSQL
H2        : select a, b from (
              (select null a, null b where 1 = 0) 
               union all 
              (select first_name, last_name from actor)
            ) t 
            where a like 'Z%'
HANA      : ... like H2 (plus, add dual table)
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like PostgreSQL
INGRES    : ... like H2 (plus, add dual table)
MARIADB   : ... like H2 (plus, add dual table)
MYSQL     : ... like H2 (plus, add dual table)
ORACLE    : ... like H2 (plus, add dual table)
POSTGRES  : select a, b from (
              select first_name, last_name from actor
            ) as t(a, b) 
            where a like 'Z%'
REDSHIFT  : ... like PostgreSQL
SQLITE    : ... like H2
SQLSERVER : ... like PostgreSQL
SYBASE    : ... like PostgreSQL
VERTICA   : ... like PostgreSQL

Not a lot of magic here. Either the database supports the feature, or it doesn’t. If it’s not supported, then the derived table whose columns should be aliased must be prefixed by a zero-row-returning UNION ALL subquery, which defines the column names. Because if you use set operations, then the first subquery defines the column names. Cool, eh?

select a, b from (

  -- Dummy subquery defining the column names
  (select null a, null b where 1 = 0) 
   union all 

  -- Actually interesting subqeury
  (select first_name, last_name from actor)
) t 
where a like 'Z%'

Please, forgive me… Actually, it was all Bill Karwin’s idea.

6. VALUES clause

Did you know that VALUES() is a clause that can be used outside of INSERT statements? Yes. In PostgreSQL, you can just write:

VALUES ('Hello', 'World'), ('Cool', 'eh?')

And you’re getting the following result:

column1  column2
----------------
Hello    World  
Cool     eh?    

Of course, we should never rely on such generated column names, thus again, derived column lists. In PostgreSQL, this is only possible when actually using a derived table in this context:

SELECT *
FROM (
  VALUES ('Hello', 'World'), ('Cool', 'eh?') 
) AS t(a, b)

Do all the databases support this clause? Of course not! But at least, it can be emulated in all databases:

ACCESS    : ... like Oracle
ASE       : ... like PostgreSQL
CUBRID    : ... like PostgreSQL
DB2       : ... like PostgreSQL
DERBY     : ... like PostgreSQL
FIREBIRD  : ... like Sybase SQL Anywhere
H2        : select "v"."c1", "v"."c2" from (
              (select null "c1", null "c2" where 1 = 0) 
               union all 
              (select * from (
                 values ('Hello', 'World'), ('Cool', 'eh?')
               ) "v")
            ) "v"
HANA      : ... like Oracle
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like Sybase SQL Anywhere
INGRES    : ... like Oracle
MARIADB   : ... like Oracle
MYSQL     : ... like Oracle
ORACLE    : select "v"."c1", "v"."c2" from (
              (select null "c1", null "c2" from dual where 1 = 0) 
               union all 
              (select * from (
                (select 'Hello', 'World' from dual)
                 union all 
                (select 'Cool', 'eh?' from dual)
              ) "v")
            ) "v"
POSTGRES  : select "v"."c1", "v"."c2" from (
              values ('Hello', 'World'), ('Cool', 'eh?')
            ) as "v"("c1", "c2")
REDSHIFT  : ... like PostgreSQL
SQLITE    : ... like H2
SQLSERVER : ... like PostgreSQL
SYBASE    : select [v].[c1], [v].[c2] from (
              (select 'Hello', 'World' from [SYS].[DUMMY]) 
               union all 
              (select 'Cool', 'eh?' from [SYS].[DUMMY])
            ) [v]([c1], [c2])
VERTICA   : ... like PostgreSQL

There are 4 flavours of how this is supported:

  • PostgreSQL and others: Support both VALUES and derived column lists
  • H2 and others: Support only VALUES, not derived column lists
  • Sybase SQL Anywhere and others: Do not support VALUES, but derived column lists
  • Oracle and others: Support neither feature

Clearly, this is only syntactic sugar for other, more verbose SQL, but it’s really cool when you don’t actually need any real table. In fact the whole optional FROM clause discussion from the beginning of this article is unnecessary, once you have VALUES(), which would be the standard way to “avoid” the FROM clause.

7. Predicates using Row Value Expressions

Once you’ve started using these, you will not want to miss them. Row value expressions. They’re essentially just tuple expressions, like:

SELECT *
FROM customer
WHERE (first_name, last_name)
    = ('MARY', 'SMITH')

Or, according to the standard and to PostgreSQL, also:

SELECT *
FROM customer
WHERE ROW (first_name, last_name)
    = ROW ('MARY', 'SMITH')

The functionality doesn’t seem very useful when using equality predicates, but it is much more interesting when using IN predicates:

-- Any customer named the same way as an actor?
SELECT *
FROM customer
WHERE (first_name, last_name) IN (
  SELECT first_name, last_name
  FROM actor
)

Or, when doing keyset pagination, through non-equality predicates:

SELECT *
FROM customer
WHERE (first_name, last_name) 
    > ('JENNIFER', 'DAVIS')

Again, not all databases support these. And those that do, have various levels of support. PostgreSQL is again the only database that goes “all in” on all the predicates, including funky things like the DISTINCT predicate:

SELECT *
FROM customer
WHERE (first_name, last_name) 
  IS DISTINCT FROM ('JENNIFER', 'DAVIS')

But luckily, again, these things can be emulated. Let’s look at all 3 of the above examples, and save ourselves the DISTINCT predicate for the next list item:

Equality on row value expressions

This is trivial. Either it’s supported, or it isn’t. Or you’re Oracle, and require a special syntax, to prevent ORA-00920 invalid relational operator (I would really love to hear that story. Must be funky):

ACCESS    : ... like SQL Server
ASE       : ... like SQL Server
CUBRID    : ... like PostgreSQL
DB2       : ... like PostgreSQL
DERBY     : ... like SQL Server
FIREBIRD  : ... like SQL Server
H2        : ... like PostgreSQL
HANA      : ... like SQL Server
HSQLDB    : ... like PostgreSQL
INFORMIX  : select * from customer 
            where row (first_name, last_name) = row ('MARY', 'SMITH')
INGRES    : ... like SQL Server
MARIADB   : ... like PostgreSQL
MYSQL     : ... like PostgreSQL
ORACLE    : select * from customer 
            where (first_name, last_name) = (('MARY', 'SMITH'))
POSTGRES  : select * from customer 
            where (first_name, last_name) = ('MARY', 'SMITH')
REDSHIFT  : ... like PostgreSQL
SQLITE    : ... like SQL Server
SQLSERVER : select * from customer 
            where (first_name = 'MARY' and last_name = 'SMITH')
SYBASE    : ... like SQL Server
VERTICA   : ... like PostgreSQL

Note that Informix requires the ROW() constructor, which should be optional. And again, Oracle is… Oracle 🙂

IN predicate

Emulating this is much more tricky if it is not supported. Remember that IN and EXISTS predicates can be equivalent, so there’s always a way to transform them into each other.

ACCESS    : ... like SQLite
ASE       : ... like SQL Server
CUBRID    : ... like SQL Server
DB2       : ... like SQL Server
DERBY     : ... like SQL Server
FIREBIRD  : ... like SQL Server
H2        : select * from customer where (first_name, last_name) in (
              select (first_name, last_name) from actor
            )
HANA      : ... like SQLite
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like SQL Server
INGRES    : ... like SQLite
MARIADB   : ... like PostgreSQL
MYSQL     : ... like PostgreSQL
ORACLE    : select * from customer where (first_name, last_name) in ((
              select first_name, last_name from actor
            ))
POSTGRES  : select * from customer where (first_name, last_name) in (
              select first_name, last_name from actor
            )
REDSHIFT  : ... like PostgreSQL
SQLITE    : select * from customer where exists (
              select x.c1, x.c2 from (
                (select null c1, null c2 where 1 = 0) 
                 union all 
                (select first_name, last_name from actor)
              ) x 
              where (first_name = x.c1 and last_name = x.c2)
            )
SQLSERVER : select * from customer where exists (
              select x.c1, x.c2 
              from (select first_name, last_name from actor) x(c1, c2) 
              where (first_name = x.c1 and last_name = x.c2)
            )
SYBASE    : ... like SQL Server
VERTICA   : ... like SQL Server

Observations:

  • At this point, it’s worth mentioning that these things work “by accident” in H2. H2 unfortunately decided to use the (a, b, …, n) syntax for arrays, which are similar to tuples, but not the same thing. You can see in the H2 syntax that we have to wrap the two columns of the subquery in parentheses as well for the IN predicate to work as expected.
  • The transformation to an EXISTS() predicate requires the derived column list feature again. This is why some emulations are even more verbose than others.

Non-equality predicate

This predicate can be expanded to its standard definition again, easily, if it is not natively supported:

ACCESS    : ... like Oracle
ASE       : ... like PostgreSQL
CUBRID    : ... like Oracle
DB2       : ... like PostgreSQL
DERBY     : ... like Oracle
FIREBIRD  : ... like Oracle
H2        : ... like PostgreSQL
HANA      : ... like Oracle
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like Oracle
INGRES    : ... like Oracle
MARIADB   : ... like PostgreSQL
MYSQL     : ... like PostgreSQL
ORACLE    : select * from customer where (
              first_name >= 'JENNIFER' and (
                first_name > 'JENNIFER' or (
                  first_name = 'JENNIFER' and last_name > 'DAVIS'
                )
              )
            )
POSTGRES  : select * from customer 
              where (first_name, last_name) > ('JENNIFER', 'DAVIS')
REDSHIFT  : ... like Oracle
SQLITE    : ... like Oracle
SQLSERVER : ... like Oracle
SYBASE    : ... like Oracle
VERTICA   : ... like PostgreSQL

Observation:

  • Strictly speaking, it is not necessary to have one of the two predicates redundant in the emulation, but unfortunately, many databases have trouble when the top boolean operator of a boolean expression is OR rather than AND

8. The DISTINCT predicate

In the previous section, we’ve briefly mentioned the DISTINCT predicate, a useful predicate that helps handling NULL values as we’s mostly expect.

A quick summary in PostgreSQL:

WITH t(v) AS (
  VALUES (1),(2),(null)
)
SELECT v1, v2, v1 IS DISTINCT FROM v2
FROM t t1(v1), t t2(v2)

This yields:

v1  v2  d
-----------------
1   1   false    
1   2   true     
1       true     
2   1   true     
2   2   false    
2       true     
    1   true     
    2   true     
        false    

Conveniently, this never returns NULL when comparing anything with NULL, so simply spoken NULL IS NOT DISTINCT FROM NULL is TRUE. Quite some syntax, but hey, it’s SQL.

Regrettably, only few databases support the standard syntax, and MySQL and SQLite have a much more concise, non-standard operator. Let’s emulate the query from the previous section in our databases:

SELECT *
FROM customer
WHERE (first_name, last_name) 
  IS DISTINCT FROM ('JENNIFER', 'DAVIS')

Observe, there’s a really cool way to emulate this operation using INTERSECT, because interestingly, set operations also treat two NULL values as “the same”, i.e. non-DISTINCT. We have:

ACCESS    : ... like SQL Server (plus, add the dual table)
ASE       : ... like SQL Server (plus, add the dual table)
CUBRID    : ... like SQL Server (plus, add the dual table)
DB2       : ... like SQL Server (plus, add the dual table)
DERBY     : ... like SQL Server (plus, add the dual table)
FIREBIRD  : ... like PostgreSQL
H2        : ... like PostgreSQL
HANA      : ... like SQL Server (plus, add the dual table)
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like SQL Server (plus, add the dual table)
INGRES    : ... like SQL Server (plus, add the dual table)
MARIADB   : ... like MySQL
MYSQL     : select * from customer where (not((first_name, last_name) 
                                         <=> ('JENNIFER', 'DAVIS')))
ORACLE    : ... like SQL Server (plus, add the dual table)
POSTGRES  : select * from customer where (first_name, last_name) 
                        is distinct from ('JENNIFER', 'DAVIS')
REDSHIFT  : ... like PostgreSQL
SQLITE    : select * from customer where ((first_name, last_name) 
                                   is not ('JENNIFER', 'DAVIS'))
SQLSERVER : select * from customer where not exists (
              (select first_name, last_name) 
               intersect 
              (select 'JENNIFER', 'DAVIS')
            )
SYBASE    : ... like SQL Server (plus, add the dual table)
VERTICA   : ... like SQL Server

Want to try it yourself? The original PostgreSQL truth-table producing query can be transformed to this one:

WITH t(v) AS (
  VALUES (1),(2),(null)
)
SELECT v1, v2, NOT EXISTS (
  SELECT v1 INTERSECT SELECT v2
)
FROM t t1(v1), t t2(v2)

It produces the same truth table. Cool, eh?

9. DDL statements

This is one of the main reasons why we’re doing all of this. We want to allow for SQL text based migration scripts (e.g. run with Flyway) to be translatable to any kind of SQL dialect. Because DDL is really the most boring part of SQL to keep vendor-agnostic.

Just two short examples:

Copying a table structure into a new table

A quick and dirty way to copy a table structure is this:

CREATE TABLE x AS 
SELECT 1 AS one
WITH NO DATA

Looks cool, hm? Unfortunately, there is some trouble with the syntax as you will see in the emulations:

DB2       : create table x as (select 1 one from "SYSIBM"."DUAL") 
            with no data
H2        : ... like Oracle
MARIADB   : ... like Oracle
MYSQL     : ... like Oracle
ORACLE    : create table x as select 1 one from dual where 1 = 0
POSTGRES  : create table x as select 1 as one with no data
SQLSERVER : select 1 one into x where 1 = 0

I’ve left out a couple of dialects, as this hasn’t been integration tested everywhere yet, being work in progress. There are 4 flavours:

  • PostgreSQL: Actual support for the WITH [ NO ] DATA clause
  • DB2: Actual support for the WITH [ NO ] DATA clause (but funky requirement to wrap the source query in parentheses
  • Oracle: No support for the clause (easy to emulate with dummy predicate), but at least support for CTAS (CREATE TABLE AS SELECT)
  • SQL Server: Vendor specific alternative to CTAS

The inverse is equally fun to emulate, let’s actually add the data:

CREATE TABLE x AS 
SELECT 1 AS one
WITH DATA

And we’re getting:

DB2       : begin 
              execute immediate '
                create table x as (select 1 one from "SYSIBM"."DUAL") 
                with no data 
              '; 
              execute immediate '
                insert into x select 1 one from "SYSIBM"."DUAL" 
              '; 
            end
ORACLE    : create table x as select 1 one from dual
POSTGRES  : create table x as select 1 as one with data
SQLSERVER : select 1 one into x

Let’s focus on the interesting bits only.

  • Oracle, PostgreSQL, SQL Server as before
  • DB2 actually cannot copy the data with the table structure. Huh!

As can be seen above, in cases like this, we might need to split a single DDL statement in a statement batch or anonymous block containing several statements. This is work in progress as not all databases support anonymous blocks or statement batches.

There are many other interesting DDL emulations, and a lot of it is not yet implemented. Just play around with them on https://www.jooq.org/translate and send us your feature requests / ideas to https://github.com/jOOQ/jOOQ/issues/new

10. Built-in Functions

Last but not least, there are a ton of built-in functions, such as the boring LPAD() function. (Left pad is known for various things). Migrating these between databases is really really tedious. We’re here to help! Let’s emulate:

SELECT lpad('abc', ' ', 5)

Translation:

ACCESS    : replace(space(' ' - len('abc')), ' ', 5) & 'abc'
ASE       : (replicate(5, (' ' - char_length('abc'))) || 'abc')
CUBRID    : lpad('abc', ' ', 5)
DB2       : lpad('abc', ' ', 5)
DERBY     : lpad('abc', ' ', 5)
FIREBIRD  : lpad('abc', ' ', 5)
H2        : lpad('abc', ' ', 5)
HANA      : lpad('abc', ' ', 5)
HSQLDB    : lpad('abc', ' ', 5)
INFORMIX  : lpad('abc', ' ', 5)
INGRES    : lpad('abc', ' ', 5)
MARIADB   : lpad('abc', ' ', 5)
MYSQL     : lpad('abc', ' ', 5)
ORACLE    : lpad('abc', ' ', 5)
POSTGRES  : lpad('abc', ' ', 5)
REDSHIFT  : lpad('abc', ' ', 5)
SQLITE    : substr(replace(replace(substr(quote(zeroblob(((' ' - length('abc') - 1 + length("5")) / length("5") + 1) / 2)), 3), '''', ''), '0', "5"), 1, (' ' - length('abc'))) || 'abc'
SQLSERVER : (replicate(5, (' ' - len('abc'))) + 'abc')
SYBASE    : (repeat(5, (' ' - length('abc'))) || 'abc')
VERTICA   : lpad('abc', ' ', 5)

Forgive me again for the SQLite version. It was a suggestion made by an unknown user on Stack Overflow, the place where I tend to nerd-snipe people into solving such problems for me for free.

Conclusion

jOOQ standardises SQL into a type safe, embedded internal DSL in Java. With jOOQ 3.9+, we’ve added a parser (which is also publicly available on https://www.jooq.org/translate), which removes the need to express everything in the jOOQ API. Just parse your random SQL string and translate it to some other SQL dialect. This list could easily be extended to 50 items and more, but it is much more fun to play around with our website and try this on your own.

Please, if you do, do report any issue, feature request that you’d like to see at: https://github.com/jOOQ/jOOQ/issues/new to help us make this new tool even more valuable for you. In the near future, we’re going to more closely integrate this parser with other tools, such as Flyway, as we think there’s a lot of value in vendor-agnostic, standardised SQL.

Map Reducing a Set of Values Into a Dynamic SQL UNION Query

Sounds fancy, right? But it’s a really nice and reasonable approach to doing dynamic SQL with jOOQ.

This blog post is inspired by a Stack Overflow question, where a user wanted to turn a set of values into a dynamic UNION query like this:

SELECT T.COL1
FROM T
WHERE T.COL2 = 'V1'
UNION
SELECT T.COL1
FROM T
WHERE T.COL2 = 'V2'
...
UNION
SELECT T.COL1
FROM T
WHERE T.COL2 = 'VN'

Note, both the Stack Overflow user and I are well aware of the possibility of using IN predicates :-), let’s just assume for the sake of argument, that the UNION query indeed outperforms the IN predicate in the user’s particular MySQL version and database. If this cannot be accepted, just imagine a more complex use case.

The solution in Java is really very simple:

import static org.jooq.impl.DSL.*;
import java.util.*;
import org.jooq.*;

public class Unions {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("V1", "V2", "V3", "V4");

        System.out.println(
            list.stream()
                .map(Unions::query)
                .reduce(Select::union));
    }

    // Dynamically construct a query from an input string
    private static Select<Record1<String>> query(String s) {
        return select(T.COL1).from(T).where(T.COL2.eq(s));
    }
}

The output is:

Optional[(
  select T.COL1
  from T
  where T.COL2 = 'V1'
)
union (
  select T.COL1
  from T
  where T.COL2 = 'V2'
)
union (
  select T.COL1
  from T
  where T.COL2 = 'V3'
)
union (
  select T.COL1
  from T
  where T.COL2 = 'V4'
)]

If you’re using JDK 9+ (which has Optional.stream()), you can further proceed to running the query fluently as follows:

List<String> list = Arrays.asList("V1", "V2", "V3", "V4");

try (Stream<Record1<String>> stream = list.stream()
    .map(Unions::query)
    .reduce(Select::union))
    .stream() // Optional.stream()!
    .flatMap(Select::fetchStream)) {
    ...
}

This way, if the list is empty, reduce will return an empty optional. Streaming that empty optional will result in not fetching any results from the database.

How to Order Versioned File Names Semantically in Java

In most cases, natural sorting by sorting lexicographically is useful as a default in Java. This includes sorting file names, which are sorted lexicographically as well.

However, when we have version numbers in our files (such as a set of SQL migration scripts), then we prefer the files to be sorted in a more intuitive ordering, where the version numbers contained in the string become “semantic”. In the following example, we have a set of versions, once sorted “naturally”, and once “semantically”:

Natural sorting

  • version-1
  • version-10
  • version-10.1
  • version-2
  • version-21

Semantic sorting

  • version-1
  • version-2
  • version-10
  • version-10.1
  • version-21

Semantic ordering, Windows style

The Windows Explorer does this as well, although there’s a slight difference as the “.” character is used to separate filename from ending, so now, we’re comparing a version sub-number (1) with a file ending (sql)…

The JDK doesn’t seem to have a built-in Comparator that implements this ordering, but we can easily roll our own. The idea is simple. We want to split a file name into several chunks, where a chunk is either a string (sorted lexicographically), or an integer number (sorted numerically). We split that file name using a regular expression:

Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

This expression matches the boundary between string and number, without actually capturing anything, so we can use it for split() operations. The idea was inspired by this stack exchange answer. Here’s the logic of the comparator annotated with comments:

public final class FilenameComparator
implements Comparator<String> {

    private static final Pattern NUMBERS = 
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");

    @Override
    public final int compare(String o1, String o2) {

        // Optional "NULLS LAST" semantics:
        if (o1 == null || o2 == null)
            return o1 == null ? o2 == null ? 0 : -1 : 1;

        // Splitting both input strings by the above patterns
        String[] split1 = NUMBERS.split(o1);
        String[] split2 = NUMBERS.split(o2);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            char c1 = split1[i].charAt(0);
            char c2 = split2[i].charAt(0);
            int cmp = 0;

            // If both segments start with a digit, sort them
            // numerically using BigInteger to stay safe
            if (c1 >= '0' && c1 <= '9' && c2 >= 0 && c2 <= '9')
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));

            // If we haven't sorted numerically before, or if
            // numeric sorting yielded equality (e.g 007 and 7)
            // then sort lexicographically
            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);

            // Abort once some prefix has unequal ordering
            if (cmp != 0)
                return cmp;
        }

        // If we reach this, then both strings have equally
        // ordered prefixes, but maybe one string is longer than
        // the other (i.e. has more segments)
        return split1.length - split2.length;
    }
}

That’s it. Here’s an example on how to use this:

// Random order
List<String> list = asList(
    "version-10", 
    "version-2", 
    "version-21", 
    "version-1", 
    "version-10.1"
);

// Turn versions into files
List<File> l2 = list
    .stream()
    .map(s -> "C:\\temp\\" + s + ".sql")
    .map(File::new)
    .collect(Collectors.toList());

System.out.println("Natural sorting");
l2.stream()
  .sorted()
  .forEach(System.out::println);

System.out.println();
System.out.println("Semantic sorting");
l2.stream()
  .sorted(Comparator.comparing(
      File::getName, 
      new FilenameComparator()))
  .forEach(System.out::println);

The output is:

Natural sorting
C:\temp\version-1.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-2.sql
C:\temp\version-21.sql

Semantic sorting
C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.1.sql
C:\temp\version-10.sql
C:\temp\version-21.sql

Again, the algorithm is rather simple as it doesn’t distinguish between file endings and “segments”, so (1) is compared with (sql), which might not be the desired behaviour. This can be easily fixed by recognising actual file endings and excluding them from the comparison logic – at the price of not being able to sort files without file endings… The comparator would then look like this:

public final class FilenameComparator
implements Comparator<String> {

    private static final Pattern NUMBERS = 
        Pattern.compile("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    private static final Pattern FILE_ENDING =
        Pattern.compile("(?<=.*)(?=\\..*)");

    @Override
    public final int compare(String o1, String o2) {
        if (o1 == null || o2 == null)
            return o1 == null ? o2 == null ? 0 : -1 : 1;

        String[] name1 = FILE_ENDING.split(o1);
        String[] name2 = FILE_ENDING.split(o2);

        String[] split1 = NUMBERS.split(name1[0]);
        String[] split2 = NUMBERS.split(name2[0]);
        int length = Math.min(split1.length, split2.length);

        // Looping over the individual segments
        for (int i = 0; i < length; i++) {
            char c1 = split1[i].charAt(0);
            char c2 = split2[i].charAt(0);
            int cmp = 0;

            if (c1 >= '0' && c1 <= '9' && c2 >= 0 && c2 <= '9')
                cmp = new BigInteger(split1[i]).compareTo(
                      new BigInteger(split2[i]));

            if (cmp == 0)
                cmp = split1[i].compareTo(split2[i]);

            if (cmp != 0)
                return cmp;
        }

        int cmp = split1.length - split2.length;
        if (cmp != 0)
            return cmp;

        cmp = name1.length - name2.length;
        if (cmp != 0)
            return cmp;

        return name1[1].compareTo(name2[1]);
    }
}

The output is now:

C:\temp\version-1.sql
C:\temp\version-2.sql
C:\temp\version-10.sql
C:\temp\version-10.1.sql
C:\temp\version-21.sql

Discussion about a JDK implementation

Tagir Valeev from JetBrains was so kind to point out discussions about adding such an implementation to the JDK:

The discussion is here:

Clearly, the suggested implementation on the JDK mailing list is superior to the one from this blog post, as it:

  • Correctly handles unicode
  • Works with individual codepoint based comparisons rather than regular expressions, which has a lower memory footprint. This can be significant for sorting large lists, as sorting has O(N log N) complexity

Type Safe Implicit JOIN Through Path Navigation in jOOQ 3.11

One of the biggest contributors to SQL syntax verbosity is the need to explicitly JOIN every table that somehow contributes to the query, even if that contribution is “trivial”. When looking at the Sakila database, an example could be seen easily when fetching customer data:

SELECT 
  cu.first_name,
  cu.last_name,
  co.country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)  

That single access to the country information cost us 3 additional lines of SQL code as well as the cognitive overhead of mentally navigating through the to-one relationships in order to get the joins right.

This can be equally tedious when writing the SQL as well as when reading it! There is separation of concerns (projection vs joins) where there shouldn’t be in this particular case. We’re just projecting the country, not doing anything with it, let alone care about the individual table / primary key / foreign key names. Imagine if we had composite keys in the path from customer to country…

Implicit JOIN from SELECT clause

Wouldn’t it be much better (in this case) to be able to write:

SELECT 
  cu.first_name,
  cu.last_name,
  cu.address.city.country.country
FROM customer AS cu

Because after all, that’s really the same thing. We’re fetching only customers, and we load some additional content from its parent table(s). Since we’re navigating to-one relationships only (as opposed to navigating to-many relationships), we don’t really need actual JOIN semantics, a JOIN being a filtered cartesian product.

Implicit JOIN from WHERE clause

The same is true when fetching customers from a particular country. Why write:

SELECT 
  cu.first_name,
  cu.last_name
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)
WHERE co.country = 'Switzerland'

When this would be a lot more natural:

SELECT 
  cu.first_name,
  cu.last_name
FROM customer AS cu
WHERE cu.address.city.country.country = 'Switzerland'

Implicit JOIN from multiple clauses

Another example would be when grouping by country to find out how many customers per country we have. Standard SQL, explicit JOIN version:

SELECT 
  co.country,
  COUNT(*),
  COUNT(DISTINCT city.city)
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)  
GROUP BY co.country
ORDER BY co.country

Again, the many JOINs could be seen as noise, when the implicit version may seem much leaner:

SELECT 
  cu.address.city.country.country,
  COUNT(*),
  COUNT(DISTINCT cu.address.city.city)
FROM customer AS cu
GROUP BY cu.address.city.country.country
ORDER BY cu.address.city.country.country

Even if the same expression is repeated 3x (and we could easily alias it, of course), the output query would still do only that single JOIN graph that we’ve seen before. In fact, there are two different paths:

  • cu.address.city.*
  • cu.address.city.country.*

Internally, we should recognise that the paths are part of the same tree traversal, so the JOIN graph produced by cu.address.city.* can be re-used for cu.address.city.country.*

In fact, we could actually add one (semi-)explicit JOIN to avoid the repetition:

SELECT 
  ci.country.country,
  COUNT(*),
  COUNT(DISTINCT ci.city)
FROM customer AS cu
IMPLICIT JOIN cu.address.city AS ci
GROUP BY ci.country.country
ORDER BY ci.country.country

Implicit JOIN from correlated subqueries

A more sophisticated case would be an implicit join in a correlated subquery, which should really affect the outer query rather than the subquery. Consider finding all customers and the number of customers from the same country:

SELECT 
  cu.first_name,
  cu.last_name, 
  (
    SELECT COUNT(*)
    FROM customer AS cu2
    JOIN address USING (address_id)
    JOIN city AS ci2 USING (city_id)
    WHERE ci2.country_id = ci.country_id
  ) AS customers_from_same_country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city AS ci USING (city_id)

Now clearly, the JOINs start getting into the way of readability (and writeability as well). There’s a slight risk of getting semantics wrong because of all the aliasing going on. A much leaner solution is:

SELECT 
  cu.first_name,
  cu.last_name, 
  (
    SELECT COUNT(*)
    FROM customer AS cu2
    WHERE cu2.address.city.country_id =
          cu.address.city.country_id
  ) AS customers_from_same_country
FROM customer AS cu

Now, of course, many of you cringed and were ready to point out that a correlated subquery isn’t the best solution in this case, and you’re absolutely correct. Use window functions, instead!

Implicit JOIN from window functions

Still, you can profit from implicit JOIN again. Plain SQL version:

SELECT 
  cu.first_name,
  cu.last_name, 
  COUNT(*) OVER (PARTITION BY ci.country_id)
    AS customers_from_same_country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city AS ci USING (city_id)

Implicit JOIN version:

SELECT 
  cu.first_name,
  cu.last_name, 
  COUNT(*) OVER (PARTITION BY cu.address.city.country_id)
    AS customers_from_same_country
FROM customer AS cu

It doesn’t matter where the implicit JOIN appears, i.e. where the path-based parent table access appears. The translation from implicit JOIN syntax to explicit JOIN will always append a JOIN or several JOINs to the left-most child table in the JOIN path, wherever that table is declared. This is a simple matter of scope resolution.

Drawbacks

Technically, there are no drawbacks of the implicit JOIN syntax for to-one relationships compared to the explicit JOIN syntax. But of course, as always with syntax sugar, there’s a slight risk of a developer not fully aware of how things work behind the scenes choosing a less optimal (but visually more elegant) solution over a more performant one.

This could be the case when modelling ANTI JOINs as implicit JOINs with a IS NULL predicate. In some databases, that might still be better, but in most databases, using NOT EXISTS() should be preferred when ANTI JOIN semantics is implemented.

Implicit JOIN for to-many relationship

Having a syntax for navigating to-many relationships is desireable as well, although the implications on semantics are vastly different. While implicit JOINs on to-one relationships have no unexpected effects on the semantics of the query, implicit JOINs on to-many relationships implicitly change the cardinalities of queries they’re contained in. For example:

SELECT
  a.first_name,
  a.last_name,
  a.film.title
FROM actor AS a

When navigating from the ACTOR to the FILM table (via the FILM_ACTOR relationship table), we’re going to duplicate the actor results. It is rather unexpected to have an expression in the SELECT clause to modify the cardinalities of a query, and thus, probably not a good idea. Specifically, there are many cases of implicit JOINs on to-many relationships where the semantics is unclear, ambiguous, or even wrong, because of this change of cardinalities.

For the sake of simplicity, this discussion is out of scope for this article, and for the upcoming jOOQ feature:

jOOQ support for implicit JOIN

Some ORMs like Hibernate, Doctrine, and others have implemented this feature in the past in their own respective query languages, such as HQL, DQL. jOOQ 3.11 follows suit and offers this feature as well through its type safe SQL query API (see https://github.com/jOOQ/jOOQ/issues/1502)

This will be done for the entirety of the SQL language, not just a limited subset, such as HQL or DQL.

All of the above queries can be written in jOOQ as such:

Customer cu = CUSTOMER.as("cu");

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      cu.address().city().country().COUNTRY)
   .from(cu)
   .fetch();

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME)
   .from(cu)
   .where(cu.address().city().country().COUNTRY.eq("Switzerland"))
   .fetch();

ctx.select(cu.address().city().country().COUNTRY, count())
   .from(cu)
   .groupBy(cu.address().city().country().COUNTRY)
   .orderBy(cu.address().city().country().COUNTRY)
   .fetch();

Customer cu2 = CUSTOMER.as("cu2");

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      field(selectCount()
          .from(cu2)
          .where(cu2.address().city().COUNTRY_ID.eq(
                 cu.address().city().COUNTRY_ID))
      ).as("customers_from_same_country"))
   .from(cu)
   .fetch();

ctx.select(
      cu.FIRST_NAME,
      cu.LAST_NAME,
      count().over(partitionBy(cu.address().city().COUNTRY_ID))
        .as("customers_from_same_country"))
   .from(cu)
   .fetch();

The navigation is completely type safe thanks to jOOQ’s code generator which generates navigational methods from child table to parent table in the presence of foreign keys. By default, the method name matches the parent table name (single foreign key between child and parent) or the foreign key constraint name (multiple foreign keys between child and parent), but as always, this can be overridden easily using generator strategies.

The feature is really extremely powerful. For a much more complex example, see:

Bringing implicit JOIN to actual SQL

A nice jOOQ feature that hasn’t been advertised too often yet is the new jOOQ parser, whose main purpose (so far) is to offer support for the DDLDatabase, a tool that reverse engineers your DDL scripts to generate jOOQ code. The parser will have many other uses in the future, though, including its capability of being exposed behind a JDBC proxy API, which can parse any JDBC based application’s SQL and re-generate it using different settings (e.g. a different dialect).

Of course, the parser (if supplied with schema meta information, see https://github.com/jOOQ/jOOQ/issues/5296) will be able to resolve such path expressions and transform the input SQL string using implicit JOINs to the equivalent output SQL string with natural SQL joins.

This topic is still under research. More information will follow as the scope of this functionality will become more clear.

Availability in jOOQ

jOOQ 3.11 is due for late Q3 2018 / early Q4 2018. You can already play around with this feature by checking out jOOQ from GitHub:
https://github.com/jOOQ/jOOQ

Your feedback is very welcome!

How to Implement Your Own XJC Plugin to Generate toString(), equals(), and hashCode() Methods

When you work with JAXB to generate Java code from XSD (or *ghasp* WSDL) files, you’re going to use the XJC tool, which is shipped with the JDK (at least until JDK 9 – with Jigsaw, it will soon be externalised into its own external dependency).

Adding plugins to XJC when running it via Maven is fairly straightforward. There are a few useful plugins available for free from here: https://github.com/highsource/jaxb2-basics and from other sources. But if you’re not entirely happy with the results, you might need to roll your own.

In this article, we’ll look into how we can write a simple plugin to generate custom renditions of these methods:

  • toString()
  • equals()
  • hashCode()

Set up a project

First, we need to set up a new Maven project, which contains the plugin code. This is quite straightforward. Just work with a single dependency and you’re done:

<project xmlns="http://maven.apache.org/POM/4.0.0" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>org.jooq</groupId>
    <artifactId>jooq-tools-xjc-plugin</artifactId>
    <version>3.11.0-SNAPSHOT</version>
    <name>jOOQ XJC Code Generation Plugin</name>
    
    <dependencies>      
        <dependency>
            <groupId>com.sun.xml.bind</groupId>
            <artifactId>jaxb-xjc</artifactId>
            <version>2.3.0</version>
        </dependency>
    </dependencies>
</project>

Now, add the plugin logic

An empty plugin essentially looks like this:

package org.jooq.xjc;

import org.xml.sax.ErrorHandler;

import com.sun.tools.xjc.Options;
import com.sun.tools.xjc.Plugin;
import com.sun.tools.xjc.outline.ClassOutline;
import com.sun.tools.xjc.outline.Outline;

/**
 * @author Lukas Eder
 */
public class XJCPlugin extends Plugin {

    @Override
    public String getOptionName() {
        return "Xjooq-equals-hashcode-tostring";
    }

    @Override
    public int parseArgument(Options opt, String[] args, int i) {
        return 1;
    }

    @Override
    public String getUsage() {
        return "  -Xjooq-equals-hashcode-tostring    :  xjc plugin";
    }

    @Override
    public boolean run(Outline model, Options opt, ErrorHandler errorHandler) {
        return true;
    }
}

The important parts in our case are the getOptionName() method, which provides a flag that can be used from XJC code generation configuration, to activate our plugin, and the run() method, which will contain our code, adding the three desired methods.

Let’s fill in some actual code!

package org.jooq.xjc;

import static com.sun.codemodel.JMod.FINAL;
import static com.sun.codemodel.JMod.PUBLIC;
import static com.sun.codemodel.JMod.STATIC;

import java.util.Map.Entry;

import org.xml.sax.ErrorHandler;

import com.sun.codemodel.JBlock;
import com.sun.codemodel.JClass;
import com.sun.codemodel.JCodeModel;
import com.sun.codemodel.JConditional;
import com.sun.codemodel.JExpr;
import com.sun.codemodel.JFieldVar;
import com.sun.codemodel.JMethod;
import com.sun.codemodel.JOp;
import com.sun.codemodel.JVar;
import com.sun.tools.xjc.Options;
import com.sun.tools.xjc.Plugin;
import com.sun.tools.xjc.outline.ClassOutline;
import com.sun.tools.xjc.outline.Outline;

/**
 * @author Lukas Eder
 */
public class XJCPlugin extends Plugin {

    @Override
    public String getOptionName() {
        return "Xjooq-equals-hashcode-tostring";
    }

    @Override
    public int parseArgument(Options opt, String[] args, int i) {
        return 1;
    }

    @Override
    public String getUsage() {
        return "  -Xjooq-equals-hashcode-tostring    :  xjc example plugin";
    }

    @Override
    public boolean run(Outline model, Options opt, ErrorHandler errorHandler) {
        JCodeModel m = new JCodeModel();

        for (ClassOutline o : model.getClasses()) {

            // toString()
            // ---------------------------------------------------------------------------
            {
                JMethod method = o.implClass.method(PUBLIC, String.class, "toString");
                method.annotate(Override.class);
                JBlock body = method.body();
                JClass sbType = m.ref(StringBuilder.class);
                JVar sb = body.decl(0, sbType, "sb", JExpr._new(sbType));

                for (Entry<String, JFieldVar> e : o.implClass.fields().entrySet()) {
                    JFieldVar v = e.getValue();

                    if ((v.mods().getValue() & STATIC) == 0) {
                        body.invoke(sb, "append").arg("<" + e.getKey() + ">");
                        body.invoke(sb, "append").arg(v);
                        body.invoke(sb, "append").arg("</" + e.getKey() + ">");
                    }
                }

                body._return(JExpr.invoke(sb, "toString"));
            }

            // equals()
            // ---------------------------------------------------------------------------
            {
                JMethod method = o.implClass.method(PUBLIC, boolean.class, "equals");
                method.annotate(Override.class);
                JVar that = method.param(Object.class, "that");
                JBlock body = method.body();
                body._if(JExpr._this().eq(that))
                    ._then()._return(JExpr.lit(true));
                body._if(that.eq(JExpr._null()))
                    ._then()._return(JExpr.lit(false));
                body._if(JExpr.invoke("getClass").ne(JExpr.invoke(that, "getClass")))
                    ._then()._return(JExpr.lit(false));

                JVar other = body.decl(0, o.implClass, "other", JExpr.cast(o.implClass, that));

                for (Entry<String, JFieldVar> e : o.implClass.fields().entrySet()) {
                    JFieldVar v = e.getValue();

                    if ((v.mods().getValue() & STATIC) == 0) {
                        if (v.type().isPrimitive()) {
                            body._if(v.ne(other.ref(v)))
                                ._then()._return(JExpr.lit(false));
                        }
                        else {
                            JConditional i = body._if(v.eq(JExpr._null()));
                            i._then()._if(other.ref(v).ne(JExpr._null()))
                                     ._then()._return(JExpr.lit(false));
                            i._elseif(v.invoke("equals").arg(other.ref(v)).not())
                             ._then()._return(JExpr.lit(false));
                        }
                    }
                }

                body._return(JExpr.lit(true));
            }

            // hashCode()
            {
                JMethod method = o.implClass.method(PUBLIC, int.class, "hashCode");
                method.annotate(Override.class);
                JBlock body = method.body();
                JVar prime = body.decl(FINAL, m.INT, "prime", JExpr.lit(31));
                JVar result = body.decl(0, m.INT, "result", JExpr.lit(1));

                for (Entry<String, JFieldVar> e : o.implClass.fields().entrySet()) {
                    JFieldVar v = e.getValue();

                    if ((v.mods().getValue() & STATIC) == 0) {
                        body.assign(result, prime.mul(result).plus(
                            v.type().isPrimitive()
                          ? v
                          : JOp.cond(v.eq(JExpr._null()), JExpr.lit(0), v.invoke("hashCode"))
                        ));
                    }
                }

                body._return(result);
            }
        }

        return true;
    }
}

The above logic generates an XML document fragment of the JAXB-annotated classes (without formatting) and an equals() and hashCode() implementation that is inspired by the generated code you would obtain from an IDE like Eclipse.

Some example output:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "MappedTable", propOrder = {

})
public class MappedTable {

    protected String input;
    @XmlElement(type = String.class)
    protected String inputExpression;
    @XmlElement(required = true)
    protected String output;

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("<input>");
        sb.append(input);
        sb.append("</input>");
        sb.append("<inputExpression>");
        sb.append(inputExpression);
        sb.append("</inputExpression>");
        sb.append("<output>");
        sb.append(output);
        sb.append("</output>");
        return sb.toString();
    }

    @Override
    public boolean equals(Object that) {
        if (this == that) {
            return true;
        }
        if (that == null) {
            return false;
        }
        if (getClass()!= that.getClass()) {
            return false;
        }
        MappedTable other = ((MappedTable) that);
        if (input == null) {
            if (other.input!= null) {
                return false;
            }
        } else {
            if (!input.equals(other.input)) {
                return false;
            }
        }
        if (inputExpression == null) {
            if (other.inputExpression!= null) {
                return false;
            }
        } else {
            if (!inputExpression.equals(other.inputExpression)) {
                return false;
            }
        }
        if (output == null) {
            if (other.output!= null) {
                return false;
            }
        } else {
            if (!output.equals(other.output)) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = ((prime*result)+((input == null)? 0 :input.hashCode()));
        result = ((prime*result)+((inputExpression == null)? 0 :inputExpression.hashCode()));
        result = ((prime*result)+((output == null)? 0 :output.hashCode()));
        return result;
    }

}

Don’t forget to register your plugin

The last step prior to building your plugin is to create a file in your project at:

src/main/resources/META-INF/services/com.sun.tools.xjc.Plugin

And put the qualified name of your plugin in it:

org.jooq.xjc.XJCPlugin

Done. Now install your plugin…

mvn clean install

… and use it from your code generation configuration as follows:

<plugin>
    <groupId>org.jvnet.jaxb2.maven2</groupId>
    <artifactId>maven-jaxb2-plugin</artifactId>
    <version>0.13.1</version>
    <executions>
        <execution>
            <id>codegen</id>
            <goals>
                <goal>generate</goal>
            </goals>
            <configuration>

                <!-- The usual configuration -->
                <encoding>UTF-8</encoding>
                <locale>us</locale>
                <forceRegenerate>true</forceRegenerate>
                <extension>true</extension>
                <strict>false</strict>
                <schemaDirectory>../jOOQ-meta/src/main/resources/xsd</schemaDirectory>
                <bindingDirectory>../jOOQ-meta/src/main/resources/xjb/codegen</bindingDirectory>
                <generateDirectory>../jOOQ-meta/src/main/java</generateDirectory>
                <generatePackage>org.jooq.util.jaxb</generatePackage>
                <schemaIncludes>
                    <include>jooq-codegen-3.11.0.xsd</include>
                </schemaIncludes>

                <!-- activate it with this line. Must match getOptionName() -->
                <args>
                    <arg>-Xjooq-equals-hashcode-tostring</arg>
                </args>
                <plugins>

                    <!-- include it with these lines. -->
                    <plugin>
                        <groupId>org.jooq.trial</groupId>
                        <artifactId>jooq-tools-xjc-plugin</artifactId>
                        <version>3.11.0-SNAPSHOT</version>
                    </plugin>
                </plugins>
            </configuration>
        </execution>
    </executions>
</plugin>

Done!