How to Patch Your IDE to Fix an Urgent Bug

Clock’s ticking. JDK 11 will remove a bunch of deprecated modules through JEP 320, which includes the Java EE modules, which again includes JAXB, a dependency of many libraries, including jOOQ. Thus far, few people have upgraded to Java 9 or 10, as these aren’t LTS releases. Unlike in the old days, however, people will be forced much earlier to upgrade to Java 11, because Java 8 (the free version) will reach end of life soon after Java 11 is released:

End of Public Updates for Oracle JDK 8
As outlined in the Oracle JDK Support Roadmap below, Oracle will not post further updates of Java SE 8 to its public download sites for commercial use after January 2019

So, we library developers must act and finally modularise our libraries. Which is, quite frankly, a pain. Not because of the module system itself, which works surprisingly well. But because of the toolchain, which is far from being production ready. This mostly includes:

It’s still almost not possible to maintain a modularised project in an IDE (I’ve tried Eclipse and IntelliJ, not Netbeans so far) as there are still tons of bugs. Some of which are showstoppers, halting compilation in the IDE (despite compilation working in Maven). For example:

But rather than just complaining, let’s complain and fix it

Let’s fix our own IDE by patching it

Disclaimer: The following procedure assumes that you have the right to modify your IDE’s source and binaries. To my understanding, this is the case with the EPL licensed Eclipse. It may not be the case for other IDEs.

Disclaimer2: Note, as reddit user fubarbazqux so eloquently put it, there are cleaner ways to apply patches (and contribute them) to the Eclipse community, if you have more time. This article just displays a very easy way to do things without spending too much time to figure out how the Eclipse development processes work, internally. It shows a QUICK FIX recipe

The first bug was already discovered and fixed for Eclipse 4.8, but its RC4 version seems to have tons of other problems, so let’s not upgrade to that yet. Instead, let’s apply the fix that can be seen here to our own distribution:

https://github.com/eclipse/eclipse.jdt.core/commit/e60c4f1f36f7efd5fbc1bbc661872b78c6939230#diff-e517e5944661053f0fcff49d9432b74e

It’s just a single line:

How do we do this?

First off, go to the Eclipse Packages Download page:

http://www.eclipse.org/downloads/eclipse-packages

And download the “Eclipse IDE for Eclipse Committers” distribution:

It will contain all the Eclipse source code, which we’ll need to compile the above class. In the new workspace, create a new empty plugin project:

Specify the correct execution environment (in our case Java 10) and add all the Java Development Tools (JDT) dependencies:

Or just add all the available dependencies, it doesn’t really matter.

You can now open the type that you want to edit:

Now, simply copy the source code from the editor and paste it in a new class inside of your project, which you put in the same package as the original (split packages are still possible in this case, yay)

Inside of your copy, apply the desired patch and build the project. Since you already included all the dependencies, it will be easy to compile your copy of the class, and you don’t have to build the entirety of Eclipse.

Now, go to your Windows Explorer or Mac OS X Finder, or Linux shell or whatever and find the compiled class:

This class can now be copied into the Eclipse plugin. How to find the appropriate Eclipse plugin? Just go to your plugin dependencies and check out the location of the class you’ve opened earlier:

Open that plugin from your Eclipse distribution’s /plugins folder using 7zip or whatever zipping tool you prefer, and overwrite the original class file(s). You may need to close Eclipse first, before you can write to the plugin zip file. And it’s always a good idea to make backup copies of the original plugin(s).

Be careful that if your class has any nested classes, you will need to copy them all, e.g.

MyClass.class
MyClass$1.class // Anonymous class
MyClass$Nested.class // Named, nested class

Restart Eclipse, and your bug should be fixed!

How to fix my own bugs?

You may not always be lucky to find a bug with an existing fix in the bug tracker as in the second case:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=535927

No problemo, we can hack our way around that as well. Launch your normal Eclipse instance (not the “Eclipse IDE for Eclipse Committers” one) with a debug agent running, by adding the following lines to your eclipse.ini file:

-Xdebug 
-Xnoagent 
-Djava.compile=NONE 
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

Launch Eclipse again, then connect to your Eclipse from your other “Eclipse IDE for Eclipse Committers” instance by connecting a debugger:

And start setting breakpoints wherever you need, e.g. here, in my case:

java.lang.NullPointerException
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:145)
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:226)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.handle(ProblemReporter.java:2513)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1831)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1808)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.checkAndRecordImportBinding(CompilationUnitScope.java:960)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInImports(CompilationUnitScope.java:471)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInTypes(CompilationUnitScope.java:501)
	at org.eclipse.jdt.internal.compiler.Compiler.process(Compiler.java:878)
	at org.eclipse.jdt.internal.compiler.ProcessTaskManager.run(ProcessTaskManager.java:141)
	at java.lang.Thread.run(Unknown Source)

And start analysing the problem like your own bugs. The nice thing is, you don’t have to fix the problem, just find it, and possibly comment out some lines of code if you think they’re not really needed. In my case, luckily, the regression was introduced by a new method that is applied to JDK 9+ projects only:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    // ...
}

The method will check for the new @Deprecated(since="9") attribute on the @Deprecated annotation. Not an essential feature, so let’s just turn it off by adding this line to the source file:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    if (true) return;
    // ...
}

This will effectively prevent the faulty logic from ever running. Not a fix, but a workaround. For more details about this specific issue, see the report. Of course, never forget to actually report the issue to Eclipse (or whatever your IDE is), so it can be fixed thoroughly for everyone else as well

Compile. Patch. Restart. Done!

Conclusion

Java is a cool platform. It has always been a very dynamic language at runtime, where compiled class files can be replaced by new versions at any moment, and re-loaded by the class loaders. This makes patching code by other vendors very easy, just:

  • Create a project containing the vendors’ code (or if you don’t have the code, the binaries)
  • Apply a fix / workaround to the Java class that is faulty (or if you don’t have the code, decompile the binaries if you are allowed to)
  • Compile your own version
  • Replace the version of the class file from the vendor by yours
  • Restart

This works with all software, including IDEs. In the case of jOOQ, all our customers have the right to modification, and they get the sources as well. We know how useful it is to be able to patch someone else’s code. This article shows it. Now, I can continue modularising jOOQ, and as a side product, improve the tool chain for everybody else as well.

Again, this article displayed a QUICK FIX approach (some call it “hack”). There are more thorough ways to apply patches / fixes, and contribute them back to the vendor.

Another, very interesting option would be to instrument your runtime and apply the fix only to byte code:

And:

https://www.sitepoint.com/fixing-bugs-in-running-java-code-with-dynamic-attach/

A note on IntelliJ and NetBeans

Again, I haven’t tried NetBeans yet (although I’ve heard its Java 9 support has been working very well for quite a while).

While IntelliJ’s Jigsaw support seems more advanced than Eclipse’s (still with a few flaws as well), it currently has a couple of performance issues when compiling projects like jOOQ or jOOλ. In a future blog post, I will show how to “fix” those by using a profiler, like:

  • Java Mission Control (can be used as a profiler, too)
  • YourKit
  • JProfiler

Profilers can be used to very easily track down the main source of a performance problem. I’ve reported a ton to Eclipse already. For instance, this one:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=474686

Where a lot of time is being spent in the processing of Task Tags, like:

  • TODO
  • FIXME
  • XXX

The great thing about profiling this is:

  • You can report a precise bug to the vendor
  • You can find the flawed feature and turn it off as a workaround. Turning off the above task tag feature was a no-brainer. I’m not even using the feature.

So, stay tuned for another blog post, soon.

Truth First, or Why You Should Mostly Implement Database First Designs

In this much overdue article, I will explain why I think that in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is), the latter approach leading to a long road of pain and suffering, once your project grows.

This article is inspired by a recent Stack Overflow question.

Interesting reddit discussions on /r/java and /r/programming.

Code generation

To my surprise, a small group of first time jOOQ users seem to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want and you don’t have to use code generation, but the default way to use jOOQ according to the manual is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type safe queries against those tables:

for (Record2<String, String> record : DSL.using(configuration)
//   ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the 
//   generated code referenced from the below SELECT clause

       .select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
//           vvvvv ^^^^^^^^^^^^  ^^^^^^^^^^^^^^^ Generated names
       .from(ACTOR)
       .orderBy(1, 2)) {
    // ...
}

The code is generated either manually outside of the build, or automatically with every build. For instance, such a re-generation could follow immediately after a Flyway database migration, which can also be run either manually or automatically.

Source code generation

There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”) either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason.

There are many such code generators out there. For instance, XJC can generate Java code from XSD or WSDL files. The principle is always the same:

  • There is some truth (internal or external), like a specification, data model, etc.
  • We need a local representation of that truth in our programming language

And it almost always makes sense to generate that latter, to avoid redundancy.

Type providers and annotation processing

Noteworthy: Another, more modern approach to jOOQ’s particular code generation use-case would be Type Providers, as implemented by F#, in case of which the code is generated by the compiler while compiling. It never really exists in its source form. A similar (but less sophisticated) tool in Java are annotation processors, e.g. Lombok.

In a way, this does the same thing except:

  • You don’t see the generated code (perhaps that’s less appalling to some?)
  • You must ensure the types can be provided, i.e. the “truth” must always be available. Easy in the case of Lombok, which annotates the “truth”. A bit more difficult with database models, which rely on an always available live connection.

What’s the problem with code generation?

Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it.

If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth.

Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say:

“But Hibernate / JPA makes coding Java first easy”

Yes that’s true. And it’s both a bliss and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:

@Entity
class Book {
  @Id
  int id;
  String title;
}

And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:

CREATE TABLE book (
  id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  title VARCHAR(50),

  CONSTRAINT pk_book PRIMARY KEY (id)
);

CREATE INDEX i_book_title ON book (title);

… and start running the application. That’s really cool to get started quickly and to try out things.

But, huh, wait. I cheated.

  • Will Hibernate really apply that named primary key definition?
  • Will it create the index on TITLE, which I know we’ll need?
  • Will it add an identity specification?

Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:

@Entity
@Table(name = "book", indexes = {
  @Index(name = "i_book_title", columnList = "title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;
  String title;
}

Cool. Re-generate. Again, this makes it really easy to get started.

But you’ll pay the price later on

At some point, you go to production. And that’s when this model no longer works. Because

Once you go live, you can no longer throw away your database, as your database has become legacy.

From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?) You can only lose.

Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault…

Instead, you could have done things entirely differently from the beginning. Like using those round wheels.

Go Database First

The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is.

This is old wisdom, nothing new. Primary and unique keys are good. Foreign keys are good. Check constraints are good. Assertions (when they’re finally implemented) are good.

And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:

  • In what tablespace your table resides
  • What PCTFREE value it has
  • What the cache size of your sequence (behind the identity) is

Maybe, all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs offer some tools to help you write DDL.

But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.

What about the client model?

As mentioned before, you will need a copy of your database schema in your client, a client representation. Needless to say that this client representation needs to be in-sync with the real model. How to best do that? By using a code generator.

All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:

-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables

-- DB2
SELECT tabschema, tabname
FROM syscat.tables

-- Oracle
SELECT owner, table_name
FROM all_tables

-- SQLite
SELECT name
FROM sqlite_master

-- Teradata
SELECT databasename, tablename
FROM dbc.tables

These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module.

From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless what your client technology is.

  • If you’re using JDBC or Spring, you can create a bunch of String constants
  • If you’re using JPA, you can generate the entities themselves
  • If you’re using jOOQ, you can generate the jOOQ meta model

Depending on the amount of features your client API offers (e.g. jOOQ or JPA), the generated meta model can be really rich and complete. Consider, for instance, jOOQ 3.11’s implicit join feature, which relies on generated meta information about the foreign key relationships between your tables.

Now, any database increment will automatically lead to updated client code. For instance, imagine:

ALTER TABLE book RENAME COLUMN title TO book_title;

Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:

@Entity
@Table(name = "book", indexes = {

  // Would you have thought of this?
  @Index(name = "i_book_title", columnList = "book_title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;

  @Column("book_title")
  String bookTitle;
}

Or an updated jOOQ class. Plus: Your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So, it’s great to be able to see in compiled client source code, what code is (or may be) affected by your database increment.

A single truth

Regardless what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Like jOOQ’s INFORMATION_SCHEMA meta model in XML form:
https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd

  • XSD is well understood
  • XSD specifies XML content very well, and allows for validation in all client languages
  • XSD can be versioned easily, and evolved backwards compatibly
  • XSD can be translated to Java code using XJC

The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach where we design our messages as Java objects could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed.

Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…

Databases: Same thing

When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth.

Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Even other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve.

So, don’t do it. And, don’t do it from the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll get years of improved productivity for the rest of your project.

Thank me later.

Clarification

Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client.

In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.

Exceptions

There are always exceptions, and I promised that the database first and code generation approach may not always be the right choice. These exceptions are (probably not exhaustive):

  • When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
  • When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even sure if an RDBMS will be the right choice.

The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.

How to Run a Bulk INSERT .. RETURNING Statement With Oracle and JDBC

When inserting records into SQL databases, we often want to fetch back generated IDs and possibly other trigger, sequence, or default generated values. Let’s assume we have the following table:

-- DB2
CREATE TABLE x (
  i INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, 
  j VARCHAR(50), 
  k DATE DEFAULT CURRENT_DATE
);

-- PostgreSQL
CREATE TABLE x (
  i SERIAL4 PRIMARY KEY, 
  j VARCHAR(50), 
  k DATE DEFAULT CURRENT_DATE
);

-- Oracle
CREATE TABLE x (
  i INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, 
  j VARCHAR2(50), 
  k DATE DEFAULT SYSDATE
);

DB2

DB2 is the only database currently supported by jOOQ, which implements the SQL standard according to which we can SELECT from any INSERT statement, including:

SELECT *
FROM FINAL TABLE (
  INSERT INTO x (j)
  VALUES ('a'), ('b'), ('c')
);

The above query returns:

I |J |K          |
--|--|-----------|
1 |a |2018-05-02 |
2 |b |2018-05-02 |
3 |c |2018-05-02 |

Pretty neat! This query can simply be run like any other query in JDBC, and you don’t have to go through any hassles.

PostgreSQL and Firebird

These databases have a vendor specific extension that does the same thing, almost as powerful:

-- Simple INSERT .. RETURNING query
INSERT INTO x (j)
VALUES ('a'), ('b'), ('c')
RETURNING *;

-- If you want to do more fancy stuff
WITH t AS (
  INSERT INTO x (j)
  VALUES ('a'), ('b'), ('c')
  RETURNING *
)
SELECT * FROM t;

Both syntaxes work equally well, the latter is just as powerful as DB2’s, where the result of an insertion (or update, delete, merge) can be joined to other tables. Again, no problem with JDBC

Oracle

In Oracle, this is a bit more tricky. The Oracle SQL language doesn’t have an equivalent of DB2’s FINAL TABLE (DML statement). The Oracle PL/SQL language, however, does support the same syntax as PostgreSQL and Firebird. This is perfectly valid PL/SQL

-- Create a few auxiliary types first
CREATE TYPE t_i AS TABLE OF NUMBER(38);
/
CREATE TYPE t_j AS TABLE OF VARCHAR2(50);
/
CREATE TYPE t_k AS TABLE OF DATE;
/

DECLARE 
  -- These are the input values
  in_j t_j := t_j('a', 'b', 'c');
  
  out_i t_i;
  out_j t_j;
  out_k t_k;
  
  c1 SYS_REFCURSOR;
  c2 SYS_REFCURSOR;
  c3 SYS_REFCURSOR;
BEGIN

  -- Use PL/SQL's FORALL command to bulk insert the
  -- input array type and bulk return the results
  FORALL i IN 1 .. in_j.COUNT
    INSERT INTO x (j)
    VALUES (in_j(i))
    RETURNING i, j, k
    BULK COLLECT INTO out_i, out_j, out_k;
  
  -- Fetch the results and display them to the console
  OPEN c1 FOR SELECT * FROM TABLE(out_i);  
  OPEN c2 FOR SELECT * FROM TABLE(out_j);  
  OPEN c3 FOR SELECT * FROM TABLE(out_k); 
  
  dbms_sql.return_result(c1);
  dbms_sql.return_result(c2);
  dbms_sql.return_result(c3);
END;
/

A bit verbose, but it has the same effect. Now, from JDBC:

try (Connection con = DriverManager.getConnection(url, props);
    Statement s = con.createStatement();

    // The statement itself is much more simple as we can
    // use OUT parameters to collect results into, so no
    // auxiliary local variables and cursors are needed
    CallableStatement c = con.prepareCall(
        "DECLARE "
      + "  v_j t_j := ?; "
      + "BEGIN "
      + "  FORALL j IN 1 .. v_j.COUNT "
      + "    INSERT INTO x (j) VALUES (v_j(j)) "
      + "    RETURNING i, j, k "
      + "    BULK COLLECT INTO ?, ?, ?; "
      + "END;")) {

    try {

        // Create the table and the auxiliary types
        s.execute(
            "CREATE TABLE x ("
          + "  i INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,"
          + "  j VARCHAR2(50),"
          + "  k DATE DEFAULT SYSDATE"
          + ")");
        s.execute("CREATE TYPE t_i AS TABLE OF NUMBER(38)");
        s.execute("CREATE TYPE t_j AS TABLE OF VARCHAR2(50)");
        s.execute("CREATE TYPE t_k AS TABLE OF DATE");

        // Bind input and output arrays
        c.setArray(1, ((OracleConnection) con).createARRAY(
            "T_J", new String[] { "a", "b", "c" })
        );
        c.registerOutParameter(2, Types.ARRAY, "T_I");
        c.registerOutParameter(3, Types.ARRAY, "T_J");
        c.registerOutParameter(4, Types.ARRAY, "T_K");

        // Execute, fetch, and display output arrays
        c.execute();
        Object[] i = (Object[]) c.getArray(2).getArray();
        Object[] j = (Object[]) c.getArray(3).getArray();
        Object[] k = (Object[]) c.getArray(4).getArray();

        System.out.println(Arrays.asList(i));
        System.out.println(Arrays.asList(j));
        System.out.println(Arrays.asList(k));
    }
    finally {
        try {
            s.execute("DROP TYPE t_i");
            s.execute("DROP TYPE t_j");
            s.execute("DROP TYPE t_k");
            s.execute("DROP TABLE x");
        }
        catch (SQLException ignore) {}
    }
}

The above code will display:

[1, 2, 3]
[a, b, c]
[2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0]

Exactly what we wanted.

jOOQ support

A future version of will emulate the above PL/SQL block from the jOOQ INSERT .. RETURNING statement:

DSL.using(configuration)
   .insertInto(X)
   .columns(X.J)
   .values("a")
   .values("b")
   .values("c")
   .returning(X.I, X.J, X.K)
   .fetch();

This will correctly emulate the query for all of the databases that natively support the syntax. In the case of Oracle, since jOOQ cannot create nor assume any SQL TABLE types, PL/SQL types from the DBMS_SQL package will be used

The relevant issue is here: https://github.com/jOOQ/jOOQ/issues/5863

The Performance Difference Between SQL Row-by-row Updating, Batch Updating, and Bulk Updating

Something that has been said many times, but needs constant repeating until every developer is aware of the importance of this is the performance difference between row-by-row updating and bulk updating. If you cannot guess which one will be much faster, remember that row-by-row kinda rhymes with slow-by-slow (hint hint).

Disclaimer: This article will discuss only non-concurrent updates, which are much easier to reason about. In a concurrent update situation, a lot of additional factors will add complexity to the problem, including the locking strategy, transaction isolation levels, or simply how the database vendor implements things in detail. For the sake of simplicity, I’ll assume no concurrent updates are being made.

Example query

Let’s say we have a simple table for our blog posts (using Oracle syntax, but the effect is the same on all databases):

CREATE TABLE post (
  id INT NOT NULL PRIMARY KEY,
  text VARCHAR2(1000) NOT NULL,
  archived NUMBER(1) NOT NULL CHECK (archived IN (0, 1)),
  creation_date DATE NOT NULL
);

CREATE INDEX post_creation_date_i ON post (creation_date);

Now, let’s add some 10000 rows:

INSERT INTO post
SELECT 
  level,
  lpad('a', 1000, 'a'),
  0 AS archived,
  DATE '2017-01-01' + (level / 100)
FROM dual
CONNECT BY level <= 10000;

EXEC dbms_stats.gather_table_stats('TEST', 'POST');

Now imagine, we want to update this table and set all posts to ARCHIVED = 1 if they are from last year, e.g. CREATION_DATE < DATE '2018-01-01'. There are various ways to do this, but you should have built an intuition that doing the update in one single UPDATE statement is probably better than looping over each individual row and updating each individual row explicitly. Right?

Right.

Then, why do we keep doing it?

Let me ask this differently:

Does it matter?

The best way to find out is to benchmark. I’m doing two benchmarks for this:

  1. One that is run in PL/SQL, showing the performance difference between different approaches that are available to PL/SQL (namely looping, the FORALL syntax, and a single bulk UPDATE)
  2. One that is run in Java, doing JDBC calls, showing the performance difference between different approaches available to Java (namely looping, caching PreparedStatement but still looping, batching, and a single bulk UPDATE)

Benchmarking PL/SQL

The code of the benchmark can be found in this gist. I will also include it at the bottom of this blog post. The results are:

Run 1, Statement 1 : .01457 (avg : .0098)
Run 1, Statement 2 : .0133  (avg : .01291)
Run 1, Statement 3 : .02351 (avg : .02519)
Run 2, Statement 1 : .00882 (avg : .0098)
Run 2, Statement 2 : .01159 (avg : .01291)
Run 2, Statement 3 : .02348 (avg : .02519)
Run 3, Statement 1 : .01012 (avg : .0098)
Run 3, Statement 2 : .01453 (avg : .01291)
Run 3, Statement 3 : .02544 (avg : .02519)
Run 4, Statement 1 : .00799 (avg : .0098)
Run 4, Statement 2 : .01346 (avg : .01291)
Run 4, Statement 3 : .02958 (avg : .02519)
Run 5, Statement 1 : .00749 (avg : .0098)
Run 5, Statement 2 : .01166 (avg : .01291)
Run 5, Statement 3 : .02396 (avg : .02519)

The difference between Statement 1 and 3 is a factor of 2.5x

Showing the time it takes for each statement type to complete, each time updating 3649 / 10000 rows. The winner is:

Statement 1, running a bulk update

It looks like this:

UPDATE post
SET archived = 1
WHERE archived = 0 AND creation_date < DATE '2018-01-01';

Runner-up (not too far away) is:

Statement 2, using the PL/SQL FORALL syntax

It works like this:

DECLARE
  TYPE post_ids_t IS TABLE OF post.id%TYPE;
  v_post_ids post_ids_t;
BEGIN
  SELECT id 
  BULK COLLECT INTO v_post_ids
  FROM post 
  WHERE archived = 0 AND creation_date < DATE '2018-01-01';

  FORALL i IN 1 .. v_post_ids.count
    UPDATE post
    SET archived = 1
    WHERE id = v_post_ids(i);
END;

Loser (by a factor of 2.5x on our specific data set) is:

Statement 3, using an ordinary LOOP and running row-by-row updates

FOR rec IN (
  SELECT id 
  FROM post 
  WHERE archived = 0 AND creation_date < DATE '2018-01-01'
) LOOP
  UPDATE post
  SET archived = 1
  WHERE id = rec.id;
END LOOP;

It does not really come as a surprise. We’re switching between the PL/SQL engine and the SQL engine many many times, and also, instead of running through the post table only once in O(N) time, we’re looking up individual ID values in O(log N) time, N times, so the complexity went from

O(N) -> O(N log N)

We’d get far worse results for larger tables!

What about doing this from Java?

The difference is much more drastic if each call to the SQL engine has to be done over the network from another process. Again, the benchmark code is available from a gist, and I will paste it to the end of this blog post as well.

The result is (same time unit):

Run 0, Statement 1: PT4.546S
Run 0, Statement 2: PT3.52S
Run 0, Statement 3: PT0.144S
Run 0, Statement 4: PT0.028S
Run 1, Statement 1: PT3.712S
Run 1, Statement 2: PT3.185S
Run 1, Statement 3: PT0.138S
Run 1, Statement 4: PT0.025S
Run 2, Statement 1: PT3.481S
Run 2, Statement 2: PT3.007S
Run 2, Statement 3: PT0.122S
Run 2, Statement 4: PT0.026S
Run 3, Statement 1: PT3.518S
Run 3, Statement 2: PT3.077S
Run 3, Statement 3: PT0.113S
Run 3, Statement 4: PT0.027S
Run 4, Statement 1: PT3.54S
Run 4, Statement 2: PT2.94S
Run 4, Statement 3: PT0.123S
Run 4, Statement 4: PT0.03S

The difference between Statement 1 and 4 is a factor of 100x !!

So, who’s winning? Again (by far):

Statement 4, running the bulk update

In fact, the time is not too far away from the time taken by PL/SQL. With larger data sets being updated, the two results will converge. The code is:

try (Statement s = c.createStatement()) {
    s.executeUpdate(
        "UPDATE post\n" +
        "SET archived = 1\n" +
        "WHERE archived = 0\n" +
        "AND creation_date < DATE '2018-01-01'\n");
}

Followed by the not that much worse (but still 3.5x worse):

Statement 3, running the batch update

Batching can be compared to PL/SQL’s FORALL statement. While we’re running individual row-by-row updates, we’re sending all the update statements in one batch to the SQL engine. This does save a lot of time on the network and all the layers in between.

The code looks like this:

try (Statement s = c.createStatement();
    ResultSet rs = s.executeQuery(
        "SELECT id FROM post WHERE archived = 0\n"
      + "AND creation_date < DATE '2018-01-01'"
    );
    PreparedStatement u = c.prepareStatement(
        "UPDATE post SET archived = 1 WHERE id = ?"
    )) {

    while (rs.next()) {
        u.setInt(1, rs.getInt(1));
        u.addBatch();
    }

    u.executeBatch();
}

Followed by the losers:

Statement 1 and 2, running row by row updates

The difference between statement 1 and 2 is that 2 caches the PreparedStatement, which allows for reusing some resources. This can be a good thing, but didn’t have a very significant effect in our case, compared to the batch / bulk alternatives. The code is:

// Statement 1:
try (Statement s = c.createStatement();
    ResultSet rs = s.executeQuery(
        "SELECT id FROM post\n"
      + "WHERE archived = 0\n"
      + "AND creation_date < DATE '2018-01-01'"
    )) {

    while (rs.next()) {
        try (PreparedStatement u = c.prepareStatement(
            "UPDATE post SET archived = 1 WHERE id = ?"
        )) {
            u.setInt(1, rs.getInt(1));
            u.executeUpdate();
        }
    }
}

// Statement 2:
try (Statement s = c.createStatement();
    ResultSet rs = s.executeQuery(
        "SELECT id FROM post\n"
      + "WHERE archived = 0\n"
      + "AND creation_date < DATE '2018-01-01'"
    );
    PreparedStatement u = c.prepareStatement(
        "UPDATE post SET archived = 1 WHERE id = ?"
    )) {

    while (rs.next()) {
        u.setInt(1, rs.getInt(1));
        u.executeUpdate();
    }
}

Conclusion

As shown previously on this blog, there is a significant cost of JDBC server roundtrips, which can be seen in the JDBC benchmark. This cost is much more severe if we unnecessarily create many server roundtrips for a task that could be done in a single roundtrip, namely by using a SQL bulk UPDATE statement.

This is not only true for updates, but also for all the other statements, including SELECT, DELETE, INSERT, and MERGE. If doing everything in a single statement isn’t possible due to the limitations of SQL, we can still save roundtrips by grouping statements in a block, either by using an anonymous block in databases that support them:

BEGIN
  statement1;
  statement2;
  statement3;
END;

(you can easily send these anonymous blocks over JDBC, as well!)

Or, by emulating anonymous blocks using the JDBC batch API (has its limitations), or by writing stored procedures.

The performance gain is not always worth the trouble of moving logic from the client to the server, but very often (as in the above case), the move is a no-brainer and there’s absolutely no reason against it.

So, remember: Stop doing row-by-row (slow-by-slow) operations when you could run the same operation in bulk, in a single SQL statement.

Hint: Always know what your ORM (if you’re using one) is doing, because the ORM can help you with automatic batching / bulking in many cases. But it often cannot, or it is too difficult to make it do so, so resorting to SQL is the way to go.

Code

PL/SQL benchmark

SET SERVEROUTPUT ON

DROP TABLE post;

CREATE TABLE post (
  id INT NOT NULL PRIMARY KEY,
  text VARCHAR2(1000) NOT NULL,
  archived NUMBER(1) NOT NULL CHECK (archived IN (0, 1)),
  creation_date DATE NOT NULL
);

CREATE INDEX post_creation_date_i ON post (creation_date);

ALTER SYSTEM FLUSH SHARED_POOL;
ALTER SYSTEM FLUSH BUFFER_CACHE;

CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  
  PROCEDURE reset_post IS
  BEGIN
    EXECUTE IMMEDIATE 'TRUNCATE TABLE post';
    INSERT INTO post
    SELECT 
      level AS id,
      lpad('a', 1000, 'a') AS text,
      0 AS archived,
      DATE '2017-01-01' + (level / 100) AS creation_date
    FROM dual
    CONNECT BY level <= 10000;
    dbms_stats.gather_table_stats('TEST', 'POST');
  END reset_post;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
  
    reset_post;
    v_ts := SYSTIMESTAMP;
    
    UPDATE post
    SET archived = 1
    WHERE archived = 0 AND creation_date < DATE '2018-01-01';
  
    INSERT INTO results VALUES (r, 1, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    
    reset_post;
    v_ts := SYSTIMESTAMP;
    
    DECLARE
      TYPE post_ids_t IS TABLE OF post.id%TYPE;
      v_post_ids post_ids_t;
    BEGIN
      SELECT id 
      BULK COLLECT INTO v_post_ids
      FROM post 
      WHERE archived = 0 AND creation_date < DATE '2018-01-01';
    
      FORALL i IN 1 .. v_post_ids.count
        UPDATE post
        SET archived = 1
        WHERE id = v_post_ids(i);
    END;
    
    INSERT INTO results VALUES (r, 2, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    
    reset_post;
    v_ts := SYSTIMESTAMP;
      
    FOR rec IN (
      SELECT id 
      FROM post 
      WHERE archived = 0 AND creation_date < DATE '2018-01-01'
    ) LOOP
      UPDATE post
      SET archived = 1
      WHERE id = rec.id;
    END LOOP;
      
    INSERT INTO results VALUES (r, 3, SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed AS NUMBER(10, 5)) ratio,
      CAST(AVG(elapsed) OVER (PARTITION BY stmt) AS NUMBER(10, 5)) avg_ratio
    FROM results
    ORDER BY run, stmt
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio || ' (avg : ' || rec.avg_ratio || ')');
  END LOOP;
  
  dbms_output.put_line('');
  dbms_output.put_line('Copyright Data Geekery GmbH');
  dbms_output.put_line('https://www.jooq.org/benchmark');
END;
/

DROP TABLE results;

JDBC benchmark

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.time.Duration;
import java.time.Instant;
import java.util.Properties;

public class OracleUpdate {

    public static void main(String[] args) throws Exception {
        Class.forName("oracle.jdbc.OracleDriver");

        String url = "jdbc:oracle:thin:@192.168.99.100:1521:ORCLCDB";
        String user = "TEST";
        String password = "TEST";

        Properties properties = new Properties();
        properties.setProperty("user", user);
        properties.setProperty("password", password);

        try (Connection c = DriverManager.getConnection(url, properties)) {
            for (int i = 0; i < 5; i++) {
                Instant ts;

                resetPost(c);
                ts = Instant.now();

                try (Statement s = c.createStatement();
                    ResultSet rs = s.executeQuery(
                        "SELECT id FROM post WHERE archived = 0 AND creation_date < DATE '2018-01-01'"
                    )) {

                    while (rs.next()) {
                        try (PreparedStatement u = c.prepareStatement(
                            "UPDATE post SET archived = 1 WHERE id = ?"
                        )) {
                            u.setInt(1, rs.getInt(1));
                            u.executeUpdate();
                        }
                    }
                }

                System.out.println("Run " + i + ", Statement 1: " + Duration.between(ts, Instant.now()));

                resetPost(c);
                ts = Instant.now();

                try (Statement s = c.createStatement();
                    ResultSet rs = s.executeQuery(
                        "SELECT id FROM post WHERE archived = 0 AND creation_date < DATE '2018-01-01'"
                    );
                    PreparedStatement u = c.prepareStatement(
                        "UPDATE post SET archived = 1 WHERE id = ?"
                    )) {

                    while (rs.next()) {
                        u.setInt(1, rs.getInt(1));
                        u.executeUpdate();
                    }
                }

                System.out.println("Run " + i + ", Statement 2: " + Duration.between(ts, Instant.now()));

                resetPost(c);
                ts = Instant.now();

                try (Statement s = c.createStatement();
                    ResultSet rs = s.executeQuery(
                        "SELECT id FROM post WHERE archived = 0 AND creation_date < DATE '2018-01-01'"
                    );
                    PreparedStatement u = c.prepareStatement(
                        "UPDATE post SET archived = 1 WHERE id = ?"
                    )) {

                    while (rs.next()) {
                        u.setInt(1, rs.getInt(1));
                        u.addBatch();
                    }

                    u.executeBatch();
                }
                System.out.println("Run " + i + ", Statement 3: " + Duration.between(ts, Instant.now()));

                resetPost(c);
                ts = Instant.now();

                try (Statement s = c.createStatement()) {
                    s.executeUpdate("UPDATE post\n" +
                        "SET archived = 1\n" +
                        "WHERE archived = 0 AND creation_date < DATE '2018-01-01'\n");
                }

                System.out.println("Run " + i + ", Statement 4: " + Duration.between(ts, Instant.now()));
            }
        }
    }

    static void resetPost(Connection c) throws SQLException {
        try (Statement s = c.createStatement()) {
            s.executeUpdate("TRUNCATE TABLE post");
            s.executeUpdate("INSERT INTO post\n" +
                "    SELECT \n" +
                "      level,\n" +
                "      lpad('a', 1000, 'a'),\n" +
                "      0,\n" +
                "      DATE '2017-01-01' + (level / 10)\n" +
                "    FROM dual\n" +
                "    CONNECT BY level <= 10000");
            s.executeUpdate("BEGIN dbms_stats.gather_table_stats('TEST', 'POST'); END;");
        }
    }
}

Correct Reflective Access to Interface Default Methods in Java 8, 9, 10

When performing reflective access to default methods in Java, Google seems to fail us. The solutions presented on Stack Overflow, for instance, seem to work only in a certain set of cases, and not on all Java versions.

This article will illustrate different approaches to calling interface default methods through reflection, as may be required by a proxy, for instance.

TL;DR If you’re impatient, all the access methods exposed in this blog are available in this gist, and the problem is also fixed in our library jOOR.

Proxying interfaces with default methods

The useful java.lang.reflect.Proxy API has been around for a while. We can do cool things like:

import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Duck {
        void quack();
    }

    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                System.out.println("Quack");
                return null;
            }
        );

        duck.quack();
    }
}

This just yields:

Quack

In this example, we create a proxy instance that implements the Duck API through an InvocationHandler, which is essentially just a lambda that gets called for each method call on Duck.

The interesting bit is when we want to have a default method on Duck and delegate the call to that default method:

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

We might be inclined to write this:

import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Duck {
        default void quack() {
            System.out.println("Quack");
        }
    }

    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                method.invoke(proxy);
                return null;
            }
        );

        duck.quack();
    }
}

But this will just generate a long long stack trace of nested exceptions (this isn’t specific to the method being a default method. You simply cannot do this):

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	at ProxyDemo.main(ProxyDemo.java:20)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at ProxyDemo.lambda$0(ProxyDemo.java:15)
	... 2 more
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	... 7 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at ProxyDemo.lambda$0(ProxyDemo.java:15)
	... 8 more
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	... 13 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at ProxyDemo.lambda$0(ProxyDemo.java:15)
	... 14 more
Caused by: java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	... 19 more
...
...
... goes on forever

Not very helpful.

Using method handles

So, the original Google search turned up results that indicate we need to use the MethodHandles API. Let’s try that, then!

import java.lang.invoke.MethodHandles;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Duck {
        default void quack() {
            System.out.println("Quack");
        }
    }

    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                MethodHandles
                    .lookup()
                    .in(Duck.class)
                    .unreflectSpecial(method, Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}

That seems to work, cool!

Quack

… until it doesn’t.

Calling a default method on a non-private-accessible interface

The interface in the above example was carefully chosen to be “private-accessible” by the caller, i.e. the interface is nested in the caller’s class. What if we had a top-level interface?

import java.lang.invoke.MethodHandles;
import java.lang.reflect.Proxy;

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

public class ProxyDemo {
    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                MethodHandles
                    .lookup()
                    .in(Duck.class)
                    .unreflectSpecial(method, Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}

The almost same code snippet no longer works. We get the following IllegalAccessException:

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	at ProxyDemo.main(ProxyDemo.java:26)
Caused by: java.lang.IllegalAccessException: no private access for invokespecial: interface Duck, from Duck/package
	at java.lang.invoke.MemberName.makeAccessException(MemberName.java:850)
	at java.lang.invoke.MethodHandles$Lookup.checkSpecialCaller(MethodHandles.java:1572)
	at java.lang.invoke.MethodHandles$Lookup.unreflectSpecial(MethodHandles.java:1231)
	at ProxyDemo.lambda$0(ProxyDemo.java:19)
	... 2 more

Bummer. When googling further, we might find the following solution, which accesses MethodHandles.Lookup‘s internals through reflection:

import java.lang.invoke.MethodHandles.Lookup;
import java.lang.reflect.Constructor;
import java.lang.reflect.Proxy;

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

public class ProxyDemo {
    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                Constructor<Lookup> constructor = Lookup.class
                    .getDeclaredConstructor(Class.class);
                constructor.setAccessible(true);
                constructor.newInstance(Duck.class)
                    .in(Duck.class)
                    .unreflectSpecial(method, Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}

And yay, we get:

Quack

We get that on JDK 8. What about JDK 9 or 10?

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by ProxyDemo (file:/C:/Users/lukas/workspace/playground/target/classes/) to constructor java.lang.invoke.MethodHandles$Lookup(java.lang.Class)
WARNING: Please consider reporting this to the maintainers of ProxyDemo
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Quack

Oops. That’s what happens by default. If we run the program with the --illegal-access=deny flag:

java --illegal-access=deny ProxyDemo

Then, we’re getting (and rightfully so):

Exception in thread "main" java.lang.reflect.InaccessibleObjectException: Unable to make java.lang.invoke.MethodHandles$Lookup(java.lang.Class) accessible: module java.base does not "opens java.lang.invoke" to unnamed module @357246de
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:337)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:281)
        at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:192)
        at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:185)
        at ProxyDemo.lambda$0(ProxyDemo.java:18)
        at $Proxy0.quack(Unknown Source)
        at ProxyDemo.main(ProxyDemo.java:28)

One of the Jigsaw project’s goals is to precisely not allow such hacks to persist. So, what’s a better solution? This?

import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Proxy;

interface Duck {
    default void quack() {
        System.out.println("Quack");
    }
}

public class ProxyDemo {
    public static void main(String[] a) {
        Duck duck = (Duck) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { Duck.class },
            (proxy, method, args) -> {
                MethodHandles.lookup()
                    .findSpecial( 
                         Duck.class, 
                         "quack",  
                         MethodType.methodType( 
                             void.class, 
                             new Class[0]),  
                         Duck.class)
                    .bindTo(proxy)
                    .invokeWithArguments();
                return null;
            }
        );

        duck.quack();
    }
}
Quack

Great, it works in Java 9 and 10, what about Java 8?

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.quack(Unknown Source)
	at ProxyDemo.main(ProxyDemo.java:25)
Caused by: java.lang.IllegalAccessException: no private access for invokespecial: interface Duck, from ProxyDemo
	at java.lang.invoke.MemberName.makeAccessException(MemberName.java:850)
	at java.lang.invoke.MethodHandles$Lookup.checkSpecialCaller(MethodHandles.java:1572)
	at java.lang.invoke.MethodHandles$Lookup.findSpecial(MethodHandles.java:1002)
	at ProxyDemo.lambda$0(ProxyDemo.java:18)
	... 2 more

You’re kidding, right?

So, there’s a solution (hack) that works on Java 8 but not on 9 or 10, and there’s a solution that works on Java 9 and 10, but not on Java 8.

A more thorough examination

So far, I’ve just been trying to run different things on different JDKs. The following class tries all combinations. It’s also available in this gist here.

Compile it with JDK 9 or 10 (because it also tries using JDK 9+ API: MethodHandles.privateLookupIn()), but compile it using this command, so you can also run the class on JDK 8:

javac -source 1.8 -target 1.8 CallDefaultMethodThroughReflection.java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodHandles.Lookup;
import java.lang.invoke.MethodType;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;


interface PrivateInaccessible {
    default void quack() {
        System.out.println(" -> PrivateInaccessible.quack()");
    }
}

public class CallDefaultMethodThroughReflection {
    interface PrivateAccessible {
        default void quack() {
            System.out.println(" -> PrivateAccessible.quack()");
        }
    }

    public static void main(String[] args) {
        System.out.println("PrivateAccessible");
        System.out.println("-----------------");
        System.out.println();
        proxy(PrivateAccessible.class).quack();

        System.out.println();
        System.out.println("PrivateInaccessible");
        System.out.println("-------------------");
        System.out.println();
        proxy(PrivateInaccessible.class).quack();
    }

    private static void quack(Lookup lookup, Class<?> type, Object proxy) {
        System.out.println("Lookup.in(type).unreflectSpecial(...)");

        try {
            lookup.in(type)
                  .unreflectSpecial(type.getMethod("quack"), type)
                  .bindTo(proxy)
                  .invokeWithArguments();
        }
        catch (Throwable e) {
            System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
        }

        System.out.println("Lookup.findSpecial(...)");
        try {
            lookup.findSpecial(type, "quack", MethodType.methodType(void.class, new Class[0]), type)
                  .bindTo(proxy)
                  .invokeWithArguments();
        }
        catch (Throwable e) {
            System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
        }
    }

    @SuppressWarnings("unchecked")
    private static <T> T proxy(Class<T> type) {
        return (T) Proxy.newProxyInstance(
            Thread.currentThread().getContextClassLoader(),
            new Class[] { type },
            (Object proxy, Method method, Object[] arguments) -> {
                System.out.println("MethodHandles.lookup()");
                quack(MethodHandles.lookup(), type, proxy);

                try {
                    System.out.println();
                    System.out.println("Lookup(Class)");
                    Constructor<Lookup> constructor = Lookup.class.getDeclaredConstructor(Class.class);
                    constructor.setAccessible(true);
                    constructor.newInstance(type);
                    quack(constructor.newInstance(type), type, proxy);
                }
                catch (Exception e) {
                    System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
                }

                try {
                    System.out.println();
                    System.out.println("MethodHandles.privateLookupIn()");
                    quack(MethodHandles.privateLookupIn(type, MethodHandles.lookup()), type, proxy);
                }
                catch (Error e) {
                    System.out.println(" -> " + e.getClass() + ": " + e.getMessage());
                }

                return null;
            }
        );
    }
}

The output of the above program is:

Java 8

$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

$ java CallDefaultMethodThroughReflection
PrivateAccessible
-----------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface CallDefaultMethodThroughReflection$PrivateAccessible, from CallDefaultMethodThroughReflection

Lookup(Class)
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> PrivateAccessible.quack()

MethodHandles.privateLookupIn()
 -> class java.lang.NoSuchMethodError: java.lang.invoke.MethodHandles.privateLookupIn(Ljava/lang/Class;Ljava/lang/invoke/MethodHandles$Lookup;)Ljava/lang/invoke/MethodHandles$Lookup;

PrivateInaccessible
-------------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface PrivateInaccessible, from PrivateInaccessible/package
Lookup.findSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface PrivateInaccessible, from CallDefaultMethodThroughReflection

Lookup(Class)
Lookup.in(type).unreflectSpecial(...)
 -> PrivateInaccessible.quack()
Lookup.findSpecial(...)
 -> PrivateInaccessible.quack()

MethodHandles.privateLookupIn()
 -> class java.lang.NoSuchMethodError: java.lang.invoke.MethodHandles.privateLookupIn(Ljava/lang/Class;Ljava/lang/invoke/MethodHandles$Lookup;)Ljava/lang/invoke/MethodHandles$Lookup;

Java 9

$ java -version
java version "9.0.4"
Java(TM) SE Runtime Environment (build 9.0.4+11)
Java HotSpot(TM) 64-Bit Server VM (build 9.0.4+11, mixed mode)

$ java --illegal-access=deny CallDefaultMethodThroughReflection
PrivateAccessible
-----------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> PrivateAccessible.quack()

Lookup(Class)
 -> class java.lang.reflect.InaccessibleObjectException: Unable to make java.lang.invoke.MethodHandles$Lookup(java.lang.Class) accessible: module java.base does not "opens java.lang.invoke" to unnamed module @30c7da1e

MethodHandles.privateLookupIn()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateAccessible.quack()
Lookup.findSpecial(...)
 -> PrivateAccessible.quack()

PrivateInaccessible
-------------------

MethodHandles.lookup()
Lookup.in(type).unreflectSpecial(...)
 -> class java.lang.IllegalAccessException: no private access for invokespecial: interface PrivateInaccessible, from PrivateInaccessible/package (unnamed module @30c7da1e)
Lookup.findSpecial(...)
 -> PrivateInaccessible.quack()

Lookup(Class)
 -> class java.lang.reflect.InaccessibleObjectException: Unable to make java.lang.invoke.MethodHandles$Lookup(java.lang.Class) accessible: module java.base does not "opens java.lang.invoke" to unnamed module @30c7da1e

MethodHandles.privateLookupIn()
Lookup.in(type).unreflectSpecial(...)
 -> PrivateInaccessible.quack()
Lookup.findSpecial(...)
 -> PrivateInaccessible.quack()

Java 10

$ java -version
java version "10" 2018-03-20
Java(TM) SE Runtime Environment 18.3 (build 10+46)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10+46, mixed mode)

$ java --illegal-access=deny CallDefaultMethodThroughReflection
... same result as in Java 9

Conclusion

Getting this right is a bit tricky.

  • In Java 8, the best working approach is the hack that opens up the JDK’s internals by accessing a package-private Lookup constructor. This is the only way to consistently call default methods on both private-accessible and private-inaccessible interfaces from any location.
  • In Java 9 and 10, the best working approaches are Lookup.findSpecial() (didn’t work in Java 8) or the new MethodHandles.privateLookupIn() (didn’t exist in in Java 8). The latter is required in case the interfaced is located in another module. That module will still need to open the interface’s package to the caller.

It’s fair to say that this is a bit of a mess. The appropriate meme here is:

According to Rafael Winterhalter (author of ByteBuddy), the “real” fix should go into a revised Proxy API:

I’m not sure if that would solve all the problems, but it should definitely be the case that an implementor shouldn’t worry about all of the above.

Also, clearly, this article didn’t do the complete work, e.g. of testing whether the approaches still work if Duck is imported from another module:

… which will be a topic of another blog post.

Using jOOR

If you’re using jOOR (our reflection library, check it out here), the upcoming version 0.9.8 will include a fix for this:
https://github.com/jOOQ/jOOR/issues/49

The fix simply uses the unsafe reflection approach in Java 8, or the MethodHandles.privateLookupIn() approach in Java 9+. You can then write:

Reflect.on(new Object()).as(PrivateAccessible.class).quack();
Reflect.on(new Object()).as(PrivateInaccessible.class).quack();

Top 10 SQL Dialect Emulations Implemented in jOOQ

The SQL standard is a nice thing. But it’s a standard. And as such, while it may provide good guidelines on how to implement some SQL features, most dialects deviate from it in one way or another (sometimes drastically, cheers, MySQL).

But that doesn’t have to be a bad thing. Innovation is not driven by standards, but by individual vendors’ attempts to tackle a problem from a different perspective. And then, sometimes, the innovation becomes the standard. One example for this is Oracle’s very very interesting MATCH_RECOGNIZE feature, on which Markus Winand did an excellent presentation.

Other functionality is not standardised, such as Oracle/SQL Server’s PIVOT and UNPIVOT.

In many cases, vendor-specific functionality can be translated to equivalent standard SQL, or to other vendor-specific SQL. That’s one of jOOQ’s core features: The “standardisation” of currently 21 SQL dialects into a single Java API. Since jOOQ 3.9, the Java API can also be hidden behind a parser, which makes visualising the differences much simpler. If you want to play around with the following examples, do visit https://www.jooq.org/translate to see our online SQL translator in action!

Here are Top 10 SQL Dialect Emulations Implemented in jOOQ:

1. Empty FROM clause

The SQL standard doesn’t allow this, but many databases do. You have to specify a FROM clause in at least these databases

  • Access
  • CUBRID
  • DB2
  • Derby
  • Firebird
  • HANA
  • HSQLDB
  • Informix
  • Ingres
  • MariaDB
  • MySQL (not always)
  • Oracle
  • Sybase SQL Anywhere

These ones don’t really need a FROM clause:

  • H2
  • PostgreSQL
  • Redshift
  • SQL Server
  • SQLite
  • Sybase ASE
  • Vertica

An example of such a query could be the following query that retrieves the server timestamp:

SELECT current_timestamp

In Oracle, you’d have to add the DUAL table:

SELECT current_timestamp FROM dual

There are other possibilities to emulate this in other databases. If you want to see how jOOQ does it, again, either visit our online SQL translator, or run this little code snippet yourself, locally (be sure to report any issues you may find, greatly appreciated!):

import org.jooq.Query;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

public class Translate {
    public static void main(String[] args) {
        Query query = DSL.using(SQLDialect.DEFAULT)
            .parser()
            .parseQuery("SELECT current_timestamp");

        for (SQLDialect family : SQLDialect.families()) {
            System.out.println(String.format(
                "%1$-13s: %2$s", family, 
                DSL.using(family).render(query)
            ));
    }
}

So, running the above snippet will yield (and observe, free current_timestamp translation:

ACCESS    : select now() from (select count(*) dual from MSysResources) as dual
ASE       : select current_bigdatetime()
CUBRID    : select current_timestamp() from "db_root"
DB2       : select current_timestamp from "SYSIBM"."DUAL"
DERBY     : select current_timestamp from "SYSIBM"."SYSDUMMY1"
FIREBIRD  : select current_timestamp from "RDB$DATABASE"
H2        : select current_timestamp()
HANA      : select current_timestamp from "SYS"."DUMMY"
HSQLDB    : select current_timestamp from (
              select 1 as dual from information_schema.system_users limit 1
            ) as dual
INFORMIX  : select current from (
              select 1 as dual from systables where tabid = 1
            ) as dual
INGRES    : select current_timestamp from (select 1 as dual) as dual
MARIADB   : select current_timestamp() from dual
MYSQL     : select current_timestamp() from dual
ORACLE    : select current_timestamp from dual
POSTGRES  : select current_timestamp
REDSHIFT  : select current_timestamp
SQLITE    : select current_timestamp
SQLSERVER : select current_timestamp
SYBASE    : select current timestamp from [SYS].[DUMMY]
VERTICA   : select current_timestamp

See also the jOOQ manual’s section about the dual table.

2. LIMIT .. OFFSET

First off, don’t use OFFSET.

Since you didn’t listen and you’re still using OFFSET, let’s discuss how to emulate it (and the much more useful LIMIT in many database dialects.

The SQL:2016 standard syntax is:

<result offset clause> ::=
  OFFSET <offset row count> { ROW | ROWS }

<fetch first clause> ::=
  FETCH { FIRST | NEXT } [ <fetch first quantity> ]
    { ROW | ROWS } { ONLY | WITH TIES }

<fetch first quantity> ::=
    <fetch first row count>
  | <fetch first percentage>

<offset row count> ::=
  <simple value specification>

<fetch first row count> ::=
  <simple value specification>

<fetch first percentage> ::=
  <simple value specification> PERCENT

So, there are a few interesting features:

  • The OFFSET (which is the least interesting)
  • The number of rows to FETCH
  • Whether tied rows should be fetched, too (TIES). This will be covered in the next section
  • Whether the number of rows is really a PERCENTage

Oracle currently is the only database (I’m aware of) that does it all and with standard syntax.

FETCH without OFFSET

For instance, when querying the Sakila database, we can get the TOP 3 longest films:

SELECT film_id, title, length
FROM film 
ORDER BY length DESC 
FETCH NEXT 3 ROWS ONLY

Yielding:

FILM_ID  TITLE           LENGTH
-------------------------------
212      DARN FORRESTER  185
182      CONTROL ANTHEM  185
141      CHICAGO NORTH   185

(In the next section we’ll look at the WITH TIES clause to find the other films of length 185)

But what do these queries look like in other databases? Here’s the translation of the ROWS ONLY query, according to jOOQ:

ACCESS    : select top 3 film_id, title, length from film order by length desc
ASE       : select top 3 film_id, title, length from film order by length desc
CUBRID    : select film_id, title, length from film 
              order by length desc limit 0, 3
DB2       : select film_id, title, length from film 
              order by length desc fetch first 3 rows only
DERBY     : select film_id, title, length from film 
              order by length desc offset 0 rows fetch next 3 rows only
FIREBIRD  : select film_id, title, length from film 
              order by length desc rows (0 + 1) to (0 + 3)
H2        : select film_id, title, length from film order by length desc limit 3
HANA      : select film_id, title, length from film order by length desc limit 3
HSQLDB    : select film_id, title, length from film order by length desc limit 3
INFORMIX  : select first 3 film_id, title, length from film order by length desc
INGRES    : select film_id, title, length from film 
              order by length desc offset 0 fetch first 3 rows only
MARIADB   : select film_id, title, length from film order by length desc limit 3
MYSQL     : select film_id, title, length from film order by length desc limit 3
ORACLE    : select film_id, title, length from film 
              order by length desc fetch next 3 rows only
POSTGRES  : select film_id, title, length from film order by length desc limit 3
REDSHIFT  : select film_id, title, length from film order by length desc limit 3
SQLITE    : select film_id, title, length from film order by length desc limit 3
SQLSERVER : select top 3 film_id, title, length from film order by length desc
SYBASE    : select top 3 film_id, title, length from film order by length desc
VERTICA   : select film_id, title, length from film order by length desc limit 3

So, there are essentially 3 families:

  • The standard family using FETCH, including DB2 (doesn’t support OFFSET), Derby, Ingres (although missing a keyword), Oracle
  • The MySQL family using LIMIT, including CUBRID, H2, HANA, HSQLDB, MariaDB, MySQL, PostgreSQL, Redshift, SQLite, Vertica
  • The T-SQL family using TOP, inculding Access, ASE, SQL Server, Sybase

There are also exotic syntaxes implemented by Firebird and Informix.

FETCH with OFFSET

You’ll find tons of blog posts on the web on how to emulate OFFSET .. LIMIT, including jOOQ’s manual. Things do get a bit more tricky when adding an offset, as can be seen here:

CUBRID    : select film_id, title, length from film
              order by length desc limit 3, 3
DB2       : select "v0" film_id, "v1" title, "v2" length from (
              select 
                film_id "v0", title "v1", length "v2", 
                row_number() over (order by length desc) "rn" 
              from film order by "v2" desc
            ) "x" 
            where "rn" > 3 and "rn" <= (3 + 3) 
            order by "rn"
DERBY     : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
FIREBIRD  : select film_id, title, length from film 
              order by length desc rows (3 + 1) to (3 + 3)
H2        : select film_id, title, length from film 
              order by length desc limit 3 offset 3
HANA      : select film_id, title, length from film 
              order by length desc limit 3 offset 3
HSQLDB    : select film_id, title, length from film 
              order by length desc limit 3 offset 3
INFORMIX  : select skip 3 first 3 film_id, title, length from film 
              order by length desc
INGRES    : select film_id, title, length from film 
              order by length desc offset 3 fetch first 3 rows only
MARIADB   : select film_id, title, length from film 
              order by length desc limit 3 offset 3
MYSQL     : select film_id, title, length from film 
              order by length desc limit 3 offset 3
ORACLE    : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
POSTGRES  : select film_id, title, length from film 
              order by length desc limit 3 offset 3
REDSHIFT  : select film_id, title, length from film 
              order by length desc limit 3 offset 3
SQLITE    : select film_id, title, length from film 
              order by length desc limit 3 offset 3
SQLSERVER : select film_id, title, length from film 
              order by length desc offset 3 rows fetch next 3 rows only
SYBASE    : select top 3 start at 4 film_id, title, length from film 
              order by length desc
VERTICA   : select film_id, title, length from film 
              order by length desc limit 3 offset 3

Interesting to note:

  • MS Access, and Sybase ASE do not support offsets at all (maybe a good thing).
  • The more recent versions of SQL Server support the SQL standard OFFSET .. FETCH clause (although OFFSET, unfortunately, is mandatory), which is great. Older versions can emulate OFFSET just like DB2 below
  • Sybase SQL Anywhere enhanced the T-SQL TOP syntax to something intuitive: TOP .. START AT. Why not?
  • DB2 doesn’t support the syntax, and we have to emulate it using ROW_NUMBER() window functions:
    select "v0" film_id, "v1" title, "v2" length from (
      select 
        film_id "v0", title "v1", length "v2", 
        row_number() over (order by length desc) "rn" 
      from film order by "v2" desc
    ) "x" 
    where "rn" > 3 and "rn" &lt;= (3 + 3) 
    order by "rn"
    

    Notice how, over the years, we’ve learned to do it right and prevent all sorts of side-effects from wrong emulations:

    • In the nested query, all columns have to be renamed to some enumerated column names to prevent problems from possibly duplicate column names in the user SQL query. It is perfectly OK for top-level SELECT statements to have duplicate / ambiguous column names, but not for subqueries
    • The top level SELECT statement should not project the auxiliary ROW_NUMBER() value. While this might not be causing trouble in ordinary queries, it is certainly causing trouble in subqueries. Imagine emulating something like:
      WHERE x IN (
        SELECT id
        FROM table
        OFFSET 1 ROW
        FETCH NEXT ROW ONLY
      )
      

      In this case, we must be very careful that the subquery continues to project only exactly one column.

3. WITH TIES

The previous approach to getting TOP 3 films is dangerous, because the ranking is rather random. There are other films of length 185, and they didn’t make it into the TOP 3. We could add another ordering criteria to make the ordering deterministic (e.g. FILM_ID), or we can use WITH TIES to get 3 films and all the tied films, too.

The query is now:

SELECT film_id, title, length
FROM film 
ORDER BY length DESC 
FETCH NEXT 3 ROWS WITH TIES

And we’re getting:

FILM_ID  TITLE               LENGTH
-----------------------------------
212      DARN FORRESTER	     185
872      SWEET BROTHERHOOD   185
817      SOLDIERS EVOLUTION  185
991      WORST BANGER        185
690      POND SEATTLE        185
609      MUSCLE BRIGHT       185
349      GANGS PRIDE         185
426      HOME PITY           185
182      CONTROL ANTHEM      185
141      CHICAGO NORTH       185

There are no more films of length 185 than the above. For more information about doing TOP N SQL queries, see this blog post.

For the sake of simplicity, let’s remove again the OFFSET clause (because it behaves inconsistently when combined with WITH TIES, at least in Oracle). Let’s look at WITH TIES only. jOOQ currently doesn’t emulate this clause for all dialects as it is hard to get right without window functions.

DB2       : select "v0" film_id, "v1" title, "v2" length from (
              select 
                film_id "v0", title "v1", length "v2", 
                rank() over (order by length desc) "rn"
              from film
            ) "x" 
            where "rn" > 0 and "rn" <= (0 + 3) 
            order by "rn"
HANA      : ... see DB2
MYSQL     : ... see DB2
ORACLE    : select film_id, title, length from film 
              order by length desc fetch next 3 rows with ties
POSTGRES  : ... see DB2
REDSHIFT  : ... see DB2
SQLSERVER : select top 3 with ties film_id, title, length from film 
              order by length desc
SYBASE    : ... see DB2

There are 3 ways to implement WITH TIES:

  • Oracle implements the SQL standard
  • SQL Server has a vendor-specific TOP N WITH TIES clause
  • All the others can emulate this feature using window functions

4. Nested set operations

Granted, this isn’t something you might run into every day, but when you need it, it’s really a PITA if your database doesn’t support it. Nested set operations. There are three set operations in SQL and relational algebra:

  • UNION
  • INTERSECT
  • EXCEPT (or MINUS, in Oracle)

All of the above come in two flavours:

  • OP or OP DISTINCT (standard syntax that isn’t implemented in any database)
  • OP ALL (most databases support this only for UNION)

Where ALL turns the set operation into a multiset operation, allowing duplicate results. ALL is fully supported (including on INTERSECT and EXCEPT) in:

  • CUBRID
  • DB2
  • Derby
  • HSQLDB
  • PostgreSQL

Now, the query. What if you want to find all actor names and all customer names, but you don’t want e.g. ADAM GRANT: In PostgreSQL, you could write:

SELECT first_name, last_name
FROM actor
UNION
SELECT first_name, last_name
FROM customer
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2

In this case, we can simply hope that all these operators are left-associative, which means we’ll add customers to actors, and then remove ADAM GRANT. In fact, according to the standard, this is the case. But perhaps, not all databases implement things this way, and as soon as you mix in INTERSECT, things change, as INTERSECT has higher operator precedence.

Want to be sure? Put parentheses around the expressions, e.g.

(
  SELECT first_name, last_name
  FROM actor
  UNION
  SELECT first_name, last_name
  FROM customer
)
EXCEPT
SELECT 'ADAM', 'GRANT'
ORDER BY 1, 2

Still valid in PostgreSQL (and if you add FROM dual, and replace EXCEPT by MINUS, then also in Oracle), but won’t work e.g. in MySQL. How can we get this to work in all the databases?

Here’s how:

ASE       : ... like MySQL
CUBRID    : ... like PostgreSQL (plus, add the dual table)
DB2       : ... like PostgreSQL (plus, add the dual table)
DERBY     : select first_name, last_name from (
              select first_name, last_name from (
                select first_name, last_name from actor
              ) x 
              union 
              select first_name, last_name from (
                select first_name, last_name from customer
              ) x
            ) x 
            except 
            select "ADAM", "GRANT" from (
              select 'ADAM', 'GRANT' from "SYSIBM"."SYSDUMMY1"
            )
            x order by 1, 2
H2        : ... like PostgreSQL
HANA      : ... like PostgreSQL (plus, add the dual table)
HSQLDB    : ... like PostgreSQL (plus, add the dual table)
INFORMIX  : ... like PostgreSQL (plus, add the dual table)
INGRES    : ... like PostgreSQL (plus, add the dual table)
MARIADB   : ... like MySQL
MYSQL     : select * from (
              select * from (
                select first_name, last_name from actor
              ) x 
              union 
              select * from (
                select first_name, last_name from customer
              ) x
            ) x
            except 
            select * from (
              select 'ADAM', 'GRANT' from dual
            ) 
            x order by 1, 2
ORACLE    : ... like PostgreSQL (add dual and replace EXCEPT by MINUS)
POSTGRES  : (
              (select first_name, last_name from actor) 
                union 
              (select first_name, last_name from customer)
            ) 
            except (select 'ADAM', 'GRANT') 
            order by 1, 2
REDSHIFT  : 
SQLITE    : ... like MySQL
SQLSERVER : ... like PostgreSQL
SYBASE    : ... like PostgreSQL (plus, add the dual table)
VERTICA   : ... like PostgreSQL

Some observations:

  • Access doesn’t support EXCEPT
  • Firebird has a bit of trouble with these operators – I simply haven’t figured out how to work around them yet. Probably due to some parser bugs
  • PostgreSQL (and many others) get it right
  • MySQL (and a few others) get it wrong, and we have to wrap the various set operation subqueries in derived tables, when suddenly things work well again. This really seems to be just a parser problem, not actually missing functionality. But it’s really a pain if you have to rewrite your SQL manually to the MySQL version
  • Derby is like MySQL, but in addition to lacking parser support for standard SQL nested set operations, it also suffers from these nasty bugs: https://issues.apache.org/jira/browse/DERBY-6983 and https://issues.apache.org/jira/browse/DERBY-6984. Luckily, you have jOOQ to work around all these hassles for you!

5. Derived column lists

A really cool standard feature is called “derived column lists”. It allows for renaming a table AND its columns in one go, for instance in PostgreSQL:

SELECT a, b
FROM (
  SELECT first_name, last_name
  FROM actor
) t(a, b) -- Interesting feature here
WHERE a LIKE 'Z%'

Yielding

A     B
----------
ZERO  CAGE

The utility of this functionality is most important when:

  • You generate SQL dynamically, and perhaps you’re not entirely sure what your derived table’s column names are – just rename them and be sure again
  • You call a table-valued function, i.e. a function that returns a table, and again, you’re not really sure what it’s columns are
  • You simply don’t like the column names of a table. This might not be the most important use-case, as with the above syntax, you have to rename ALL (except in PostgreSQL) the columns, in the right order, and we don’t like to depend on such ordering

Again, not all databases support this feature. So, what to do if they don’t? Use this one weird trick with a UNION ALL subquery to emulate it!

ACCESS    : ... like PostgreSQL
ASE       : ... like PostgreSQL
CUBRID    : ... like PostgreSQL
DB2       : ... like PostgreSQL
DERBY     : ... like PostgreSQL
FIREBIRD  : ... like PostgreSQL
H2        : select a, b from (
              (select null a, null b where 1 = 0) 
               union all 
              (select first_name, last_name from actor)
            ) t 
            where a like 'Z%'
HANA      : ... like H2 (plus, add dual table)
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like PostgreSQL
INGRES    : ... like H2 (plus, add dual table)
MARIADB   : ... like H2 (plus, add dual table)
MYSQL     : ... like H2 (plus, add dual table)
ORACLE    : ... like H2 (plus, add dual table)
POSTGRES  : select a, b from (
              select first_name, last_name from actor
            ) as t(a, b) 
            where a like 'Z%'
REDSHIFT  : ... like PostgreSQL
SQLITE    : ... like H2
SQLSERVER : ... like PostgreSQL
SYBASE    : ... like PostgreSQL
VERTICA   : ... like PostgreSQL

Not a lot of magic here. Either the database supports the feature, or it doesn’t. If it’s not supported, then the derived table whose columns should be aliased must be prefixed by a zero-row-returning UNION ALL subquery, which defines the column names. Because if you use set operations, then the first subquery defines the column names. Cool, eh?

select a, b from (

  -- Dummy subquery defining the column names
  (select null a, null b where 1 = 0) 
   union all 

  -- Actually interesting subqeury
  (select first_name, last_name from actor)
) t 
where a like 'Z%'

Please, forgive me… Actually, it was all Bill Karwin’s idea.

6. VALUES clause

Did you know that VALUES() is a clause that can be used outside of INSERT statements? Yes. In PostgreSQL, you can just write:

VALUES ('Hello', 'World'), ('Cool', 'eh?')

And you’re getting the following result:

column1  column2
----------------
Hello    World  
Cool     eh?    

Of course, we should never rely on such generated column names, thus again, derived column lists. In PostgreSQL, this is only possible when actually using a derived table in this context:

SELECT *
FROM (
  VALUES ('Hello', 'World'), ('Cool', 'eh?') 
) AS t(a, b)

Do all the databases support this clause? Of course not! But at least, it can be emulated in all databases:

ACCESS    : ... like Oracle
ASE       : ... like PostgreSQL
CUBRID    : ... like PostgreSQL
DB2       : ... like PostgreSQL
DERBY     : ... like PostgreSQL
FIREBIRD  : ... like Sybase SQL Anywhere
H2        : select "v"."c1", "v"."c2" from (
              (select null "c1", null "c2" where 1 = 0) 
               union all 
              (select * from (
                 values ('Hello', 'World'), ('Cool', 'eh?')
               ) "v")
            ) "v"
HANA      : ... like Oracle
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like Sybase SQL Anywhere
INGRES    : ... like Oracle
MARIADB   : ... like Oracle
MYSQL     : ... like Oracle
ORACLE    : select "v"."c1", "v"."c2" from (
              (select null "c1", null "c2" from dual where 1 = 0) 
               union all 
              (select * from (
                (select 'Hello', 'World' from dual)
                 union all 
                (select 'Cool', 'eh?' from dual)
              ) "v")
            ) "v"
POSTGRES  : select "v"."c1", "v"."c2" from (
              values ('Hello', 'World'), ('Cool', 'eh?')
            ) as "v"("c1", "c2")
REDSHIFT  : ... like PostgreSQL
SQLITE    : ... like H2
SQLSERVER : ... like PostgreSQL
SYBASE    : select [v].[c1], [v].[c2] from (
              (select 'Hello', 'World' from [SYS].[DUMMY]) 
               union all 
              (select 'Cool', 'eh?' from [SYS].[DUMMY])
            ) [v]([c1], [c2])
VERTICA   : ... like PostgreSQL

There are 4 flavours of how this is supported:

  • PostgreSQL and others: Support both VALUES and derived column lists
  • H2 and others: Support only VALUES, not derived column lists
  • Sybase SQL Anywhere and others: Do not support VALUES, but derived column lists
  • Oracle and others: Support neither feature

Clearly, this is only syntactic sugar for other, more verbose SQL, but it’s really cool when you don’t actually need any real table. In fact the whole optional FROM clause discussion from the beginning of this article is unnecessary, once you have VALUES(), which would be the standard way to “avoid” the FROM clause.

7. Predicates using Row Value Expressions

Once you’ve started using these, you will not want to miss them. Row value expressions. They’re essentially just tuple expressions, like:

SELECT *
FROM customer
WHERE (first_name, last_name)
    = ('MARY', 'SMITH')

Or, according to the standard and to PostgreSQL, also:

SELECT *
FROM customer
WHERE ROW (first_name, last_name)
    = ROW ('MARY', 'SMITH')

The functionality doesn’t seem very useful when using equality predicates, but it is much more interesting when using IN predicates:

-- Any customer named the same way as an actor?
SELECT *
FROM customer
WHERE (first_name, last_name) IN (
  SELECT first_name, last_name
  FROM actor
)

Or, when doing keyset pagination, through non-equality predicates:

SELECT *
FROM customer
WHERE (first_name, last_name) 
    > ('JENNIFER', 'DAVIS')

Again, not all databases support these. And those that do, have various levels of support. PostgreSQL is again the only database that goes “all in” on all the predicates, including funky things like the DISTINCT predicate:

SELECT *
FROM customer
WHERE (first_name, last_name) 
  IS DISTINCT FROM ('JENNIFER', 'DAVIS')

But luckily, again, these things can be emulated. Let’s look at all 3 of the above examples, and save ourselves the DISTINCT predicate for the next list item:

Equality on row value expressions

This is trivial. Either it’s supported, or it isn’t. Or you’re Oracle, and require a special syntax, to prevent ORA-00920 invalid relational operator (I would really love to hear that story. Must be funky):

ACCESS    : ... like SQL Server
ASE       : ... like SQL Server
CUBRID    : ... like PostgreSQL
DB2       : ... like PostgreSQL
DERBY     : ... like SQL Server
FIREBIRD  : ... like SQL Server
H2        : ... like PostgreSQL
HANA      : ... like SQL Server
HSQLDB    : ... like PostgreSQL
INFORMIX  : select * from customer 
            where row (first_name, last_name) = row ('MARY', 'SMITH')
INGRES    : ... like SQL Server
MARIADB   : ... like PostgreSQL
MYSQL     : ... like PostgreSQL
ORACLE    : select * from customer 
            where (first_name, last_name) = (('MARY', 'SMITH'))
POSTGRES  : select * from customer 
            where (first_name, last_name) = ('MARY', 'SMITH')
REDSHIFT  : ... like PostgreSQL
SQLITE    : ... like SQL Server
SQLSERVER : select * from customer 
            where (first_name = 'MARY' and last_name = 'SMITH')
SYBASE    : ... like SQL Server
VERTICA   : ... like PostgreSQL

Note that Informix requires the ROW() constructor, which should be optional. And again, Oracle is… Oracle :-)

IN predicate

Emulating this is much more tricky if it is not supported. Remember that IN and EXISTS predicates can be equivalent, so there’s always a way to transform them into each other.

ACCESS    : ... like SQLite
ASE       : ... like SQL Server
CUBRID    : ... like SQL Server
DB2       : ... like SQL Server
DERBY     : ... like SQL Server
FIREBIRD  : ... like SQL Server
H2        : select * from customer where (first_name, last_name) in (
              select (first_name, last_name) from actor
            )
HANA      : ... like SQLite
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like SQL Server
INGRES    : ... like SQLite
MARIADB   : ... like PostgreSQL
MYSQL     : ... like PostgreSQL
ORACLE    : select * from customer where (first_name, last_name) in ((
              select first_name, last_name from actor
            ))
POSTGRES  : select * from customer where (first_name, last_name) in (
              select first_name, last_name from actor
            )
REDSHIFT  : ... like PostgreSQL
SQLITE    : select * from customer where exists (
              select x.c1, x.c2 from (
                (select null c1, null c2 where 1 = 0) 
                 union all 
                (select first_name, last_name from actor)
              ) x 
              where (first_name = x.c1 and last_name = x.c2)
            )
SQLSERVER : select * from customer where exists (
              select x.c1, x.c2 
              from (select first_name, last_name from actor) x(c1, c2) 
              where (first_name = x.c1 and last_name = x.c2)
            )
SYBASE    : ... like SQL Server
VERTICA   : ... like SQL Server

Observations:

  • At this point, it’s worth mentioning that these things work “by accident” in H2. H2 unfortunately decided to use the (a, b, …, n) syntax for arrays, which are similar to tuples, but not the same thing. You can see in the H2 syntax that we have to wrap the two columns of the subquery in parentheses as well for the IN predicate to work as expected.
  • The transformation to an EXISTS() predicate requires the derived column list feature again. This is why some emulations are even more verbose than others.

Non-equality predicate

This predicate can be expanded to its standard definition again, easily, if it is not natively supported:

ACCESS    : ... like Oracle
ASE       : ... like PostgreSQL
CUBRID    : ... like Oracle
DB2       : ... like PostgreSQL
DERBY     : ... like Oracle
FIREBIRD  : ... like Oracle
H2        : ... like PostgreSQL
HANA      : ... like Oracle
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like Oracle
INGRES    : ... like Oracle
MARIADB   : ... like PostgreSQL
MYSQL     : ... like PostgreSQL
ORACLE    : select * from customer where (
              first_name >= 'JENNIFER' and (
                first_name > 'JENNIFER' or (
                  first_name = 'JENNIFER' and last_name > 'DAVIS'
                )
              )
            )
POSTGRES  : select * from customer 
              where (first_name, last_name) > ('JENNIFER', 'DAVIS')
REDSHIFT  : ... like Oracle
SQLITE    : ... like Oracle
SQLSERVER : ... like Oracle
SYBASE    : ... like Oracle
VERTICA   : ... like PostgreSQL

Observation:

  • Strictly speaking, it is not necessary to have one of the two predicates redundant in the emulation, but unfortunately, many databases have trouble when the top boolean operator of a boolean expression is OR rather than AND

8. The DISTINCT predicate

In the previous section, we’ve briefly mentioned the DISTINCT predicate, a useful predicate that helps handling NULL values as we’s mostly expect.

A quick summary in PostgreSQL:

WITH t(v) AS (
  VALUES (1),(2),(null)
)
SELECT v1, v2, v1 IS DISTINCT FROM v2
FROM t t1(v1), t t2(v2)

This yields:

v1  v2  d
-----------------
1   1   false    
1   2   true     
1       true     
2   1   true     
2   2   false    
2       true     
    1   true     
    2   true     
        false    

Conveniently, this never returns NULL when comparing anything with NULL, so simply spoken NULL IS NOT DISTINCT FROM NULL is TRUE. Quite some syntax, but hey, it’s SQL.

Regrettably, only few databases support the standard syntax, and MySQL and SQLite have a much more concise, non-standard operator. Let’s emulate the query from the previous section in our databases:

SELECT *
FROM customer
WHERE (first_name, last_name) 
  IS DISTINCT FROM ('JENNIFER', 'DAVIS')

Observe, there’s a really cool way to emulate this operation using INTERSECT, because interestingly, set operations also treat two NULL values as “the same”, i.e. non-DISTINCT. We have:

ACCESS    : ... like SQL Server (plus, add the dual table)
ASE       : ... like SQL Server (plus, add the dual table)
CUBRID    : ... like SQL Server (plus, add the dual table)
DB2       : ... like SQL Server (plus, add the dual table)
DERBY     : ... like SQL Server (plus, add the dual table)
FIREBIRD  : ... like PostgreSQL
H2        : ... like PostgreSQL
HANA      : ... like SQL Server (plus, add the dual table)
HSQLDB    : ... like PostgreSQL
INFORMIX  : ... like SQL Server (plus, add the dual table)
INGRES    : ... like SQL Server (plus, add the dual table)
MARIADB   : ... like MySQL
MYSQL     : select * from customer where (not((first_name, last_name) 
                                         <=> ('JENNIFER', 'DAVIS')))
ORACLE    : ... like SQL Server (plus, add the dual table)
POSTGRES  : select * from customer where (first_name, last_name) 
                        is distinct from ('JENNIFER', 'DAVIS')
REDSHIFT  : ... like PostgreSQL
SQLITE    : select * from customer where ((first_name, last_name) 
                                   is not ('JENNIFER', 'DAVIS'))
SQLSERVER : select * from customer where not exists (
              (select first_name, last_name) 
               intersect 
              (select 'JENNIFER', 'DAVIS')
            )
SYBASE    : ... like SQL Server (plus, add the dual table)
VERTICA   : ... like SQL Server

Want to try it yourself? The original PostgreSQL truth-table producing query can be transformed to this one:

WITH t(v) AS (
  VALUES (1),(2),(null)
)
SELECT v1, v2, NOT EXISTS (
  SELECT v1 INTERSECT SELECT v2
)
FROM t t1(v1), t t2(v2)

It produces the same truth table. Cool, eh?

9. DDL statements

This is one of the main reasons why we’re doing all of this. We want to allow for SQL text based migration scripts (e.g. run with Flyway) to be translatable to any kind of SQL dialect. Because DDL is really the most boring part of SQL to keep vendor-agnostic.

Just two short examples:

Copying a table structure into a new table

A quick and dirty way to copy a table structure is this:

CREATE TABLE x AS 
SELECT 1 AS one
WITH NO DATA

Looks cool, hm? Unfortunately, there is some trouble with the syntax as you will see in the emulations:

DB2       : create table x as (select 1 one from "SYSIBM"."DUAL") 
            with no data
H2        : ... like Oracle
MARIADB   : ... like Oracle
MYSQL     : ... like Oracle
ORACLE    : create table x as select 1 one from dual where 1 = 0
POSTGRES  : create table x as select 1 as one with no data
SQLSERVER : select 1 one into x where 1 = 0

I’ve left out a couple of dialects, as this hasn’t been integration tested everywhere yet, being work in progress. There are 4 flavours:

  • PostgreSQL: Actual support for the WITH [ NO ] DATA clause
  • DB2: Actual support for the WITH [ NO ] DATA clause (but funky requirement to wrap the source query in parentheses
  • Oracle: No support for the clause (easy to emulate with dummy predicate), but at least support for CTAS (CREATE TABLE AS SELECT)
  • SQL Server: Vendor specific alternative to CTAS

The inverse is equally fun to emulate, let’s actually add the data:

CREATE TABLE x AS 
SELECT 1 AS one
WITH DATA

And we’re getting:

DB2       : begin 
              execute immediate '
                create table x as (select 1 one from "SYSIBM"."DUAL") 
                with no data 
              '; 
              execute immediate '
                insert into x select 1 one from "SYSIBM"."DUAL" 
              '; 
            end
ORACLE    : create table x as select 1 one from dual
POSTGRES  : create table x as select 1 as one with data
SQLSERVER : select 1 one into x

Let’s focus on the interesting bits only.

  • Oracle, PostgreSQL, SQL Server as before
  • DB2 actually cannot copy the data with the table structure. Huh!

As can be seen above, in cases like this, we might need to split a single DDL statement in a statement batch or anonymous block containing several statements. This is work in progress as not all databases support anonymous blocks or statement batches.

There are many other interesting DDL emulations, and a lot of it is not yet implemented. Just play around with them on https://www.jooq.org/translate and send us your feature requests / ideas to https://github.com/jOOQ/jOOQ/issues/new

10. Built-in Functions

Last but not least, there are a ton of built-in functions, such as the boring LPAD() function. (Left pad is known for various things). Migrating these between databases is really really tedious. We’re here to help! Let’s emulate:

SELECT lpad('abc', ' ', 5)

Translation:

ACCESS    : replace(space(' ' - len('abc')), ' ', 5) & 'abc'
ASE       : (replicate(5, (' ' - char_length('abc'))) || 'abc')
CUBRID    : lpad('abc', ' ', 5)
DB2       : lpad('abc', ' ', 5)
DERBY     : lpad('abc', ' ', 5)
FIREBIRD  : lpad('abc', ' ', 5)
H2        : lpad('abc', ' ', 5)
HANA      : lpad('abc', ' ', 5)
HSQLDB    : lpad('abc', ' ', 5)
INFORMIX  : lpad('abc', ' ', 5)
INGRES    : lpad('abc', ' ', 5)
MARIADB   : lpad('abc', ' ', 5)
MYSQL     : lpad('abc', ' ', 5)
ORACLE    : lpad('abc', ' ', 5)
POSTGRES  : lpad('abc', ' ', 5)
REDSHIFT  : lpad('abc', ' ', 5)
SQLITE    : substr(replace(replace(substr(quote(zeroblob(((' ' - length('abc') - 1 + length("5")) / length("5") + 1) / 2)), 3), '''', ''), '0', "5"), 1, (' ' - length('abc'))) || 'abc'
SQLSERVER : (replicate(5, (' ' - len('abc'))) + 'abc')
SYBASE    : (repeat(5, (' ' - length('abc'))) || 'abc')
VERTICA   : lpad('abc', ' ', 5)

Forgive me again for the SQLite version. It was a suggestion made by an unknown user on Stack Overflow, the place where I tend to nerd-snipe people into solving such problems for me for free.

Conclusion

jOOQ standardises SQL into a type safe, embedded internal DSL in Java. With jOOQ 3.9+, we’ve added a parser (which is also publicly available on https://www.jooq.org/translate), which removes the need to express everything in the jOOQ API. Just parse your random SQL string and translate it to some other SQL dialect. This list could easily be extended to 50 items and more, but it is much more fun to play around with our website and try this on your own.

Please, if you do, do report any issue, feature request that you’d like to see at: https://github.com/jOOQ/jOOQ/issues/new to help us make this new tool even more valuable for you. In the near future, we’re going to more closely integrate this parser with other tools, such as Flyway, as we think there’s a lot of value in vendor-agnostic, standardised SQL.