How to Write a Simple, yet Extensible API

How to write a simple API is already an art on its own.

I didn’t have time to write a short letter, so I wrote a long one instead.

― Mark Twain

But keeping an API simple for beginners and most users, while making it extensible for power users, seems even more of a challenge. Is it, though?

What does “extensible” mean?

Imagine an API like, oh say, jOOQ. In jOOQ, you can write SQL predicates like this:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(1)) // Predicate with bind value here
   .fetch();

By default (as this should always be the default), jOOQ will generate and execute this SQL statement on your JDBC driver, using a bind variable:

SELECT t.a, t.b
FROM t
WHERE t.c = ?

The API made the most common use case simple. Just pass your bind variable as if the statement was written in e.g. PL/SQL, and let the language / API do the rest. So we passed that test.

The use case for power users is to occasionally not use bind variables, for whatever reason (e.g. skew in data and bad statistics; see also this post about bind variables). Will we pass that test as well?

jOOQ mainly offers two ways to fix this:

On a per-query basis

You can turn your variable into an inline value explicitly for this single occasion:

ctx.select(T.A, T.B)
   .from(T)
   .where(T.C.eq(inline(1))) // Predicate without bind value here
   .fetch();

This is using the static imported DSL.inline() method. Works, but not very convenient, if you have to do this for several queries, for several bind values, or worse, depending on some context.

This is a necessary API enhancement, but it does not make the API extensible.

On a global basis

Notice that ctx object there? It is the DSLContext object, the “contextual DSL”, i.e. the DSL API that is in the context of a jOOQ Configuration. You can thus set:

DSLContext ctx2 = DSL.using(ctx
    .configuration()
    .derive()
    .set(new Settings()
        .withStatementType(StatementType.STATIC_STATEMENT)));

// And now use this new DSLContext instead of the old one
ctx2.select(T.A, T.B)
    .from(T)
    .where(T.C.eq(1)) // No longer a bind variable
    .fetch();

Different approaches to offering such extensibility

We have our clean and simple API. Now some user wants to extend it. So often, we’re tempted to resort to a hack, e.g. by using thread locals, because they work easily under the assumption of a thread-bound execution model, such as classic Java EE Servlets.

The price we’re paying for such a hack is high.

  1. It’s a hack, and as such it will break easily. If we offer this as functionality to a user, they will start depending on it, and we will have to support and maintain it.
  2. It’s a hack, and it is based on assumptions, such as thread-boundedness. It will not work in an async / reactive / parallel stream context, where our logic may jump back and forth between threads.
  3. It’s a hack, and deep inside, we know it’s wrong. Obligatory XKCD: https://xkcd.com/292

Such a hack might obviously work, just like global (static) variables do. You can set this variable globally (or “globally” for your own thread), and then the API’s internals will be able to read it. No need to pass around parameters, so no need to compromise on the API’s simplicity by adding optional and often ugly, distracting parameters.

What are better approaches to offering such extensibility?

Dependency Injection

One way is to use explicit Dependency Injection (DI). If you have a container like Spring, you can rely on Spring injecting arbitrary objects into your method call / whatever, where you need access to it:
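
For example, a hedged sketch with Spring (the service class, its query, and the bean wiring are made up for illustration; a DSLContext bean is assumed to be configured elsewhere):

import org.jooq.DSLContext;
import org.jooq.impl.DSL;
import org.springframework.stereotype.Service;

// Spring injects the contextual DSLContext bean through the constructor, so the
// contextual object is available wherever we need it, no thread locals required.
@Service
public class BookService {

    private final DSLContext ctx;

    public BookService(DSLContext ctx) {
        this.ctx = ctx;
    }

    public int countBooks() {
        return ctx.fetchCount(DSL.table("book"));
    }
}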

This way, if you maintain several contextual objects of different lifecycle scopes, you can let the DI framework make appropriate decisions to figure out where to get that contextual information from. For example, when using JAX-RS, you can do this using an annotation based approach:


// These annotations bind the method to some HTTP address
@GET
@Produces("text/plain")
@Path("/api")
public String method(

    // This annotation fetches a request-scoped object
    // from the method call's context
    @Context HttpServletRequest request,

    // This annotation produces an argument from the
    // URL's query parameters
    @QueryParam("arg") String arg
) {
    ...
}

This approach works quite nicely for static environments (annotations being static), where you do not want to react to dynamic URLs or endpoints. It is declarative, and a bit magic, but well designed, so once you know all the options, you can choose the right one for your use case very easily.

While @QueryParam is mere convenience (you could have gotten the argument also from the HttpServletRequest), the @Context is powerful. It can help inject values of arbitrary lifecycle scope into your method / class / etc.

I personally favour explicit programming over annotation-based magic (e.g. using Guice for DI), but that’s probably a matter of taste. Both are a great way for implementors of APIs (e.g. HTTP APIs) to help get access to framework objects.

However, if you’re an API vendor, and want to give users of your API a way to extend the API, I personally favour jOOQ’s SPI approach.

SPIs

One of jOOQ’s strengths, IMO, is precisely this single, central place to register all SPI implementations that can be used for all sorts of purposes: The Configuration.

For example, on such a Configuration you can specify a JSR-310 java.time.Clock. This clock will be used by jOOQ’s internals to produce client side timestamps, instead of e.g. using System.currentTimeMillis(). Definitely a use case for power users only, but once you have this use case, you really only want to tweak a single place in jOOQ’s API: The Configuration.
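
As a hedged sketch, a test could pin all client side timestamps to a fixed java.time.Clock (this assumes Configuration offers a derive(Clock) overload like its other derive(...) methods; java.time imports omitted):

// Assumption: Configuration.derive(Clock) exists alongside the other derive() overloads
Clock fixed = Clock.fixed(Instant.parse("2018-01-01T00:00:00Z"), ZoneOffset.UTC);

DSLContext testCtx = DSL.using(ctx.configuration().derive(fixed));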

All of jOOQ’s internals will always have a Configuration reference available. And it’s up to the user to decide what the scope of this object is, jOOQ doesn’t care. E.g.

  • per query
  • per thread
  • per request
  • per session
  • per application

In other words, to jOOQ, it doesn’t matter at all if you’re implementing a thread-bound, blocking, classic servlet model, or if you’re running your code reactively, or in parallel, or whatever. Just manage your own Configuration lifecycle, jOOQ doesn’t care.

In fact, you can have a global, singleton Configuration and implement thread-bound components of it, e.g. the ConnectionProvider SPI, which takes care of managing the JDBC Connection lifecycle for jOOQ. Typically, users will use e.g. a Spring DataSource, which manages JDBC Connection (and transactions) using a thread-bound model, internally using ThreadLocal. jOOQ does not care. The SPI specifies that jOOQ will:

  • acquire() a Connection right before executing a statement
  • release(Connection) it again right after executing the statement

Again, it does not matter to jOOQ what the specific ConnectionProvider implementation does. You can implement it in any way you want if you’re a power user. By default, you’ll just pass jOOQ a DataSource, and it will wrap it in a default implementation called DataSourceConnectionProvider for you.
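
For illustration, here is a hedged sketch of a custom ConnectionProvider that binds connections to threads. Only acquire() and release() are jOOQ’s actual SPI methods; the ThreadLocal management is assumed to happen elsewhere (e.g. in a servlet filter):

import java.sql.Connection;

import org.jooq.ConnectionProvider;

public class ThreadBoundConnectionProvider implements ConnectionProvider {

    private final ThreadLocal<Connection> connections;

    public ThreadBoundConnectionProvider(ThreadLocal<Connection> connections) {
        this.connections = connections;
    }

    @Override
    public Connection acquire() {
        // jOOQ calls this right before executing a statement
        return connections.get();
    }

    @Override
    public void release(Connection connection) {
        // jOOQ calls this right after executing a statement. We keep the
        // connection open here; its lifecycle is owned by whoever set the ThreadLocal.
    }
}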

The key here is again:

  • The API is simple by default, i.e. by default, you don’t have to know about this functionality, just pass jOOQ a DataSource as always when working with Java and SQL, and you’re ready to go
  • The SPI allows for easily extending the API without compromising on its simplicity, by providing a single, central access point to this kind of functionality

Other SPIs in Configuration include:

  • ExecuteListener: An extremely useful and simple way to hook into the entire jOOQ query management lifecycle, from generating the SQL string to preparing the JDBC statement, to binding variables, to execution, to fetching result sets. A single SPI can accommodate various use cases like SQL logging, patching SQL strings, patching JDBC statements, listening to result set events, etc. (see the sketch after this list).
  • ExecutorProvider: Whenever jOOQ runs something asynchronously, it will ask this SPI to provide a standard JDK Executor, which will be used to run the asynchronous code block. By default, this will be the JDK default (the default ForkJoinPool), as always. But you probably want to override this default, and you want to be in full control of this, and not think about it every single time you run a query.
  • MetaProvider: Whenever jOOQ needs to look up database meta information (schemas, tables, columns, types, etc.), it will ask this MetaProvider about the available meta information. By default, this will run queries on the JDBC DatabaseMetaData, which is good enough, but maybe you want to wire these calls to your jOOQ-generated classes, or something else.
  • RecordMapperProvider and RecordUnmapperProvider: jOOQ has a quite versatile default implementation of how to map between a jOOQ Record and an arbitrary Java class, supporting a variety of standard approaches including JavaBeans getter/setter naming conventions, JavaBeans @ConstructorProperties, and much more. These defaults apply e.g. when writing query.fetchInto(MyBean.class). But sometimes, the defaults are not good enough, and you want this particular mapping to work differently. Sure, you could write query.fetchInto(record -> mymapper(record)), but you may not want to remember this for every single query. Just override the mapper (and unmapper) at a single, central spot for your own chosen Configuration scope (e.g. per query, per request, per session, etc.) and you’re done
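
As a small illustration of the ExecuteListener SPI from the list above, here is a minimal sketch of a SQL logging listener (the class name is made up; registration happens on the Configuration, e.g. via a DefaultExecuteListenerProvider):

import org.jooq.ExecuteContext;
import org.jooq.impl.DefaultExecuteListener;

public class SqlLoggingListener extends DefaultExecuteListener {

    @Override
    public void executeStart(ExecuteContext ctx) {
        // ctx.sql() contains the rendered SQL statement about to be executed
        System.out.println("Executing: " + ctx.sql());
    }
}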

Conclusion

Writing a simple API is difficult. Making it extensible in a simple way, however, is not. If your API has achieved “simplicity”, then it is very easy to support injecting arbitrary SPIs for arbitrary purposes at a single, central location, such as jOOQ’s Configuration.

In my most recent talk “10 Reasons Why we Love Some APIs and Why we Hate Some Others”, I’ve made a point that things like simplicity, discoverability, consistency, and convenience are among the most important aspects of a great API. How do you define a good API? The most underrated answer on this (obviously closed) Stack Overflow question is this one.


Again, this is hard in terms of creating a simple API. But it is extremely easy when making this simple API extensible. Make your SPIs very easily discoverable. A jOOQ power user will always look for extension points in jOOQ’s Configuration. And because the extension points are explicit types which have to be implemented (as opposed to annotations and their magic), no documentation is needed to learn the SPI (of course it is still beneficial as a reference).

I’d love to hear your alternative approaches to this API design challenge in the comments.

Watch the full talk here:

How to Unit Test Your Annotation Processor using jOOR

Annotation processors can be useful as a hacky workaround to get some language feature into the Java language.

jOOQ also has an annotation processor that helps validate SQL syntax for:

  • Plain SQL usage (SQL injection risk)
  • SQL dialect support (prevent using an Oracle only feature on MySQL)

You can read about it more in detail here.

Unit testing annotation processors

Unit testing annotation processors is a bit more tricky than using them. Your processor hooks into the Java compiler and manipulates the compiled AST (or does other things). If you want to test your own processor, you need the test to run a Java compiler, but that is difficult to do in a normal project setup, especially if the expected behaviour for a given test is a compilation error.
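
To get a feeling for what this involves, here is a hedged sketch of driving the compiler by hand via the javax.tools API, compiling an in-memory source string and collecting diagnostics (the class name and source string are made up; jOOR’s much more convenient way follows below):

import java.net.URI;
import java.util.List;

import javax.tools.*;

public class InMemoryCompile {
    public static void main(String[] args) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();

        // An in-memory "file" wrapping a source string
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///p/Test.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return "package p; public class Test {}";
            }
        };

        // Run a compilation task and collect the diagnostics
        boolean success = compiler.getTask(
            null, null, diagnostics, null, null, List.of(source)).call();

        System.out.println("Compiled: " + success);
        diagnostics.getDiagnostics().forEach(System.out::println);
    }
}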

Let’s assume we have the following two annotations:

@interface A {}
@interface B {}

And now, we would like to establish a rule that @A must always be accompanied by @B. For example:

// This must not compile
@A
class Bad {}

// This is fine
@A @B
class Good {}

We’ll enforce that with an annotation processor:

import java.util.Collections;
import java.util.Set;

import javax.annotation.processing.Completion;
import javax.annotation.processing.ProcessingEnvironment;
import javax.annotation.processing.Processor;
import javax.annotation.processing.RoundEnvironment;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.AnnotationMirror;
import javax.lang.model.element.Element;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;

import static javax.tools.Diagnostic.Kind.ERROR;

class AProcessor implements Processor {
    boolean processed;
    private ProcessingEnvironment processingEnv;

    @Override
    public Set<String> getSupportedOptions() {
        return Collections.emptySet();
    }

    @Override
    public Set<String> getSupportedAnnotationTypes() {
        return Collections.singleton("*");
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.RELEASE_8;
    }

    @Override
    public void init(ProcessingEnvironment processingEnv) {
        this.processingEnv = processingEnv;
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement e1 : annotations)
            if (e1.getQualifiedName().contentEquals(A.class.getName()))
                for (Element e2 : roundEnv.getElementsAnnotatedWith(e1))
                    if (e2.getAnnotation(B.class) == null)
                        processingEnv.getMessager().printMessage(ERROR, "Annotation A must be accompanied by annotation B");

        this.processed = true;
        return false;
    }

    @Override
    public Iterable<? extends Completion> getCompletions(Element element, AnnotationMirror annotation, ExecutableElement member, String userText) {
        return Collections.emptyList();
    }
}

Now, this works. We can easily verify that manually by adding the annotation processor to some Maven compiler configuration and by annotating a few classes with A and B. But then, someone changes the code and we don’t notice the regression. How can we unit test this, rather than doing things manually?

jOOR 0.9.10 support for annotation processors

jOOR is our little open source reflection library that we’re using internally in jOOQ.

jOOR has a convenient API to invoke the javax.tools.JavaCompiler API through Reflect.compile(). The most recent release 0.9.10 now takes an optional CompileOptions argument where annotation processors can be registered.

This means, we can now write a very simple unit test as follows (and if you’re using Java 12, you can profit from raw string literals! For a Java 11 compatible version without raw string literals, see our unit tests on GitHub):

@Test
public void testCompileWithAnnotationProcessors() {
    AProcessor p = new AProcessor();

    try {
        Reflect.compile(
            "org.joor.test.FailAnnotationProcessing",
            ```
             package org.joor.test; 
             @A 
             public class FailAnnotationProcessing {
             }
            ```,
            new CompileOptions().processors(p)
        ).create().get();
        Assert.fail();
    }
    catch (ReflectException expected) {
        assertTrue(p.processed);
    }

    Reflect.compile(
        "org.joor.test.SucceedAnnotationProcessing",
        ```
         package org.joor.test; 
         @A @B 
         public class SucceedAnnotationProcessing {
         }
        ```,
        new CompileOptions().processors(p)
    ).create().get();
    assertTrue(p.processed);
}

So easy! Never have regressions in your annotation processors again!

How to Create a Good MCVE (Minimal Complete Verifiable Example)

Reporting a bug takes time, and trust me, every vendor appreciates your reporting of a bug! Your voice counts as many voices: the other customers of a product who run into the same bug, but do not want to or cannot take the time to report it, are numerous.

So, first off, thanks for taking that time and reaching out to us vendors. We really appreciate your help!

Having said that, reporting a bug can be a tedious exercise, for both parties: the one reporting the bug and the one receiving it. There are extremely simple bugs, such as typos in documentation. They can be easily pointed out and just as easily fixed. There are much trickier bugs, such as concurrency issues in complicated project setups. They take time to reproduce. This is why an MCVE (Minimal Complete Verifiable Example) is so useful. The linked Stack Overflow page explains why it is so useful to answer questions, but the same arguments apply when reporting a bug.

And that’s where the tricky part starts. It isn’t easy to create an example that is:

  • Minimal: Your real world application code is huge. You cannot dump the entirety of it to the vendor for various reasons. And the vendor cannot look through it all to try to reproduce it. So, the problem has to be isolated into an example of minimal scope, with no unnecessary additional functionality. That’s hard too, because your project was set up months or years ago. You don’t want to spend too much time setting up a new project.
  • Complete: When reducing the problem to a minimal one, we’re tempted to just describe it in prose. But that can be difficult as well, because prose is hardly complete. It’s difficult to describe a problem when it would be quite easy to show the code. But that brings us back to the minimal part. We want to show only the relevant code, not all of it.
  • Verifiable: Ultimately, the ideal example can be used by the vendor to reproduce the problem, because once that’s possible, the vendor can start debugging it and finding the right spots to fix quite easily. Otherwise, it’s just guessing and going back and forth with the reporter, just to write more prose. That’s tiring on both sides.

This is why we now have an example project on GitHub to help you create that MCVE:

https://github.com/jOOQ/jOOQ-mcve

It is a minimal example project that can be forked on GitHub and modified by you directly, in order to show how to reproduce your issue. In the future, we’ll add more example setups that may be helpful to reproduce your specific issue.

Thanks again for taking the time to report issues. We vendors really appreciate your work!

Imperative Loop or Functional Stream Pipeline? Beware of the Performance Impact!

I like weird, yet concise language constructs and API usages

Yes. I am guilty. Evil? Don’t know. But guilty. I heavily use and abuse the java.lang.Boolean type to implement three valued logic in Java:

  • Boolean.TRUE means true (duh)
  • Boolean.FALSE means false
  • null can mean anything like “unknown” or “uninitialised”, etc.

I know – a lot of enterprise developers will bikeshed and cargo cult the old saying:

Code is read more often than it is written

But as with everything, there is a tradeoff. For instance, in algorithm-heavy, micro optimised library code, it is usually more important to have code that really performs well, rather than code that apparently doesn’t need comments because the author has written it in such a clear and beautiful way.

I don’t think it matters much in the case of the Boolean type (where I’m just too lazy to encode every three-valued situation in an enum). But here’s a more interesting example from that same Twitter thread. The code is simple:

woot:
if (something) {
  for (Object o : list) 
    if (something(o))
      break woot;

  throw new E();
}

Yes. You can break out of “labeled ifs”. Because in Java, any statement can be labeled, and if the statement is a compound statement (observe the curly braces following the if), then it may make sense to break out of it. Even if you’ve never seen that idiom, I think it’s quite immediately clear what it does.

Ghasp!

If Java were a bit more classic, it might have supported this syntax:

if (something) {
  for (Object o : list) 
    if (something(o))
      goto woot;

  throw new E();
}
woot:

Nicolai suggested that the main reason I hadn’t written the following, equivalent, and arguably more elegant logic, is because jOOQ still supports Java 6:

if (something && list.stream().noneMatch(this::something))
  throw new E();

It’s more concise! So, it’s better, right? Everything new is always better.

A third option would have been the less concise solution that essentially just replaces break by return:

if (something && noneMatchSomething(list))
  throw new E();

// And then:
private boolean noneMatchSomething(List<?> list) {
  for (Object o : list)
    if (something(o))
      return false;
  return true;
}

There’s an otherwise useless method that has been extracted. The main benefit is that people are not used to breaking out of labeled statements (other than loops, and even then it’s rare), so this is again about some subjective “readability”. I personally find this particular example less readable, because the extracted method is no longer local. I have to jump around in the class and interrupt my train of thoughts. But of course, YMMV with respect to the two imperative alternatives.

Back to objectivity: Performance

When I tweet about Java these days, I’m mostly tweeting about my experience writing jOOQ. A library. A library that has been tuned so much over the past years, that the big client side bottleneck (apart from the obvious database call) is the internal StringBuilder that is used to generate dynamic SQL. And compared to most database queries, you will not even notice that.

But sometimes you do. E.g. if you’re using an in-memory H2 database and run some rather trivial queries, then jOOQ’s overhead can become measurable again. Yes. There are some use-cases, which I do want to take seriously as well, where the difference between an imperative loop and a stream pipeline is measurable.

In the above examples, let’s remove the throw statement and replace it by something simpler (because exceptions have their own significant overhead).

I’ve created this JMH benchmark, which compares the 3 approaches:

  • Imperative with break
  • Imperative with return
  • Stream

Here’s the benchmark:

package org.jooq.test.benchmark;

import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.*;

@Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
@Warmup(iterations = 5, time = 3)
@Measurement(iterations = 7, time = 3)
public class ImperativeVsStream {

    @State(Scope.Benchmark)
    public static class BenchmarkState {

        boolean something = true;

        @Param({ "2", "8" })
        int listSize;

        List<Integer> list = new ArrayList<>();

        boolean something() {
            return something;
        }

        boolean something(Integer o) {
            return o > 2;
        }

        @Setup(Level.Trial)
        public void setup() throws Exception {
            for (int i = 0; i < listSize; i++)
                list.add(i);
        }

        @TearDown(Level.Trial)
        public void teardown() throws Exception {
            list = null;
        }
    }

    @Benchmark
    public Object testImperativeWithBreak(BenchmarkState state) {
        woot:
        if (state.something()) {
            for (Integer o : state.list)
                if (state.something(o))
                    break woot;

            return 1;
        }

        return 0;
    }

    @Benchmark
    public Object testImperativeWithReturn(BenchmarkState state) {
        if (state.something() && woot(state))
            return 1;

        return 0;
    }

    private boolean woot(BenchmarkState state) {
        for (Integer o : state.list)
            if (state.something(o))
                return false;

        return true;
    }

    @Benchmark
    public Object testStreamNoneMatch(BenchmarkState state) {
        if (state.something() && state.list.stream().noneMatch(state::something))
            return 1;

        return 0;
    }

    @Benchmark
    public Object testStreamAnyMatch(BenchmarkState state) {
        if (state.something() && !state.list.stream().anyMatch(state::something))
            return 1;

        return 0;
    }

    @Benchmark
    public Object testStreamAllMatch(BenchmarkState state) {
        if (state.something() && state.list.stream().allMatch(s -> !state.something(s)))
            return 1;

        return 0;
    }
}

The results are pretty clear:

Benchmark                                    (listSize)   Mode  Cnt         Score          Error  Units
ImperativeVsStream.testImperativeWithBreak            2  thrpt   14  86513288.062 ± 11950020.875  ops/s
ImperativeVsStream.testImperativeWithBreak            8  thrpt   14  74147172.906 ± 10089521.354  ops/s
ImperativeVsStream.testImperativeWithReturn           2  thrpt   14  97740974.281 ± 14593214.683  ops/s
ImperativeVsStream.testImperativeWithReturn           8  thrpt   14  81457864.875 ±  7376337.062  ops/s
ImperativeVsStream.testStreamAllMatch                 2  thrpt   14  14924513.929 ±  5446744.593  ops/s
ImperativeVsStream.testStreamAllMatch                 8  thrpt   14  12325486.891 ±  1365682.871  ops/s
ImperativeVsStream.testStreamAnyMatch                 2  thrpt   14  15729363.399 ±  2295020.470  ops/s
ImperativeVsStream.testStreamAnyMatch                 8  thrpt   14  13696297.091 ±   829121.255  ops/s
ImperativeVsStream.testStreamNoneMatch                2  thrpt   14  18991796.562 ±   147748.129  ops/s
ImperativeVsStream.testStreamNoneMatch                8  thrpt   14  15131005.381 ±   389830.419  ops/s

With this simple example, break or return don’t matter. At some point, adding additional methods might start getting in the way of inlining (because of stacks getting too deep), but not creating additional methods might be getting in the way of inlining as well (because of method bodies getting too large). I don’t want to bet on either approach here at this level, nor is jOOQ tuned that much. Like most similar libraries, the traversal of the jOOQ expression tree generates stacks that are too deep to completely inline anyway.

But the very obvious loser here is the Stream approach, which is roughly 6.5x slower in this benchmark than the imperative approaches. This isn’t surprising. The stream pipeline has to be set up every single time to represent something as trivial as the above imperative loop. I’ve already blogged about this in the past, where I compared replacing simple for loops by Stream.forEach().
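
For illustration (not part of the benchmark above), this is the kind of trivial replacement meant there:

// Same output, but the stream version first has to construct a whole pipeline object
for (Object o : list)
    System.out.println(o);

list.stream().forEach(System.out::println);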

Meh, does it matter?

In your business logic? Probably not. Your business logic is I/O bound, mostly because of the database. Wasting a few CPU cycles on a client side loop is not the main issue. Even if it is, the waste probably happens because your loop shouldn’t even be at the client side in the first place, but moved into the database as well. I’m currently touring conferences with a talk about that topic.

In your infrastructure logic? Maybe! If you’re writing a library, or if you’re using a library like jOOQ, then yes. Chances are that a lot of your logic is CPU bound. You should occasionally profile your application and spot such bottlenecks, both in your code and in third party libraries. E.g. in most of jOOQ’s internals, using a stream pipeline might be a very bad choice, because ultimately, jOOQ is something that might be invoked from within your loops, thus adding significant overhead to your application, if your queries are not heavy (e.g. again when run against an H2 in-memory database).

So, given that you’re clearly “micro-losing” on the performance side by using the Stream API, you may need to evaluate the readability tradeoff more carefully. When business logic is complex, readability is very important compared to micro optimisations. With infrastructure logic, it is much less likely so, in my opinion. And I’m not alone:

Note: there’s that other cargo cult of premature optimisation going around. Yes, you shouldn’t worry about these details too early in your application implementation. But you should still know when to worry about them, and be aware of the tradeoffs.

And while you’re still debating what name to give to that extracted method, I’ve written 5 new labeled if statements! ;-)

How to Patch Your IDE to Fix an Urgent Bug

Clock’s ticking. JDK 11 will remove a bunch of deprecated modules through JEP 320, which includes the Java EE modules, which again includes JAXB, a dependency of many libraries, including jOOQ. Thus far, few people have upgraded to Java 9 or 10, as these aren’t LTS releases. Unlike in the old days, however, people will be forced much earlier to upgrade to Java 11, because Java 8 (the free version) will reach end of life soon after Java 11 is released:

End of Public Updates for Oracle JDK 8
As outlined in the Oracle JDK Support Roadmap below, Oracle will not post further updates of Java SE 8 to its public download sites for commercial use after January 2019

So, we library developers must act and finally modularise our libraries. Which is, quite frankly, a pain. Not because of the module system itself, which works surprisingly well. But because of the toolchain, which is far from being production ready.

It’s still almost not possible to maintain a modularised project in an IDE (I’ve tried Eclipse and IntelliJ, not NetBeans so far) as there are still tons of bugs, some of which are showstoppers, halting compilation in the IDE (despite compilation working in Maven).

But rather than just complaining, let’s complain and fix it.

Let’s fix our own IDE by patching it

Disclaimer: The following procedure assumes that you have the right to modify your IDE’s source and binaries. To my understanding, this is the case with the EPL licensed Eclipse. It may not be the case for other IDEs.

Disclaimer 2: Note, as reddit user fubarbazqux so eloquently put it, there are cleaner ways to apply patches (and contribute them) to the Eclipse community, if you have more time. This article just displays a very easy way to do things without spending too much time figuring out how the Eclipse development processes work internally. It shows a QUICK FIX recipe.

The first bug was already discovered and fixed for Eclipse 4.8, but its RC4 version seems to have tons of other problems, so let’s not upgrade to that yet. Instead, let’s apply the fix that can be seen here to our own distribution:

https://github.com/eclipse/eclipse.jdt.core/commit/e60c4f1f36f7efd5fbc1bbc661872b78c6939230#diff-e517e5944661053f0fcff49d9432b74e

It’s just a single line (see the diff in the commit above).

How do we do this?

First off, go to the Eclipse Packages Download page:

http://www.eclipse.org/downloads/eclipse-packages

And download the “Eclipse IDE for Eclipse Committers” distribution:

It will contain all the Eclipse source code, which we’ll need to compile the above class. In the new workspace, create a new empty plugin project:

Specify the correct execution environment (in our case Java 10) and add all the Java Development Tools (JDT) dependencies:

Or just add all the available dependencies, it doesn’t really matter.

You can now open the type that you want to edit:

Now, simply copy the source code from the editor and paste it in a new class inside of your project, which you put in the same package as the original (split packages are still possible in this case, yay)

Inside of your copy, apply the desired patch and build the project. Since you already included all the dependencies, it will be easy to compile your copy of the class, and you don’t have to build the entirety of Eclipse.

Now, go to your Windows Explorer or Mac OS X Finder, or Linux shell or whatever and find the compiled class:

This class can now be copied into the Eclipse plugin. How to find the appropriate Eclipse plugin? Just go to your plugin dependencies and check out the location of the class you’ve opened earlier:

Open that plugin from your Eclipse distribution’s /plugins folder using 7zip or whatever zipping tool you prefer, and overwrite the original class file(s). You may need to close Eclipse first, before you can write to the plugin zip file. And it’s always a good idea to make backup copies of the original plugin(s).

Be careful that if your class has any nested classes, you will need to copy them all, e.g.

MyClass.class
MyClass$1.class // Anonymous class
MyClass$Nested.class // Named, nested class

Restart Eclipse, and your bug should be fixed!

How to fix my own bugs?

You may not always be lucky to find a bug with an existing fix in the bug tracker as in the second case:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=535927

No problemo, we can hack our way around that as well. Launch your normal Eclipse instance (not the “Eclipse IDE for Eclipse Committers” one) with a debug agent running, by adding the following lines to your eclipse.ini file:

-Xdebug 
-Xnoagent 
-Djava.compiler=NONE 
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005

Launch Eclipse again, then connect to your Eclipse from your other “Eclipse IDE for Eclipse Committers” instance by connecting a debugger:

And start setting breakpoints wherever you need, e.g. here, in my case:

java.lang.NullPointerException
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:145)
	at org.eclipse.jdt.internal.compiler.problem.ProblemHandler.handle(ProblemHandler.java:226)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.handle(ProblemReporter.java:2513)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1831)
	at org.eclipse.jdt.internal.compiler.problem.ProblemReporter.deprecatedType(ProblemReporter.java:1808)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.checkAndRecordImportBinding(CompilationUnitScope.java:960)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInImports(CompilationUnitScope.java:471)
	at org.eclipse.jdt.internal.compiler.lookup.CompilationUnitScope.faultInTypes(CompilationUnitScope.java:501)
	at org.eclipse.jdt.internal.compiler.Compiler.process(Compiler.java:878)
	at org.eclipse.jdt.internal.compiler.ProcessTaskManager.run(ProcessTaskManager.java:141)
	at java.lang.Thread.run(Unknown Source)

And start analysing the problem as if it were one of your own bugs. The nice thing is, you don’t have to fix the problem, just find it, and possibly comment out some lines of code if you think they’re not really needed. In my case, luckily, the regression was introduced by a new method that is applied to JDK 9+ projects only:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    // ...
}

The method checks for the new since attribute of the @Deprecated annotation (e.g. @Deprecated(since = "9")). Not an essential feature, so let’s just turn it off by adding this line to the source file:

String deprecatedSinceValue(Supplier<AnnotationBinding[]> annotations) {
    if (true) return null;
    // ...
}

This will effectively prevent the faulty logic from ever running. Not a fix, but a workaround. For more details about this specific issue, see the report. Of course, never forget to actually report the issue to Eclipse (or whatever your IDE is), so it can be fixed thoroughly for everyone else as well.

Compile. Patch. Restart. Done!

Conclusion

Java is a cool platform. It has always been a very dynamic language at runtime, where compiled class files can be replaced by new versions at any moment, and re-loaded by the class loaders. This makes patching code by other vendors very easy, just:

  • Create a project containing the vendors’ code (or if you don’t have the code, the binaries)
  • Apply a fix / workaround to the Java class that is faulty (or if you don’t have the code, decompile the binaries if you are allowed to)
  • Compile your own version
  • Replace the version of the class file from the vendor by yours
  • Restart

This works with all software, including IDEs. In the case of jOOQ, all our customers have the right to modification, and they get the sources as well. We know how useful it is to be able to patch someone else’s code. This article shows it. Now, I can continue modularising jOOQ, and as a side product, improve the tool chain for everybody else as well.

Again, this article displayed a QUICK FIX approach (some call it “hack”). There are more thorough ways to apply patches / fixes, and contribute them back to the vendor.

Another, very interesting option would be to instrument your runtime and apply the fix only to byte code; see the article “Fixing Bugs in Running Java Code with Dynamic Attach”.

A note on IntelliJ and NetBeans

Again, I haven’t tried NetBeans yet (although I’ve heard its Java 9 support has been working very well for quite a while).

While IntelliJ’s Jigsaw support seems more advanced than Eclipse’s (still with a few flaws as well), it currently has a couple of performance issues when compiling projects like jOOQ or jOOλ. In a future blog post, I will show how to “fix” those by using a profiler, like:

  • Java Mission Control (can be used as a profiler, too)
  • YourKit
  • JProfiler

Profilers can be used to very easily track down the main source of a performance problem. I’ve reported a ton to Eclipse already. For instance, this one:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=474686

Where a lot of time is being spent in the processing of Task Tags, like:

  • TODO
  • FIXME
  • XXX

The great thing about profiling this is:

  • You can report a precise bug to the vendor
  • You can find the flawed feature and turn it off as a workaround. Turning off the above task tag feature was a no-brainer. I’m not even using the feature.

So, stay tuned for another blog post, soon.

Truth First, or Why You Should Mostly Implement Database First Designs

In this much overdue article, I will explain why I think that in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is). The latter approach leads to a long road of pain and suffering once your project grows.

This article is inspired by a recent Stack Overflow question.

Interesting reddit discussions on /r/java and /r/programming.

Code generation

To my surprise, a small group of first time jOOQ users seem to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want and you don’t have to use code generation, but the default way to use jOOQ according to the manual is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type safe queries against those tables:

for (Record2<String, String> record : DSL.using(configuration)
//   ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the 
//   generated code referenced from the below SELECT clause

       .select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
//           vvvvv ^^^^^^^^^^^^  ^^^^^^^^^^^^^^^ Generated names
       .from(ACTOR)
       .orderBy(1, 2)) {
    // ...
}

The code is generated either manually outside of the build, or automatically with every build. For instance, such a re-generation could follow immediately after a Flyway database migration, which can also be run either manually or automatically.
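
A hedged sketch of that “migrate, then regenerate” step, run programmatically rather than from the build tool (it assumes Flyway 5+, jOOQ’s code generator on the classpath, and a jooq-codegen.xml generator configuration; all names and connection details are placeholders, and the GenerationTool package differs between jOOQ versions):

import org.flywaydb.core.Flyway;
import org.jooq.codegen.GenerationTool;

public class MigrateAndRegenerate {
    public static void main(String[] args) throws Exception {

        // 1. Apply pending Flyway migrations to the database
        Flyway.configure()
              .dataSource("jdbc:h2:~/test", "sa", "")
              .load()
              .migrate();

        // 2. Re-run jOOQ's code generator against the migrated schema
        GenerationTool.main(new String[] { "jooq-codegen.xml" });
    }
}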

Source code generation

There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”) either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason.

There are many such code generators out there. For instance, XJC can generate Java code from XSD or WSDL files. The principle is always the same:

  • There is some truth (internal or external), like a specification, data model, etc.
  • We need a local representation of that truth in our programming language

And it almost always makes sense to generate the latter, to avoid redundancy.

Type providers and annotation processing

Noteworthy: Another, more modern approach to jOOQ’s particular code generation use-case would be Type Providers, as implemented by F#, in which case the code is generated by the compiler while compiling. It never really exists in its source form. Similar (but less sophisticated) tools in Java are annotation processors, e.g. Lombok.

In a way, this does the same thing except:

  • You don’t see the generated code (perhaps that’s less appalling to some?)
  • You must ensure the types can be provided, i.e. the “truth” must always be available. Easy in the case of Lombok, which annotates the “truth”. A bit more difficult with database models, which rely on an always available live connection.

What’s the problem with code generation?

Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it.

If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth.

Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say:

(Image: “Too Busy To Improve”, licensed CC by Alan O’Rourke / Audience Stack)

“But Hibernate / JPA makes coding Java first easy”

Yes that’s true. And it’s both a bliss and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:

@Entity
class Book {
  @Id
  int id;
  String title;
}

And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:

CREATE TABLE book (
  id INTEGER GENERATED ALWAYS AS IDENTITY,
  title VARCHAR(50),

  CONSTRAINT pk_book PRIMARY KEY (id)
);

CREATE INDEX i_book_title ON book (title);

… and start running the application. That’s really cool to get started quickly and to try out things.

But, huh, wait. I cheated.

  • Will Hibernate really apply that named primary key definition?
  • Will it create the index on TITLE, which I know we’ll need?
  • Will it add an identity specification?

Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:

@Entity
@Table(name = "book", indexes = {
  @Index(name = "i_book_title", columnList = "title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;
  String title;
}

Cool. Re-generate. Again, this makes it really easy to get started.

But you’ll pay the price later on

At some point, you go to production. And that’s when this model no longer works. Because

Once you go live, you can no longer throw away your database, as your database has become legacy.

From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?). You can only lose.

Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault…

Instead, you could have done things entirely differently from the beginning. Like using those round wheels.

Go Database First

The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is.

This is old wisdom, nothing new. Primary and unique keys are good. Foreign keys are good. Check constraints are good. Assertions (when they’re finally implemented) are good.

And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:

  • In what tablespace your table resides
  • What PCTFREE value it has
  • What the cache size of your sequence (behind the identity) is

Maybe, all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations such as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs offer some tools to help you write DDL.

But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.

What about the client model?

As mentioned before, you will need a copy of your database schema in your client, a client representation. Needless to say that this client representation needs to be in-sync with the real model. How to best do that? By using a code generator.

All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:

-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables

-- DB2
SELECT tabschema, tabname
FROM syscat.tables

-- Oracle
SELECT owner, table_name
FROM all_tables

-- SQLite
SELECT name
FROM sqlite_master

-- Teradata
SELECT databasename, tablename
FROM dbc.tables

These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module.

From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless what your client technology is.

  • If you’re using JDBC or Spring, you can create a bunch of String constants (see the sketch after this list)
  • If you’re using JPA, you can generate the entities themselves
  • If you’re using jOOQ, you can generate the jOOQ meta model
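
As a hedged sketch of the first option above (nothing jOOQ-specific): derive plain String constants for all tables from JDBC’s DatabaseMetaData. The class name and connection details are placeholders:

import java.sql.*;

public class TableConstantsGenerator {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:h2:~/test", "sa", "")) {
            DatabaseMetaData meta = con.getMetaData();

            // List all tables and print them as String constants
            try (ResultSet rs = meta.getTables(null, null, "%", new String[] { "TABLE" })) {
                System.out.println("public class Tables {");

                while (rs.next())
                    System.out.println("    public static final String "
                        + rs.getString("TABLE_NAME").toUpperCase()
                        + " = \"" + rs.getString("TABLE_NAME") + "\";");

                System.out.println("}");
            }
        }
    }
}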

Depending on the amount of features your client API offers (e.g. jOOQ or JPA), the generated meta model can be really rich and complete. Consider, for instance, jOOQ 3.11’s implicit join feature, which relies on generated meta information about the foreign key relationships between your tables.

Now, any database increment will automatically lead to updated client code. For instance, imagine:

ALTER TABLE book RENAME COLUMN title TO book_title;

Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:

@Entity
@Table(name = "book", indexes = {

  // Would you have thought of this?
  @Index(name = "i_book_title", columnList = "book_title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;

  @Column(name = "book_title")
  String bookTitle;
}

Or an updated jOOQ class. Plus: Your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So, it’s great to be able to see in compiled client source code, what code is (or may be) affected by your database increment.

A single truth

Regardless what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Like jOOQ’s INFORMATION_SCHEMA meta model in XML form:
https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd

  • XSD is well understood
  • XSD specifies XML content very well, and allows for validation in all client languages
  • XSD can be versioned easily, and evolved backwards compatibly
  • XSD can be translated to Java code using XJC

The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach where we design our messages as Java objects could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed.
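
A hedged sketch of that validation step with JAXB (the Information class stands in for whatever XJC generated from your schema, and the file names are placeholders):

// javax.xml.bind, javax.xml.validation, and java.io imports assumed;
// exception handling omitted
JAXBContext context = JAXBContext.newInstance(Information.class);
Unmarshaller unmarshaller = context.createUnmarshaller();

// Validate incoming XML against the XSD while unmarshalling it
// into the XJC-generated classes
unmarshaller.setSchema(
    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                 .newSchema(new File("jooq-meta-3.10.0.xsd")));

Information message = (Information) unmarshaller.unmarshal(new File("message.xml"));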

Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…

Databases: Same thing

When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth.

Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Even other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve.

So, don’t do it. And, don’t do it from the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll get years of improved productivity for the rest of your project.

Thank me later.

Clarification

Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client.

In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.

Exceptions

There are always exceptions, and I promised that the database first and code generation approach may not always be the right choice. These exceptions are (probably not exhaustive):

  • When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
  • When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even sure if an RDBMS will be the right choice.

The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.

How to Run a Bulk INSERT .. RETURNING Statement With Oracle and JDBC

When inserting records into SQL databases, we often want to fetch back generated IDs and possibly other trigger, sequence, or default generated values. Let’s assume we have the following table:

-- DB2
CREATE TABLE x (
  i INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, 
  j VARCHAR(50), 
  k DATE DEFAULT CURRENT_DATE
);

-- PostgreSQL
CREATE TABLE x (
  i SERIAL4 PRIMARY KEY, 
  j VARCHAR(50), 
  k DATE DEFAULT CURRENT_DATE
);

-- Oracle
CREATE TABLE x (
  i INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, 
  j VARCHAR2(50), 
  k DATE DEFAULT SYSDATE
);

DB2

DB2 is currently the only database supported by jOOQ that implements the SQL standard feature according to which we can SELECT from any INSERT statement, including:

SELECT *
FROM FINAL TABLE (
  INSERT INTO x (j)
  VALUES ('a'), ('b'), ('c')
);

The above query returns:

I |J |K          |
--|--|-----------|
1 |a |2018-05-02 |
2 |b |2018-05-02 |
3 |c |2018-05-02 |

Pretty neat! This query can simply be run like any other query in JDBC, and you don’t have to go through any hassles.

PostgreSQL and Firebird

These databases have a vendor specific extension that does the same thing, almost as powerful:

-- Simple INSERT .. RETURNING query
INSERT INTO x (j)
VALUES ('a'), ('b'), ('c')
RETURNING *;

-- If you want to do more fancy stuff
WITH t AS (
  INSERT INTO x (j)
  VALUES ('a'), ('b'), ('c')
  RETURNING *
)
SELECT * FROM t;

Both syntaxes work equally well; the latter is just as powerful as DB2’s, where the result of an insertion (or update, delete, merge) can be joined to other tables. Again, no problem with JDBC.
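
For instance, a minimal, hedged sketch with plain JDBC against PostgreSQL (java.sql imports and the url / props connection details are assumed, as in the Oracle example below); the RETURNING clause produces a result set, so executeQuery() is all we need:

try (Connection con = DriverManager.getConnection(url, props);
     PreparedStatement ps = con.prepareStatement(
         "INSERT INTO x (j) VALUES (?), (?), (?) RETURNING *")) {

    ps.setString(1, "a");
    ps.setString(2, "b");
    ps.setString(3, "c");

    // The returned rows contain the generated i and k values as well
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next())
            System.out.println(
                rs.getInt("i") + " | " + rs.getString("j") + " | " + rs.getDate("k"));
    }
}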

Oracle

In Oracle, this is a bit more tricky. The Oracle SQL language doesn’t have an equivalent of DB2’s FINAL TABLE (DML statement). The Oracle PL/SQL language, however, does support the same syntax as PostgreSQL and Firebird. This is perfectly valid PL/SQL:

-- Create a few auxiliary types first
CREATE TYPE t_i AS TABLE OF NUMBER(38);
/
CREATE TYPE t_j AS TABLE OF VARCHAR2(50);
/
CREATE TYPE t_k AS TABLE OF DATE;
/

DECLARE 
  -- These are the input values
  in_j t_j := t_j('a', 'b', 'c');
  
  out_i t_i;
  out_j t_j;
  out_k t_k;
  
  c1 SYS_REFCURSOR;
  c2 SYS_REFCURSOR;
  c3 SYS_REFCURSOR;
BEGIN

  -- Use PL/SQL's FORALL command to bulk insert the
  -- input array type and bulk return the results
  FORALL i IN 1 .. in_j.COUNT
    INSERT INTO x (j)
    VALUES (in_j(i))
    RETURNING i, j, k
    BULK COLLECT INTO out_i, out_j, out_k;
  
  -- Fetch the results and display them to the console
  OPEN c1 FOR SELECT * FROM TABLE(out_i);  
  OPEN c2 FOR SELECT * FROM TABLE(out_j);  
  OPEN c3 FOR SELECT * FROM TABLE(out_k); 
  
  dbms_sql.return_result(c1);
  dbms_sql.return_result(c2);
  dbms_sql.return_result(c3);
END;
/

A bit verbose, but it has the same effect. Now, from JDBC:

try (Connection con = DriverManager.getConnection(url, props);
    Statement s = con.createStatement();

    // The statement itself is much more simple as we can
    // use OUT parameters to collect results into, so no
    // auxiliary local variables and cursors are needed
    CallableStatement c = con.prepareCall(
        "DECLARE "
      + "  v_j t_j := ?; "
      + "BEGIN "
      + "  FORALL j IN 1 .. v_j.COUNT "
      + "    INSERT INTO x (j) VALUES (v_j(j)) "
      + "    RETURNING i, j, k "
      + "    BULK COLLECT INTO ?, ?, ?; "
      + "END;")) {

    try {

        // Create the table and the auxiliary types
        s.execute(
            "CREATE TABLE x ("
          + "  i INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,"
          + "  j VARCHAR2(50),"
          + "  k DATE DEFAULT SYSDATE"
          + ")");
        s.execute("CREATE TYPE t_i AS TABLE OF NUMBER(38)");
        s.execute("CREATE TYPE t_j AS TABLE OF VARCHAR2(50)");
        s.execute("CREATE TYPE t_k AS TABLE OF DATE");

        // Bind input and output arrays
        c.setArray(1, ((OracleConnection) con).createARRAY(
            "T_J", new String[] { "a", "b", "c" })
        );
        c.registerOutParameter(2, Types.ARRAY, "T_I");
        c.registerOutParameter(3, Types.ARRAY, "T_J");
        c.registerOutParameter(4, Types.ARRAY, "T_K");

        // Execute, fetch, and display output arrays
        c.execute();
        Object[] i = (Object[]) c.getArray(2).getArray();
        Object[] j = (Object[]) c.getArray(3).getArray();
        Object[] k = (Object[]) c.getArray(4).getArray();

        System.out.println(Arrays.asList(i));
        System.out.println(Arrays.asList(j));
        System.out.println(Arrays.asList(k));
    }
    finally {
        try {
            s.execute("DROP TYPE t_i");
            s.execute("DROP TYPE t_j");
            s.execute("DROP TYPE t_k");
            s.execute("DROP TABLE x");
        }
        catch (SQLException ignore) {}
    }
}

The above code will display:

[1, 2, 3]
[a, b, c]
[2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0, 2018-05-02 10:40:34.0]

Exactly what we wanted.

jOOQ support

A future version of jOOQ will emulate the above PL/SQL block from the jOOQ INSERT .. RETURNING statement:

DSL.using(configuration)
   .insertInto(X)
   .columns(X.J)
   .values("a")
   .values("b")
   .values("c")
   .returning(X.I, X.J, X.K)
   .fetch();

This will correctly emulate the query for all of the databases that natively support the syntax. In the case of Oracle, since jOOQ cannot create nor assume any SQL TABLE types, PL/SQL types from the DBMS_SQL package will be used.

The relevant issue is here: https://github.com/jOOQ/jOOQ/issues/5863