How to Statically Override the Default Settings in jOOQ

When configuring a jOOQ runtime Configuration, you may add an explicit Settings instance, which contains a set of useful flags that change jOOQ’s SQL generation behaviour and other things.

Example settings include:

… and much more. Your configuration will probably include an explicit Settings instance where you have fine grained, perhaps even per-execution control over these flags. But in many cases, the default settings are applied, which include, for example, quoting all identifiers.

How to override the default

Recently, a client had trouble using jOOQ on an older Informix version, which couldn’t handle quoted identifiers in the FROM clause. The code generator produced this problematic SQL statement:

select distinct trim("informix"."systables"."owner")
from "informix"."systables"
where "informix"."systables"."owner" in ('<schema name>')

This would have worked:

select distinct trim("informix"."systables"."owner")
from informix.systables
where "informix"."systables"."owner" in ('<schema name>')

Luckily, the default can be overridden and we can specify not to quote any identifiers throughout jOOQ by specifying a Settings instance:

Programmatic

We can set this explicitly on a Configuration

new Settings().withRenderNameStyle(RenderNameStyle.AS_IS);

Configurative

We can put this XML file on the class path at “/jooq-settings.xml” or direct jOOQ to it via the “-Dorg.jooq.settings” system property:

<settings>
  <renderNameStyle>AS_IS</renderNameStyle>
</settings>

The XML must implement this schema: https://www.jooq.org/xsd/jooq-runtime-3.11.2.xsd (or a newer version of it)

So, the SQL that will now be generated with such a jooq-settings.xml file on the classpath is this:

select distinct trim(informix.systables.owner)
from informix.systables
where informix.systables.owner in ('<schema name>')

Want to get rid of the schema as well?

<settings>
  <renderNameStyle>AS_IS</renderNameStyle>
  <renderSchema>false</renderSchema>
</settings>

You’re now getting this SQL:

select distinct trim(systables.owner)
from systables
where systables.owner in ('<schema name>')

How to Unit Test Your Annotation Processor using jOOR

Annotation processors can be useful as a hacky workaround to get some language feature into the Java language. The best example is Lombok, which enhances the Java language with quite a few annotation-based features.

jOOQ also has an annotation processor that helps validate SQL syntax for:

  • Plain SQL usage (SQL injection risk)
  • SQL dialect support (prevent using an Oracle only feature on MySQL)

You can read about it more in detail here.

Unit testing annotation processors

Unit testing annotation processors is a bit more tricky than using them. Your processor hooks into the Java compiler and manipulates the compiled AST (or does other things). If you want to test your own processor, you need the test to run a Java compiler, but that is difficult to do in a normal project setup, especially if the expected behaviour for a given test is a compilation error.

Let’s assume we have the following two annotations:

@interface A {}
@interface B {}

And now, we would like to establish a rule that @A must always be accompanied by @B. For example:

// This must not compile
@A
class Bad {}

// This is fine
@A @B
class Good {}

We’ll enforce that with an annotation processor:

class AProcessor implements Processor {
    boolean processed;

    @Override
    public Set<String> getSupportedOptions() {
        return Collections.emptySet();
    }

    @Override
    public Set<String> getSupportedAnnotationTypes() {
        return Collections.singleton("*");
    }

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.RELEASE_8;
    }

    @Override
    public void init(ProcessingEnvironment processingEnv) {
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement e1 : annotations)
            if (e1.getQualifiedName().contentEquals(A.class.getName()))
                for (Element e2 : roundEnv.getElementsAnnotatedWith(e1))
                    if (e2.getAnnotation(B.class) == null)
                        throw new RuntimeException("Annotation A must be accompanied by annotation B");

        this.processed = true;
        return false;
    }

    @Override
    public Iterable<? extends Completion> getCompletions(Element element, AnnotationMirror annotation, ExecutableElement member, String userText) {
        return Collections.emptyList();
    }
}

Now, this works. We can easily verify that manually by adding the annotation processor to some Maven compiler configuration and by annotating a few classes with A and B. But then, someone changes the code and we don’t notice the regression. How can we unit test this, rather than doing things manually?

jOOR 0.9.10 support for annotation processors

jOOR is our little open source reflection library that we’re using internally in jOOQ

jOOR has a convenient API to invoke the javax.tools.JavaCompiler API through Reflect.compile(). The most recent release 0.9.10 now takes an optional CompileOptions argument where annotation processors can be registered.

This means, we can now write a very simple unit test as follows (and if you’re using Java 12, you can profit from raw string literals! For a Java 11 compatible version without raw string literals, see our unit tests on github):

@Test
public void testCompileWithAnnotationProcessors() {
    AProcessor p = new AProcessor();

    try {
        Reflect.compile(
            "org.joor.test.FailAnnotationProcessing",
            ```
             package org.joor.test; 
             @A 
             public class FailAnnotationProcessing {
             }
            ```,
            new CompileOptions().processors(p)
        ).create().get();
        Assert.fail();
    }
    catch (ReflectException expected) {
        assertFalse(p.processed);
    }

    Reflect.compile(
        "org.joor.test.SucceedAnnotationProcessing",
        ```
         package org.joor.test; 
         @A @B 
         public class SucceedAnnotationProcessing {
         }
        ```,
        new CompileOptions().processors(p)
    ).create().get();
    assertTrue(p.processed);
}

So easy! Never have regressions in your annotation processors again!

How to Create a Good MCVE (Minimal Complete Verifiable Example)

Reporting a bug takes time, and trust me, every vendor appreciates your reporting of a bug! Your voice counts as many voices, for all the other customers of a product who do not want to or cannot take the time to report the same bug are numerous.

So, first off, thanks for taking that time and reaching out to us vendors. We really appreciate your help!

Having said so, reporting a bug can be a tedious exercise. For both parties, the one reporting the bug and the one receiving it. There are extremely simple bugs, such as typos in documentation. They can be easily pointed to and just as easily be fixed. There are much trickier bugs, such as concurrency issues in complicated project setups. They take time to reproduce. This is why an MCVE (Minimal Complete Verifiable Example) is so useful. The linked stack overflow page explains why it is so useful to answer questions. But the same arguments apply when reporting a bug.

And that’s where the tricky part starts. It isn’t easy to create an example that is:

  • Minimal: Your real world application code is huge. You cannot dump the entirety of it to the vendor for various reasons. And the vendor cannot look through it all to try to reproduce it. So, the problem has to be isolated into an example of minimal scope, with no unnecessary additional functionality. That’s hard too, because your project has been set up months or years ago. You don’t want to spend too much time setting up a new project
  • Complete: When reducing the problem to a minimal one, we’re tempted to just describe it in prose. But that can be difficult as well, because prose is hardly complete. It’s difficult to describe a problem when it would be quite easy to show the code. But that brings us back to the minimal part. We want to show only the relevant code, not all of it.
  • Verifiable: Ultimately, the ideal example can be used by the vendor to reproduce the problem, because once that’s possible, the vendor can start debugging it and finding the right spots to fix quite easily. Otherwise, it’s just guessing and going back and forth with the reporter, just to write more prose. That’s tiring on both sides.

This is why we now have an example project on GitHub to help you create that MCVE:

https://github.com/jOOQ/jOOQ-mcve

It is a minimal example that uses:

This example can be forked on GitHub and modified by you directly, in order to show how to reproduce your issue. In the future, we’ll add more example setups that may be helpful to reproduce your specific issue.

Thanks again for taking the time to report issues. We vendors really appreciate your work!

How to Use jOOQ’s UpdatableRecord for CRUD to Apply a Delta

While jOOQ is not a full fledged ORM (as in an object graph persistence framework), there is still some convenience available to avoid hand-writing boring SQL for every day CRUD.

That’s the UpdatableRecord API. It has a few very useful features, including:

A 1:1 mapping to the underlying table

Every UpdatableRecord is mapped on a 1:1 basis to the underlying table (or view) of your database. For example, if you create a table like this (using PostgreSQL syntax):

CREATE TABLE customer (
  id BIGSERIAL NOT NULL PRIMARY KEY,
  first_name TEXT NOT NULL,
  last_name TEXT NOT NULL,
  vip BOOLEAN DEFAULT FALSE
);

… and if you’re using the code generator (which you should), then apart from the generated table that is used in SQL queries, you also have an UpdatableRecord that allows you to do this:

CustomerRecord customer = ctx.newRecord(CUSTOMER);
customer.setFirstName("John");
customer.setLastName("Doe");
customer.store();

The above statement will generate a JDBC statement like this:

INSERT INTO customer (first_name, last_name)
VALUES (?, ?)

Notice…

SQL DEFAULT expressions

… we only set the FIRST_NAME and LAST_NAME values on the record, and only those values have been included in the insert statement. The other columns are not included, which allows for SQL DEFAULT expressions to be applied. This is true for both the ID serial column, as well as the VIP defaulted column.

The actual database record will look something like this:

id     first_name   last_name   vip
-------------------------------------
1337   John         Doe         false

Notice how the VIP column is nullable, which means that an unknown VIP status is something our database design explicitly allows for. Of course, we could easily set that value to NULL as well, preventing the DEFAULT from applying. For example:

// Assuming you actually *need* the client round trip...
CustomerRecord customer = ctx.fetchOne(
  CUSTOMER, CUSTOMER.ID.eq(1337));
customer.setVip(null);
customer.store();

The above query would produce the following update statement:

UPDATE customer SET vip = ? WHERE id = ?

Sidenote: Please do not use this feature if you never really needed the data in the client. In that case, just write the UPDATE statement and save a client-server roundtrip!

Notice again that only the column that we’ve explicitly changed will be included in the update statement, not all the others. This has advantages and disadvantages:

Advantages

  • Even if the CustomerRecord was fetched incompletely (e.g. through a query), we don’t need all the other columns to modify this one.
  • If a trigger listens on columns that have been included in the actual query, we can distinguish between a value being set to NULL explicitly, and a value not being available from the DML statement. Think of JavaScript’s difference between null and undefined.

Disadvantages

  • If we allow users to change arbitrary columns on such a record, there is a very high number of possible resulting queries, which can hurt execution plan cache performance in the database.

This is the tradeoff. jOOQ opted for full support of SQL DEFAULT values, whose application is activated when a Record.changed() flag is not set. As soon as the flag is set, jOOQ assumes that the explicit setting of the value is wanton, and it will thus be propagated to the generated SQL query.

SQL DEFAULT expressions and POJOs

jOOQ Records can encode that difference. Plain Old Java Objects (POJOs) cannot. There is only null, if you choose a reference type for your data. What does null mean in that context? It’s not possible to know. A lot of jOOQ users will want that null value to encode the SQL DEFAULT value, not the SQL NULL value. But when you’re loading your POJO content into a jOOQ record, you’re effectively setting all the column values to NULL explicitly, because all the Record.changed() flags will be set:

// Load the customer record from a POJO
CustomerRecord customer = ctx.newRecord(CUSTOMER, customerPojo);
customer.update();

Now, quite likely, the resulting query will be:

UPDATE customer
SET
  first_name = ?,
  last_name = ?,
  vip = ?
WHERE id = ?

But what if your CustomerPojo was retrieved from a JSON document, e.g. like this one?

{
  "id": 1337,
  "lastName": "Smith"
}

It looks like the intent of this document is for the record “John Doe” to be renamed to “John Smith”, so this update statement should have resulted instead:

UPDATE customer
SET last_name = ?
WHERE id = ?

Clearly, we shouldn’t set the FIRST_NAME nor VIP values to NULL.

Both the JSON document structure and the jOOQ UpdatableRecord can encode the difference between undefined / DEFAULT and null / NULL. But the POJO cannot. So, the only reasonable thing for jOOQ to do is to set all Record.changed() flags to true.

Of course, you can manually set them to false again to work around this limitation, or perhaps don’t use the intermediary POJO data structure, but load the JSON data directly into the jOOQ Record.

A Frequent Question: Does jOOQ Have a First Level Cache?

One of the more frequent questions people have when switching from JPA to jOOQ is how to migrate from using JPA’s first level cache?

There are two important things to notice here:

jOOQ is mainly used for what JPA folks call “projections”

If you’re using only JPA in your application, you may have gotten used to occasionally fetch DTOs through “projections”. The term “projection” in this context stems from relational algebra, where a projection is simply a SELECT clause in your SQL statement.

Projections are useful when you know that the result of a query will only used for further data processing, but you’re not going to store any modifications to the data back into the database. There are two advantages to this:

  1. You can project arbitrary expressions, including things that cannot be mapped to entities
  2. You can bypass most of the entity management logic, including first and second level caches

When you’re doing this, you will be using SQL – mostly because JPQL (or HQL) are very limited in scope. Ideally, you would be using jOOQ as your projecting query will be type safe and vendor agnostic. You could even use jOOQ to only build the query and run by JPA, although if you’re not fetching entities, you’d lose all result type information that jOOQ would provide you with, otherwise.

So, the advantage of using jOOQ for projections (rather than JPA) is obvious. Sticking to JPA is mainly justified in case you only have very few projection use-cases and they’re also very simple.

jOOQ can also be used for basic CRUD

The question from the above tweet hints at the idea that SQL is not a very good language to implement basic CRUD. Or as I tend to say:

What I mean by this is that it’s really boring to manually express individual statements like these all the time:

INSERT INTO foo (a, b) VALUES (?, ?)
INSERT INTO bar (a, b, c) VALUES (?, ?, ?)
UPDATE baz SET x = ? WHERE id = ?

With most such CRUD operations, we’re simply inserting all the columns, or a given subset of columns, into a target table. Or we’re modifying all the changed columns in that table. These statements are always the same, but they break as soon as we add / remove columns, so we need to fix them throughout our application.

When you’re using an ORM like Hibernate, all you have to change is your annotated meta model, and the generated queries will adapt automatically throughout your application. That’s a huge win!

Additional features

Full-fledged ORMs like Hibernate come with tons of additional features, including:

  • A way to map relationships between entities
  • A way to cache entities in the client

Both of these features are very useful in more sophisticated CRUD use-cases, where an application desires to load, mutate, and persist a complex object graph with many involved entities.

Is this really needed?

However, in simple cases, it might be sufficient to load only 1-2 entities explicitly using jOOQ (jOOQ calls them UpdatableRecord), modify them, and store them back again into the database.

In such cases, it often doesn’t make sense to cache the entity in the client, nor to model the entity relationship in the client. Instead, we can write code like this:

// Fetch an author
AuthorRecord author : create.fetchOne(AUTHOR, AUTHOR.ID.eq(1));

// Create a new author, if it doesn't exist yet
if (author == null) {
    author = create.newRecord(AUTHOR);
    author.setId(1);
    author.setFirstName("Dan");
    author.setLastName("Brown");
}

// Mark the author as a "distinguished" author and store it
author.setDistinguished(1);

// Executes an update on existing authors, or insert on new ones
author.store();

Notice how we haven’t hand-written a single SQL statement. Instead, behind the scenes, jOOQ has generated the necessary INSERT or UPDATE statement for you.

If this is sufficient, you definitely don’t need JPA, and can use a more lightweight programming model through using jOOQ directly.

A few additional features are available, including:

Conclusion

The conclusion is, if you’ve found and read this article because you wanted to replace JPA’s first level cache while migrating to jOOQ is:

Re-think your migration

You don’t have to replace the entirety of JPA. If you need its more sophisticated features, by all means, keep using it along with jOOQ. However, if you don’t need its more sophisticated features and the above CRUD features in jOOQ are sufficient, let go of the idea of needing a first level cache and embrace moving more logic into your SQL queries.

jOOQ 3.11 Released With 4 New Databases, Implicit Joins, Diagnostics, and Much More

Today, jOOQ 3.11 has been released with support for 4 new databases, implicit joins, diagnostics, and much more

New Databases Supported

At last, 4 new SQL dialects have been added to jOOQ! These are:

jOOQ Professional Edition

  • Aurora MySQL Edition
  • Aurora PostgreSQL Edition
  • Azure SQL Data Warehouse

jOOQ Enterprise Edition

  • Teradata

Implicit Joins

One of the really cool features in ORMs like Hibernate, Doctrine, and others, is
the capability of using a relationship graph notation to access another entity’s
columns through what is often called “implicit joins”.

Instead of explicitly joining a to-one relationship to access its columns:

SELECT author.first_name, author.last_name, book.title
FROM book
JOIN author ON book.author_id = author.id

We would like to be able to access those columns directly, using this notation:

SELECT book.author.first_name, book.author.last_name, book.title
FROM book

The join is implied and should be added implicitly. jOOQ now allows for this to
happen when you use the code generator:

ctx.select(BOOK.author().FIRST_NAME, BOOK.author().LAST_NAME, BOOK.TITLE)
   .from(BOOK)
   .fetch();

When rendering this query, the implicit join graph will be calculated on the fly
and added behind the scenes to the BOOK table. This works for queries of
arbitrary complexity and on any level of nested SELECT.

More details in this blog post:
https://blog.jooq.org/2018/02/20/type-safe-implicit-join-through-path-navigation-in-jooq-3-11/

DiagnosticsListener SPI

A new DiagnosticsListener SPI has been added to jOOQ:
https://github.com/jOOQ/jOOQ/issues/5960

The purpose of this SPI is to sanitise your SQL language, JDBC and jOOQ API
usage. Listeners can listen to events such as:

  • duplicateStatements (similar SQL is executed, bind variables should be used)
  • repeatedStatements (identical SQL is executed, should be batched or rewritten)
  • tooManyColumnsFetched (not all projected columns were needed)
  • tooManyRowsFetched (not all fetched rows were needed)

The great thing about this SPI is that it can be exposed to clients through the
JDBC API, in case of which the diagnostics feature can reverse engineer your
JDBC or even JPA generated SQL. Ever wanted to detect N+1 queries from
Hibernate? Pass those Hibernate-generated queries through this SPI.

Want to find missing bind variables leading to cursor cache contention or SQLi?
Let jOOQ find similar SQL statements and report them. E.g.

  • SELECT name FROM person WHERE id = 1
  • SELECT name FROM person WHERE id = 2

Or also:

  • SELECT name FROM person WHERE id IN (?, ?)
  • SELECT name FROM person WHERE id IN (?, ?, ?)

Anonymous blocks

Many databases support anonymous blocks to run several statements in a single
block scope. For example, Oracle:

DECLARE
  l_var NUMBER(10);
BEGIN
  l_var := 10;
  dbms_output.put_line(l_var);
END;

jOOQ now supports the new org.jooq.Block API to allow for wrapping DDL and DML
statements in such a block. This is a first step towards a future jOOQ providing
support for:

  • Abstractions over procedural languages
  • CREATE PROCEDURE and CREATE FUNCTION statements
  • Trigger support
  • And much more

Parser

jOOQ’s parser support is an ongoing effort. This release has added support for
a lot of new SQL clauses and functions from various vendors and in various DDL
and DML statements.

The parser is now also exposed through a public website and API, where SQL can
be translated from one dialect to another:
https://www.jooq.org/translate

This website will help further drive jOOQ API development by helping to find
missing functionality that is used in real-world SQL.

Another way to access this API is through the new org.jooq.ParserCLI command
line tool. For example, run:

$ java -cp jooq-3.11.0.jar org.jooq.ParserCLI -f -t ORACLE -s "SELECT * FROM (VALUES(1),(2)) AS t(a)"

To get:

select *
from (
  (
    select null a
    from dual
    where 1 = 0
  )
  union all (
    select *
    from (
      (
        select 1
        from dual
      )
      union all (
        select 2
        from dual
      )
    ) t
  )
) t;

Formal Java 10 Support

jOOQ 3.11 is the first release that is formally integration tested with Java 10.
To use jOOQ with Java 10, use the Java 8 distribution which has not yet been
modularised, but contains Automatic-Module-Name specification to be forward
compatible with future, modularised jOOQ distributions.

Additionally, package names between jOOQ, jOOQ-meta, and jOOQ-codegen have been
cleaned up to prevent duplicate package names, and the JAXB dependency has been
added explicitly to the various artefacts.

Other great improvements

  • Finally, asterisks (SELECT * or SELECT t.*) are formally supported in the API.
  • Collations can now be specified on a variety of syntax elements
  • The org.jooq.Comment type has been added, and DDL statements for it
  • The DefaultBinding implementation has been rewritten for better peformance
  • Several performance improvements in jOOQ’s internals
  • Many more DDL statements are supported including GRANT and REVOKE
  • Support for the EXPLAIN statement
  • FETCH n PERCENT ROWS and TOP n PERCENT clauses are supported
  • Better org.jooq.Name and org.jooq.Named API for identifier handling
  • Support for PostgreSQL 10
  • Support for SQL Server 2017
  • Support for DB2 11
  • Upgraded MariaDB support for window functions, inv dist functions, WITH
  • jOOU dependency updated to 0.9.3
  • jOOR dependency updated to 0.9.8
  • Server output (e.g. DBMS_OUTPUT) can now be fetched automatically, by jOOQ
  • Code generation support for PL/SQL TABLE types
  • SQL Keywords Can Now Be Rendered In Pascal Style If You Must
  • Emulate PostgreSQL’s ON CONFLICT clause using MERGE

The complete list can be seen here:
https://www.jooq.org/notes/?version=3.11

Truth First, or Why You Should Mostly Implement Database First Designs

In this much overdue article, I will explain why I think that in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is), the latter approach leading to a long road of pain and suffering, once your project grows.

This article is inspired by a recent Stack Overflow question.

Interesting reddit discussions on /r/java and /r/programming.

Code generation

To my surprise, a small group of first time jOOQ users seem to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want and you don’t have to use code generation, but the default way to use jOOQ according to the manual is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type safe queries against those tables:

for (Record2<String, String> record : DSL.using(configuration)
//   ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the 
//   generated code referenced from the below SELECT clause

       .select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
//           vvvvv ^^^^^^^^^^^^  ^^^^^^^^^^^^^^^ Generated names
       .from(ACTOR)
       .orderBy(1, 2)) {
    // ...
}

The code is generated either manually outside of the build, or automatically with every build. For instance, such a re-generation could follow immediately after a Flyway database migration, which can also be run either manually or automatically.

Source code generation

There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”) either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason.

There are many such code generators out there. For instance, XJC can generate Java code from XSD or WSDL files. The principle is always the same:

  • There is some truth (internal or external), like a specification, data model, etc.
  • We need a local representation of that truth in our programming language

And it almost always makes sense to generate that latter, to avoid redundancy.

Type providers and annotation processing

Noteworthy: Another, more modern approach to jOOQ’s particular code generation use-case would be Type Providers, as implemented by F#, in case of which the code is generated by the compiler while compiling. It never really exists in its source form. A similar (but less sophisticated) tool in Java are annotation processors, e.g. Lombok.

In a way, this does the same thing except:

  • You don’t see the generated code (perhaps that’s less appalling to some?)
  • You must ensure the types can be provided, i.e. the “truth” must always be available. Easy in the case of Lombok, which annotates the “truth”. A bit more difficult with database models, which rely on an always available live connection.

What’s the problem with code generation?

Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it.

If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth.

Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say:

“But Hibernate / JPA makes coding Java first easy”

Yes that’s true. And it’s both a bliss and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:

@Entity
class Book {
  @Id
  int id;
  String title;
}

And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:

CREATE TABLE book (
  id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  title VARCHAR(50),

  CONSTRAINT pk_book PRIMARY KEY (id)
);

CREATE INDEX i_book_title ON book (title);

… and start running the application. That’s really cool to get started quickly and to try out things.

But, huh, wait. I cheated.

  • Will Hibernate really apply that named primary key definition?
  • Will it create the index on TITLE, which I know we’ll need?
  • Will it add an identity specification?

Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:

@Entity
@Table(name = "book", indexes = {
  @Index(name = "i_book_title", columnList = "title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;
  String title;
}

Cool. Re-generate. Again, this makes it really easy to get started.

But you’ll pay the price later on

At some point, you go to production. And that’s when this model no longer works. Because

Once you go live, you can no longer throw away your database, as your database has become legacy.

From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?) You can only lose.

Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault…

Instead, you could have done things entirely differently from the beginning. Like using those round wheels.

Go Database First

The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is.

This is old wisdom, nothing new. Primary and unique keys are good. Foreign keys are good. Check constraints are good. Assertions (when they’re finally implemented) are good.

And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:

  • In what tablespace your table resides
  • What PCTFREE value it has
  • What the cache size of your sequence (behind the identity) is

Maybe, all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs offer some tools to help you write DDL.

But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.

What about the client model?

As mentioned before, you will need a copy of your database schema in your client, a client representation. Needless to say that this client representation needs to be in-sync with the real model. How to best do that? By using a code generator.

All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:

-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables

-- DB2
SELECT tabschema, tabname
FROM syscat.tables

-- Oracle
SELECT owner, table_name
FROM all_tables

-- SQLite
SELECT name
FROM sqlite_master

-- Teradata
SELECT databasename, tablename
FROM dbc.tables

These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module.

From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless what your client technology is.

  • If you’re using JDBC or Spring, you can create a bunch of String constants
  • If you’re using JPA, you can generate the entities themselves
  • If you’re using jOOQ, you can generate the jOOQ meta model

Depending on the amount of features your client API offers (e.g. jOOQ or JPA), the generated meta model can be really rich and complete. Consider, for instance, jOOQ 3.11’s implicit join feature, which relies on generated meta information about the foreign key relationships between your tables.

Now, any database increment will automatically lead to updated client code. For instance, imagine:

ALTER TABLE book RENAME COLUMN title TO book_title;

Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:

@Entity
@Table(name = "book", indexes = {

  // Would you have thought of this?
  @Index(name = "i_book_title", columnList = "book_title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;

  @Column("book_title")
  String bookTitle;
}

Or an updated jOOQ class. Plus: Your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So, it’s great to be able to see in compiled client source code, what code is (or may be) affected by your database increment.

A single truth

Regardless what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Like jOOQ’s INFORMATION_SCHEMA meta model in XML form:
https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd

  • XSD is well understood
  • XSD specifies XML content very well, and allows for validation in all client languages
  • XSD can be versioned easily, and evolved backwards compatibly
  • XSD can be translated to Java code using XJC

The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach where we design our messages as Java objects could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed.

Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…

Databases: Same thing

When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth.

Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Even other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve.

So, don’t do it. And, don’t do it from the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll get years of improved productivity for the rest of your project.

Thank me later.

Clarification

Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client.

In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.

Exceptions

There are always exceptions, and I promised that the database first and code generation approach may not always be the right choice. These exceptions are (probably not exhaustive):

  • When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
  • When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even sure if an RDBMS will be the right choice.

The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.