Truth First, or Why You Should Mostly Implement Database First Designs

In this much overdue article, I will explain why I think that in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is). The latter approach leads to a long road of pain and suffering once your project grows.

This article is inspired by a recent Stack Overflow question.

There are also interesting reddit discussions on /r/java and /r/programming.

Code generation

To my surprise, a small group of first-time jOOQ users seem to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want, and you don’t have to use code generation, but the default way to use jOOQ according to the manual is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type safe queries against those tables:

for (Record2<String, String> record : DSL.using(configuration)
//   ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the 
//   generated code referenced from the below SELECT clause

       .select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
//           vvvvv ^^^^^^^^^^^^  ^^^^^^^^^^^^^^^ Generated names
       .from(ACTOR)
       .orderBy(1, 2)) {
    // ...
}

The code is generated either manually outside of the build, or automatically with every build. For instance, such a re-generation could follow immediately after a Flyway database migration, which can also be run either manually or automatically.

Source code generation

There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”) either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason.

There are many such code generators out there. For instance, XJC can generate Java code from XSD or WSDL files. The principle is always the same:

  • There is some truth (internal or external), like a specification, data model, etc.
  • We need a local representation of that truth in our programming language

And it almost always makes sense to generate the latter, to avoid redundancy.

Type providers and annotation processing

Noteworthy: Another, more modern approach to jOOQ’s particular code generation use case would be Type Providers, as implemented by F#, where the code is generated by the compiler during compilation. It never really exists in source form. Similar (but less sophisticated) tools in Java are annotation processors, e.g. Lombok.

In a way, this does the same thing except:

  • You don’t see the generated code (perhaps that’s less appalling to some?)
  • You must ensure the types can be provided, i.e. the “truth” must always be available. Easy in the case of Lombok, which annotates the “truth”. A bit more difficult with database models, which rely on an always-available live connection.

What’s the problem with code generation?

Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it.

If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth.

Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say:

“But Hibernate / JPA makes coding Java first easy”

Yes, that’s true. And it’s both a blessing and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:

@Entity
class Book {
  @Id
  int id;
  String title;
}

And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:

CREATE TABLE book (
  id INTEGER GENERATED ALWAYS AS IDENTITY,
  title VARCHAR(50),

  CONSTRAINT pk_book PRIMARY KEY (id)
);

CREATE INDEX i_book_title ON book (title);

… and start running the application. That’s really cool to get started quickly and to try out things.

But, huh, wait. I cheated.

  • Will Hibernate really apply that named primary key definition?
  • Will it create the index on TITLE, which I know we’ll need?
  • Will it add an identity specification?

Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:

import javax.persistence.*;
import static javax.persistence.GenerationType.IDENTITY;

@Entity
@Table(name = "book", indexes = {
  @Index(name = "i_book_title", columnList = "title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;
  String title;
}

Cool. Re-generate. Again, this makes it really easy to get started.

But you’ll pay the price later on

At some point, you go to production. And that’s when this model no longer works. Because

Once you go live, you can no longer throw away your database, as your database has become legacy.

From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?) You can only lose.

Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault…

Instead, you could have done things entirely differently from the beginning. Like using those round wheels.

Go Database First

The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is.

This is old wisdom, nothing new. Primary and unique keys are good. Foreign keys are good. Check constraints are good. Assertions (when they’re finally implemented) are good.

And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:

  • In what tablespace your table resides
  • What PCTFREE value it has
  • What the cache size of your sequence (behind the identity) is

Maybe all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations such as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs offer some tools to help you write DDL.

But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.

What about the client model?

As mentioned before, you will need a copy of your database schema in your client, a client representation. Needless to say, this client representation needs to be in sync with the real model. How best to do that? By using a code generator.

All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:

-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables

-- DB2
SELECT tabschema, tabname
FROM syscat.tables

-- Oracle
SELECT owner, table_name
FROM all_tables

-- SQLite
SELECT name
FROM sqlite_master
WHERE type = 'table'

-- Teradata
SELECT databasename, tablename
FROM dbc.tables

These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module.
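For illustration, here is a minimal sketch of the JDBC route, using nothing but `java.sql`. It asks `DatabaseMetaData.getTables()` for base tables only (table type `"TABLE"`, so views are excluded) and returns them as `schema.table`. The class name `ListTables` and passing the JDBC URL on the command line are assumptions; which schemas and table types a driver actually reports varies by vendor.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class ListTables {

    // Returns "schema.table" (or just "table" if the driver reports no schema)
    // for every base table matching the schema pattern (null = any schema)
    static List<String> tableNames(DatabaseMetaData meta, String schemaPattern) throws SQLException {
        List<String> result = new ArrayList<>();

        // catalog = null: don't filter by catalog; "%": any table name; "TABLE": no views
        try (ResultSet rs = meta.getTables(null, schemaPattern, "%", new String[] { "TABLE" })) {
            while (rs.next()) {
                String schema = rs.getString("TABLE_SCHEM");
                String table = rs.getString("TABLE_NAME");
                result.add(schema == null ? table : schema + "." + table);
            }
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        // args[0]: any JDBC URL whose driver is on the classpath (hypothetical usage)
        try (Connection connection = DriverManager.getConnection(args[0])) {
            tableNames(connection.getMetaData(), null).forEach(System.out::println);
        }
    }
}
```

The driver translates the `getTables()` call into vendor-specific catalog queries like the ones listed above, so the client code stays the same across databases.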

From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless of what your client technology is.

  • If you’re using JDBC or Spring, you can create a bunch of String constants
  • If you’re using JPA, you can generate the entities themselves
  • If you’re using jOOQ, you can generate the jOOQ meta model
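To make the first bullet concrete, here is a minimal sketch of such a generator. It takes table and column names (in a real build, these would come from metadata queries like the ones above; here they are passed in as a map) and emits Java source with one nested class per table and one `String` constant per column. The class names and the naming scheme are assumptions for illustration, not what any particular tool generates.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ConstantsGenerator {

    // Maps a SQL identifier such as "book_title" to a Java constant name such as "BOOK_TITLE"
    static String constantName(String sqlName) {
        return sqlName.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
    }

    // Emits a Java source file: one nested class per table, holding the table name
    // and one String constant per column name
    static String generate(String className, Map<String, List<String>> columnsByTable) {
        StringBuilder sb = new StringBuilder();
        sb.append("public final class ").append(className).append(" {\n");

        columnsByTable.forEach((table, columns) -> {
            sb.append("    public static final class ").append(constantName(table)).append(" {\n");
            sb.append("        public static final String TABLE = \"").append(table).append("\";\n");
            for (String column : columns) {
                sb.append("        public static final String ").append(constantName(column))
                  .append(" = \"").append(column).append("\";\n");
            }
            sb.append("    }\n");
        });

        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input; a real build would fill this map from the database's metadata
        Map<String, List<String>> schema = new LinkedHashMap<>();
        schema.put("book", List.of("id", "book_title"));
        System.out.print(generate("Tables", schema));
    }
}
```

Running `main` prints a `Tables` class containing `BOOK.TABLE = "book"`, `BOOK.ID = "id"` and `BOOK.BOOK_TITLE = "book_title"`. A real generator also has to handle collisions (e.g. a column literally named `table`) and reserved words, which this sketch ignores.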

Depending on how many features your client API offers (e.g. jOOQ or JPA), the generated meta model can be really rich and complete. Consider, for instance, jOOQ 3.11’s implicit join feature, which relies on generated meta information about the foreign key relationships between your tables.

Now, any database increment will automatically lead to updated client code. For instance, imagine:

ALTER TABLE book RENAME COLUMN title TO book_title;

Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:

import javax.persistence.*;
import static javax.persistence.GenerationType.IDENTITY;

@Entity
@Table(name = "book", indexes = {

  // Would you have thought of this?
  @Index(name = "i_book_title", columnList = "book_title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;

  @Column(name = "book_title")
  String bookTitle;
}

Or an updated jOOQ class. Plus: your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So it’s great to be able to see, through compilation errors in your client source code, which code is (or may be) affected by your database increment.

A single truth

Regardless of what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Take, for instance, jOOQ’s INFORMATION_SCHEMA meta model in XML form:
https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd

  • XSD is well understood
  • XSD specifies XML content very well, and allows for validation in all client languages
  • XSD can be versioned easily, and evolved backwards compatibly
  • XSD can be translated to Java code using XJC

The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach where we design our messages as Java objects could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed.
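As a minimal sketch of that validation step, the JDK’s own `javax.xml.validation` API is enough; JAXB and XJC only come into play once you also want generated Java classes. The tiny `book` schema below is a made-up stand-in for whatever XSD the two systems have agreed on.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class XsdValidation {

    // Hypothetical message schema: a <book> must contain an integer <id>, then a <title>
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "  <xs:element name='book'>"
        + "    <xs:complexType>"
        + "      <xs:sequence>"
        + "        <xs:element name='id' type='xs:int'/>"
        + "        <xs:element name='title' type='xs:string'/>"
        + "      </xs:sequence>"
        + "    </xs:complexType>"
        + "  </xs:element>"
        + "</xs:schema>";

    // Compile the schema once; a broken XSD is a programming error, not a bad message
    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("Invalid XSD", e);
        }
    }

    // True if the message conforms to the schema
    static boolean isValid(String xml) {
        try {
            SCHEMA.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<book><id>1</id><title>1984</title></book>"));
        System.out.println(isValid("<book><id>one</id><title>1984</title></book>"));
    }
}
```

`main` prints `true`, then `false`, because `one` is not an `xs:int`. The compiled `Schema` is thread-safe and can be shared, while a `Validator` is not, which is why the sketch creates a fresh validator per message.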

Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…

Databases: Same thing

When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth.

Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Still other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve.

So, don’t do it, not even at the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll reap the benefits for the rest of your project.

Thank me later.

Clarification

Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client.

In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.

Exceptions

There are always exceptions, and as I said at the beginning, the database first and code generation approach is not always the right choice. These exceptions are (probably not exhaustive):

  • When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
  • When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even clear whether an RDBMS is the right choice.

The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.

SQL Server ALTER TABLE SET DEFAULT

In most databases that support default values in their column DDL, it is also possible to alter that default. An Oracle example:

CREATE TABLE t (
  val NUMBER(7) DEFAULT 1 NOT NULL
);

-- Oops, wrong default, let us change it
ALTER TABLE t MODIFY val DEFAULT -1;

-- Now that is better

Unfortunately, this isn’t possible in SQL Server, where the DEFAULT column property is really a constraint, and probably a constraint whose name you don’t know because it was system generated.

But luckily, jOOQ 3.4 now supports DDL and can abstract this information away from you by generating the following Transact-SQL program:

DECLARE @constraint NVARCHAR(max);
DECLARE @command NVARCHAR(max);

SELECT @constraint = name
FROM sys.default_constraints
WHERE parent_object_id = object_id('t')
AND parent_column_id = columnproperty(
    object_id('t'), 'val', 'ColumnId');

IF @constraint IS NOT NULL
BEGIN
  SET @command = 'ALTER TABLE t DROP CONSTRAINT ' 
    + @constraint;
  EXECUTE sp_executeSQL @command

  SET @command = 'ALTER TABLE t ADD CONSTRAINT ' 
    + @constraint + ' DEFAULT -1 FOR val';
  EXECUTE sp_executeSQL @command
END
ELSE
BEGIN
  SET @command = 'ALTER TABLE t ADD DEFAULT -1 FOR val';
  EXECUTE sp_executeSQL @command
END

This program will either drop and create a new constraint with the same name, or create an entirely new constraint with a system-generated name.

With jOOQ, you can execute this statement like this:

DSL.using(configuration)
   .alterTable(T)
   .alter(T.VAL)
   .defaultValue(-1)
   .execute();
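For comparison, here is roughly what the same logic looks like when written by hand in plain JDBC. This is a hedged sketch, not how jOOQ implements it: `findDefaultConstraint()` runs the same lookup against `sys.default_constraints` as the T-SQL program above, and `changeDefault()` reproduces its two branches. The class and method names are made up, and table and column names are concatenated without quoting, so they must come from trusted code, never from user input.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.Optional;

public class AlterDefaultSqlServer {

    // Looks up the (possibly system-generated) name of the column's DEFAULT constraint
    static Optional<String> findDefaultConstraint(Connection c, String table, String column)
            throws SQLException {
        String sql = "SELECT name FROM sys.default_constraints "
                   + "WHERE parent_object_id = object_id(?) "
                   + "AND parent_column_id = columnproperty(object_id(?), ?, 'ColumnId')";

        try (PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setString(1, table);
            ps.setString(2, table);
            ps.setString(3, column);

            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? Optional.of(rs.getString(1)) : Optional.empty();
            }
        }
    }

    // Same branching as the T-SQL program: re-create an existing constraint under its old
    // name, or add a new one and let SQL Server generate the name
    static List<String> changeDefault(String table, String column, String newDefault,
                                      Optional<String> existingConstraint) {
        return existingConstraint
            .map(name -> List.of(
                "ALTER TABLE " + table + " DROP CONSTRAINT " + name,
                "ALTER TABLE " + table + " ADD CONSTRAINT " + name
                    + " DEFAULT " + newDefault + " FOR " + column))
            .orElseGet(() -> List.of(
                "ALTER TABLE " + table + " ADD DEFAULT " + newDefault + " FOR " + column));
    }

    // Looks up the constraint, then runs the resulting DDL statements in order
    static void alterDefault(Connection c, String table, String column, String newDefault)
            throws SQLException {
        List<String> statements =
            changeDefault(table, column, newDefault, findDefaultConstraint(c, table, column));

        try (Statement s = c.createStatement()) {
            for (String statement : statements)
                s.executeUpdate(statement);
        }
    }
}
```

Keeping the statement construction in a pure method separates the vendor-specific decision from the JDBC plumbing, which is exactly the kind of detail the jOOQ call above hides from you.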

jOOQ Newsletter: February 26, 2014


Tweet of the Day

Our followers, users and customers are shouting their love for jOOQ to the world. Here are two of them:

Ben Hood who is constantly discovering new useful features in jOOQ.

Antoine Comte who is skipping MyBatis to jump directly to jOOQ (in French)

Both guys are absolutely right, of course!

jOOQ 3.4 Outlook

jOOQ 3.3 was an exciting release. But now we’re heading towards the next upgrade, and it’s not going to be any less exciting. Here are the top interesting topics that might be in scope for jOOQ 3.4:

  • IBM Informix support for the jOOQ Professional Edition.
  • Typesafe DDL support for the most common statement types.
  • Transaction management. We’ve discussed this many times, and we would love to provide some jOOQ-style, Java 8-ready default behaviour. As always, with plenty of options to override the defaults.
  • TABLE types and TABLE-valued functions in all databases. This is already very useful in jOOQ 3.3’s SQL Server integration.
  • SQL 2 jOOQ improvements and a UI to transform SQL to jOOQ code.

As you can see, plenty of goodies coming up! Missing a feature? Write us!

Community work

We’re thrilled to have spotted yet another great article by Vlad Mihalcea, who is discovering the merits of using jOOQ for interaction with stored procedures and functions. If you’re doing a deep SQL integration with your favourite database, stored procedures are a very useful tool in your tool chain. Unfortunately, neither JDBC nor JPA offers the kind of convenience in that area that jOOQ does. Read the full article here.

Besides, we’re very happy to have found that our friends at RebelLabs are so excited about our blog posts that they have featured a guest post of ours on their blog. RebelLabs have a tremendous amount of great content from many writers, so we’re very honoured to present our guest post to you.

SQL Zone – 60% of SQL Developers Fail

One of the most important topics when writing SQL is performance, as the database is the bottleneck in most applications. And surprisingly, 60% of all SQL developers will fail this little test performed by the Use The Index, Luke guys.

This blog post is a must-read for all juniors and seniors alike. You never stop learning about correct SQL indexing.

SQL Zone – JDBC and Booleans

Boolean types were introduced only late in the SQL standard, namely in SQL:1999. Even today, not all databases natively support BOOLEAN or BIT types. Most notably, we’ll probably still have to wait a while for them in Oracle.

What does this mean for Java developers striving for maximum compatibility through JDBC? Read our findings in this blog post here.

Upcoming Events

February has been a bit quiet as we were preparing for the jOOQ 3.3 release, but soon you’ll get another chance at hearing about jOOQ or SQL in general in any of these upcoming events:

With our new SQL-talk, we’ll no longer just spread some jOOQ love, but also some SQL love in general. We believe that SQL deserves more presence in today’s software engineering talks, and who would be better to talk about SQL than us? Are you interested in hosting this talk at your company? Contact us!

Stay informed about 2014 events on www.jooq.org/news.

Liquibase for DB Migrations

I have just now discovered a very nice-looking tool for database migrations: Liquibase.

http://www.liquibase.org/

With Liquibase, you can model your DB increments as XML files that will translate to as many as 13 different databases. A sample DB increment (taken from the Liquibase manual):

<!--
  ALTER TABLE PERSON MODIFY COLUMN firstname VARCHAR(5000);
  -->
<modifyColumn tableName="person">
    <column name="firstname" type="varchar(5000)"/>
</modifyColumn>

<!--
  ALTER TABLE ADDRESS ADD CONSTRAINT fk_address_person
  FOREIGN KEY (person_id) REFERENCES person (id);
  -->
<addForeignKeyConstraint constraintName="fk_address_person"
    baseTableName="address" baseColumnNames="person_id"
    referencedTableName="person" referencedColumnNames="id"
/>

<!--
  UPDATE ProductSettings SET property = 'vatCategory'
  WHERE property = 'vat';
  -->
<update tableName="ProductSettings">
    <column name="property" value="vatCategory"/>
    <where>property='vat'</where>
</update>

…and so on. I guess it’s about time to contact the Liquibase folks and ask for cooperation! A fully integrated solution involving database schema management, database schema migration, and jOOQ’s source code generation seems to be the perfect toolset for a Java database developer.