Dogfooding in Product Development

Dogfooding, or eating your own dog food, is a practice that all product developers should apply, all the time. According to Wikipedia:

Dogfooding occurs when an organization uses its own product. This can be a way for an organization to test its products in real-world usage. Hence, dogfooding can act as quality control, and eventually a kind of testimonial advertising. Once in the market, dogfooding demonstrates confidence in the developers’ own products.

I’ve recently started delivering this talk about API design at conferences, where I mentioned dogfooding as an excellent approach to make sure the user experience of your API is great:

The more you use your own API, the better it gets from a UX and usability perspective.

Dogfooding via tests

We do a lot of dogfooding ourselves, inevitably, as we write tons and tons of tests for jOOQ, to make sure jOOQ works correctly on all 26 RDBMS that we currently support.
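For illustration only, such a test might look like this (a hypothetical, simplified sketch, not an actual jOOQ test):

@Test
public void testCreateTableDDL() {

    // The very first use of a new piece of DSL API often happens right here
    assertEquals(
        "create table T (I int)", // expected SQL; actual rendering is dialect-specific
        ctx.createTable("T").column("I", SQLDataType.INTEGER).getSQL()
    );
}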

Writing a test for a new API means we have to use that new API for the very first time. This helps discover the first usability problems. But there are even better ways:

Dogfooding via significant new functionality

Another great way to dogfood is to create significant new features. One example is the new schema diff tool that jOOQ 3.13 will ship with (#9425). It is part of a bigger project that we call DDL interpretation, where we implement the DDL part of a database by maintaining an up-to-date database schema based on a stream of DDL statements. This will be part of a variety of improvements and value propositions in the area of database change management / SQL migrations. Recently, we published a post about an improved Liquibase integration, which goes in a similar direction.

The schema diff tool is something many database products offer. For us, it is relatively simple to implement, as we:

  • Support a variety of schema meta representations
  • Support a lot of DDL syntax
  • Support exporting schema meta representations as DDL

We’ve always had a type called org.jooq.Meta. That type represents your database. Historically, it just gave jOOQ-style access to JDBC’s java.sql.DatabaseMetaData. But over time, we also started exposing generated jOOQ code or XML files as org.jooq.Meta.
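In code, the different ways of obtaining a Meta look roughly like this (a sketch; MY_SCHEMA stands for a hypothetical generated schema, and the exact overloads depend on your jOOQ version):

DSLContext ctx = DSL.using(connection);

Meta fromJdbc      = ctx.meta();                  // backed by java.sql.DatabaseMetaData
Meta fromGenerated = ctx.meta(MY_SCHEMA);         // backed by generated jOOQ code
Meta fromXml       = ctx.meta(informationSchema); // backed by a jOOQ-meta XML document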

With jOOQ 3.13, we’ll support interpreting arbitrary DDL (parsed, or manually created using the jOOQ API), as well as Liquibase XML files, to create such an org.jooq.Meta representation, which can then be turned into DDL again using the new Meta.ddl() method:

System.out.println(
  ctx.meta("")
     .apply("create table t (i int)")
     .apply("alter table t add j int")
     .apply("alter table t alter i set not null")
     .apply("alter table t add primary key (i)")
);

When we run the above program, we’re getting the following output:

create table t(
  i int not null,
  j int null,
  primary key (i)
);

That’s already cool – we can create a snapshot of our schema at any point in a set of database migration scripts. Can we also do the inverse? We can, with the new diff tool:

System.out.println(
  ctx.meta(
    "create table t (i int)"
  ).diff(ctx.meta(
    "create table t ("
  + "i int not null, "
  + "j int null, "
  + "primary key (i))"
  ))
);

The output generated from this is:

alter table t alter i set not null;
alter table t add j integer null;
alter table t add constraint primary key (i);
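A nice property of the two operations is that they are inverses of each other: applying the diff to the source schema should produce the target schema again. A quick sanity check might look like this (a sketch, assuming diff() returns Queries that apply() accepts):

Meta source = ctx.meta("create table t (i int)");
Meta target = ctx.meta(
    "create table t (i int not null, j int null, primary key (i))");

// source, migrated by the diff, should now equal target
Meta migrated = source.apply(source.diff(target));
System.out.println(migrated.equals(target)); // relies on Meta.equals(), see #9433 below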

This is part of an exciting new set of DDL / migration features that we can hardly wait to publish with jOOQ 3.13 (around Q1 2020), as we believe it will greatly help with your existing database change management solution – perhaps as a Flyway or Liquibase plugin.

What does this have to do with dogfooding?

While developing this feature, we’ve discovered numerous missing features, which we also implemented for jOOQ 3.13:

  • #7752: Our current meta model for sequences does not remember flags like MAXVALUE
  • #9428: Meta.toString() could just call Meta.ddl(), the DDL export. This is what I’ve shown above, very useful when debugging with small schemas!
  • #9433: Meta.equals() and Meta.hashCode() were not yet implemented, as they didn’t make sense, historically.
  • #9434: Our current DDL export should support reordering objects in alphabetical order for the export. This is useful for equality checks, text-based diffs, etc.
  • #9437: We don’t support ALTER SEQUENCE statements that allow for modifying sequence properties like MAXVALUE yet.
  • #9438: The current emulation of ALTER SEQUENCE .. RESTART hard-codes restarting the sequence at 1, when in fact, it could be some other START WITH value.
  • #9440: We’ll need a synthetic syntax to drop unnamed foreign keys.

This is just a short extract of things we’ve implemented this week, based on our discoveries of what was missing when implementing the “big” feature: the Meta.diff(Meta) method. The diff() method will be the one highlighted in the release notes of 3.13. But the little things above are what ultimately make jOOQ so useful – the many little things that were discovered while dogfooding, and that help everyone, not just the users of the diff() method.

Dogfooding via blogging and documentation

But perhaps even better than implementing new features is blogging about, or documenting, an API. When blogging or documenting, the vendor / maintainer has to put themselves into the position of the user, in particular, the first-time user of a specific API.

Imagine: after writing all this code that I, as an author, love, I have to reset my experience and pretend I don’t know what to expect. Then, I sit down and write a really good and simple example using the API that I wrote.

The first example I came up with was not like this (as that API didn’t exist yet):

System.out.println(
  ctx.meta("")
     .apply("create table t (i int)")
     .apply("alter table t add j int")
     .apply("alter table t alter i set not null")
     .apply("alter table t add primary key (i)")
);

Instead, it was like this (which works the same way):

System.out.println(
  ctx.meta("create table t (i int);\n"
         + "alter table t add j int;\n"
         + "alter table t alter i set not null;\n"
         + "alter table t add primary key (i);")
);

I thought to myself, do I really need all that string concatenation? I already have this Meta.apply(Queries) method, which would add even more value. But then, my code would look like this:

System.out.println(
  ctx.meta("")
     .apply(queries(createTable("t").column("i", INTEGER)))
     .apply(queries(alterTable("t").add("i", INTEGER)))
     .apply(queries(alterTable("t").alter("i").setNotNull()))
     .apply(queries(alterTable("t").add(primaryKey("i")))
);

That would be cool, too. I could show that our DSL API and our parser can do the same things. But I didn’t want to show the DSL API in this blog post. It would only distract from my main point. I could, of course, parse the queries explicitly:

System.out.println(
  ctx.meta("")
     .apply(ctx.parse("create table t (i int)"))
     .apply(ctx.parse("alter table t add j int"))
     .apply(ctx.parse("alter table t alter i set not null"))
     .apply(ctx.parse("alter table t add primary key (i)"))
);

But why not just add convenience API that does this for me?

System.out.println(
  ctx.meta("")
     .apply("create table t (i int)")
     .apply("alter table t add j int")
     .apply("alter table t alter i set not null")
     .apply("alter table t add primary key (i)")
);

Where Meta.apply(String) looks like this:

public final Meta apply(String diff) {
  return apply(dsl().parser().parse(diff));
}

That’s mere convenience. It is not needed. But it is very useful:

  • For tests
  • For blog posts
  • For documentation
  • For new users
  • For simple applications

The more thorough, more powerful API that accepts Queries objects (which wrap a bunch of parsed or hand-constructed jOOQ Query objects) is the real feature here. But convenience helps the user very much!

I would not have discovered this requirement without dogfooding.

Convenience

There is a much underrated tweet by Brian Goetz on this topic.

APIs need a ton of abstractions. The JDK Collector API is a very good example. In Java and in SQL, in order to write a custom aggregate function, you need these four operations:

  • Supplier<A>: A supplier that provides an empty, intermediary data structure to aggregate into
  • BiConsumer<A, T>: An accumulator that accumulates new values from the stream into our intermediary data structure.
  • BinaryOperator<A>: A combiner that combines two intermediary data structures. This is used for parallel streams only.
  • Function<A, R>: The finisher function that extracts the result from the intermediary data structure.

Writing a Collector from scratch is super annoying and “low level”. Using convenience API is much better, which is why the JDK has Collectors, jOOλ has Agg, and there are other Collector libraries that provide some basic building blocks. The Stream API does not have to know about all of these building blocks; it just knows a single, abstract type (which serves the API). But we users don’t want only a single abstraction. We want those convenient building blocks (which serve the user).
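To illustrate, here is a “to list” collector written from scratch against the raw abstraction, next to the convenience method everyone actually uses (a self-contained sketch):

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CollectorExample {
    public static void main(String[] args) {

        // From scratch: wire up supplier, accumulator, and combiner by hand
        // (the finisher defaults to identity in this Collector.of() overload)
        Collector<String, List<String>, List<String>> toList = Collector.of(
            ArrayList::new,                      // Supplier<A>
            List::add,                           // BiConsumer<A, T>
            (l, r) -> { l.addAll(r); return l; } // BinaryOperator<A>
        );

        System.out.println(Stream.of("a", "b").collect(toList));

        // With convenience API: a single, self-documenting call
        System.out.println(Stream.of("a", "b").collect(Collectors.toList()));
    }
}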

Convenience is never a requirement, but it is always a big plus. In APIs, like in languages.

Have you noticed this new method on InputStream, which has been added in Java 9?

public long transferTo(OutputStream out) throws IOException {
    Objects.requireNonNull(out, "out");
    long transferred = 0;
    byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
    int read;
    while ((read = this.read(buffer, 0, DEFAULT_BUFFER_SIZE)) >= 0) {
        out.write(buffer, 0, read);
        transferred += read;
    }
    return transferred;
}

How many times have we written this kind of code? N times (and a big N, too)

How many times have we enjoyed writing this code? 1 time

How many times too many have we written this code? N + 1 times.
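With JDK 9+, the whole ritual collapses into a one-liner (a minimal sketch; file names are placeholders):

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TransferExample {
    public static void main(String[] args) throws Exception {

        // Copy one stream into another, without the manual buffer loop
        try (InputStream in = Files.newInputStream(Paths.get("in.txt"));
             OutputStream out = Files.newOutputStream(Paths.get("out.txt"))) {
            in.transferTo(out);
        }
    }
}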

Dogfooding is a very good way to make sure that this kind of convenience, which our users love us for, makes it into products much earlier on. Because we, the vendors, are the first users, and we hate writing this silly code all the time. And who is a better first user to implement cool new features for, if not ourselves?

How to Simulate a Liquibase Migration using H2

This post is part of a new blog series about database migrations, which will cover a variety of database change management topics.

In the near future, we’ll look much more into these topics, hoping to add more value to our users’ existing Flyway, Liquibase, and other integrations where the migration tools can profit a lot from jOOQ’s most recent features, including the parser.

Embracing Liquibase in the next jOOQ versions

In jOOQ 3.13, we’re going to offer a better integration for those users who use Liquibase for database change management. Our existing DDLDatabase parses, translates, and simulates a DDL script based migration against an in-memory H2 database instance.

This is using our built-in SQL translation functionality (test it on our website here: https://www.jooq.org/translate) to translate this Oracle SQL:

create table t (v varchar2(100))

to this H2 SQL:

create table t (v varchar(100))

That’s just a trivial example. More sophisticated translations are possible, too. Historically, the main purpose of doing this has been to offer an “offline” jOOQ source code generation step that does not require connecting to an actual Oracle database instance to reverse engineer your schema, which you can represent in the form of SQL migration scripts that you’d typically run with something like Flyway.
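For reference, a minimal DDLDatabase code generation configuration might look like this (a sketch modelled after the LiquibaseDatabase configuration shown further below; paths are placeholders):

<configuration>
  <generator>
    <database>
      <name>org.jooq.meta.extensions.ddl.DDLDatabase</name>
      <properties>
        <property>
          <key>scripts</key>
          <value>src/main/resources/db/migration/*.sql</value>
        </property>
      </properties>
    </database>
  </generator>
</configuration>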

In the future, we’ll add more features around this. One thing we’ve been thinking about has been to allow for translating Flyway migrations to other dialects, to make them vendor agnostic.

Simulating Liquibase migrations

Flyway and Liquibase work in quite a similar fashion, with Liquibase offering an additional abstraction layer over the SQL language. While pure SQL migrations are also possible with Liquibase’s SQL change, Liquibase also offers a set of mostly low-level DDL command abstractions, such as the ADD COLUMN change.

Using their XML based DSL, you don’t have to remember whether the command is called:

alter table t add i int;
alter table t add column i int;

Or whatever creative syntax your SQL vendor came up with. Notice that, again, you can achieve the same thing with jOOQ’s translator, which you can use as an API or as a command line interface.
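Using the API, such a translation is a matter of parsing with one dialect and rendering with another (a sketch; exact overloads may vary between jOOQ versions):

// Parse a statement, then render it for H2
Queries queries = DSL.using(SQLDialect.DEFAULT)
                     .parser()
                     .parse("alter table t add column i int");

System.out.println(DSL.using(SQLDialect.H2).render(queries.queries()[0]));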

Now, assuming you have the following Liquibase database change log:

<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
    http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">
    <changeSet author="authorName" id="changelog-1.0">
        <createTable tableName="TAB">
            <column name="COL" type="VARCHAR(10)">
                <constraints nullable="true"
                    primaryKey="false" unique="false" />
            </column>
        </createTable>
    </changeSet>
</databaseChangeLog>

And now, you would like to check if it is correct, i.e. if you can ship it to production as it is. Just create an in-memory H2 connection, and run the file against it:

Properties info = new Properties();
info.put("user", "sa");
info.put("password", "");

try (Connection con = new org.h2.Driver()
        .connect("jdbc:h2:mem:db", info)) {
    Database database = DatabaseFactory.getInstance()
      .findCorrectDatabaseImplementation(new JdbcConnection(con));
    Liquibase liquibase = new Liquibase("/path/to/liquibase.xml", 
      new FileSystemResourceAccessor(), database);
    liquibase.update("");
}

To check, just repeat the table definition a second time:

<createTable tableName="TAB">
    <column name="COL" type="VARCHAR(10)">
        <constraints nullable="true"
            primaryKey="false" unique="false" />
    </column>
</createTable>
<createTable tableName="TAB">
    <column name="COL" type="VARCHAR(10)">
        <constraints nullable="true"
            primaryKey="false" unique="false" />
    </column>
</createTable>

and see how H2 complains about the statement:

Table "TAB" already exists; SQL statement:
CREATE TABLE PUBLIC.TAB (COL CLOB) [42101-200] [Failed SQL: (42101) CREATE TABLE PUBLIC.TAB (COL CLOB)]
	at liquibase.changelog.ChangeSet.execute(ChangeSet.java:646)
	at liquibase.changelog.visitor.UpdateVisitor.visit(UpdateVisitor.java:53)
	at liquibase.changelog.ChangeLogIterator.run(ChangeLogIterator.java:83)
	at liquibase.Liquibase.update(Liquibase.java:202)
	at liquibase.Liquibase.update(Liquibase.java:179)
	at liquibase.Liquibase.update(Liquibase.java:175)
	at liquibase.Liquibase.update(Liquibase.java:168)
	at LB.main(LB.java:21)

Using jOOQ’s tool chain to create schema snapshots

Want to export the schema again as DDL? Very easy, just use jOOQ for the task. For example, you can write the following Java code right after the above Liquibase migration simulation:

DSLContext ctx = DSL.using(connection);
ctx.ddl(ctx.meta().getSchemas("PUBLIC").get(0))
    .forEach(System.out::println);

And you get, in H2 syntax (omitting the Liquibase tables):

create table "PUBLIC"."TAB"(
  "COL" varchar(10) null
)

Want to export this to Oracle instead? And use a custom schema, instead of H2’s PUBLIC? Write this instead:

DSLContext h2 = DSL.using(connection);
DSLContext oracle = DSL.using(ORACLE, new Settings()
   .withRenderMapping(new RenderMapping()
       .withSchemata(new MappedSchema()
           .withInput("PUBLIC")
           .withOutput("MY_SCHEMA"))));
oracle.ddl(h2.meta().getSchemas("PUBLIC").get(0))
    .forEach(System.out::println);

And now, we’re getting:

create table "MY_SCHEMA"."TAB"(
  "COL" varchar2(10) null
)

(notice, this works once #9384 is fixed).

Prefer an XML representation of your schema snapshot? Easy with jOOQ, as well:

DSLContext ctx = DSL.using(connection);
JAXB.marshal(ctx.informationSchema(
   ctx.meta().getSchemas("PUBLIC").get(0)
), System.out);

The output being:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<information_schema xmlns="http://www.jooq.org/xsd/jooq-meta-3.12.0.xsd">
    <schemata>
        <schema>
            <schema_name>PUBLIC</schema_name>
            <comment></comment>
        </schema>
    </schemata>
    <tables>
        <table>
            <table_schema>PUBLIC</table_schema>
            <table_name>TAB</table_name>
            <comment></comment>
        </table>
    </tables>
    <columns>
        <column>
            <table_schema>PUBLIC</table_schema>
            <table_name>TAB</table_name>
            <column_name>COL</column_name>
            <data_type>varchar</data_type>
            <character_maximum_length>10</character_maximum_length>
            <ordinal_position>1</ordinal_position>
            <is_nullable>true</is_nullable>
            <column_default></column_default>
            <comment></comment>
        </column>
    </columns>
</information_schema>

A useful file that can be put under version control along with your current set of database change sets, to validate your schema, or to export to other tools that can reverse engineer DDL or XML.

Using Liquibase migrations as jOOQ code generation input

Just like the pre-existing DDLDatabase, starting from jOOQ 3.13, you can simulate your Liquibase migration in-memory against an H2 database to reverse engineer it again for jOOQ’s code generator. This is documented here.

The relevant code generation configuration looks like this (standalone or with Maven):

<configuration>
  <generator>
    <database>
      <name>org.jooq.meta.extensions.liquibase.LiquibaseDatabase</name>
      <properties>
        <property>
          <key>scripts</key>
          <value>src/main/resources/database.xml</value>
        </property>
      </properties>
    </database>
  </generator>
</configuration>

That’s it! As a bonus, you won’t have to manually simulate your migration anymore, as jOOQ’s LiquibaseDatabase already does it for you, behind the scenes, using the same three lines of code:

Database database = DatabaseFactory.getInstance()
  .findCorrectDatabaseImplementation(new JdbcConnection(con));
Liquibase liquibase = new Liquibase("/path/to/liquibase.xml", 
  new FileSystemResourceAccessor(), database);
liquibase.update("");

Stay tuned for more goodies on the topic of SQL migrations and database change management.

Truth First, or Why You Should Mostly Implement Database First Designs

In this much overdue article, I will explain why I think that, in almost all cases, you should implement a “database first” design in your application’s data models, rather than a “Java first” design (or whatever your client language is), as the latter approach leads to a long road of pain and suffering once your project grows.

This article is inspired by a recent Stack Overflow question.

There have also been interesting reddit discussions on /r/java and /r/programming.

Code generation

To my surprise, a small group of first-time jOOQ users seems to be appalled by the fact that jOOQ heavily relies on source code generation. No one keeps you from using jOOQ the way you want, and you don’t have to use code generation. But the default way to use jOOQ, according to the manual, is to start with a (legacy) database schema, reverse engineer that using jOOQ’s code generator to get a bunch of classes representing your tables, and then to write type safe queries against those tables:

for (Record2<String, String> record : DSL.using(configuration)
//   ^^^^^^^^^^^^^^^^^^^^^^^ Type information derived from the 
//   generated code referenced from the below SELECT clause

       .select(ACTOR.FIRST_NAME, ACTOR.LAST_NAME)
//           vvvvv ^^^^^^^^^^^^  ^^^^^^^^^^^^^^^ Generated names
       .from(ACTOR)
       .orderBy(1, 2)) {
    // ...
}

The code is generated either manually outside of the build, or automatically with every build. For instance, such a re-generation could follow immediately after a Flyway database migration, which can also be run either manually or automatically.

Source code generation

There are different philosophies, advantages, and disadvantages regarding these manual/automatic approaches, which I don’t want to discuss in this article. But essentially, the point of generated code is that it provides a Java representation of something that we take for granted (a “truth”) either within or outside of our system. In a way, compilers do the same thing when they generate byte code, machine code, or some other type of source code from the original sources – we get a representation of our “truth” in a different language, for whatever reason.

There are many such code generators out there. For instance, XJC can generate Java code from XSD or WSDL files. The principle is always the same:

  • There is some truth (internal or external), like a specification, data model, etc.
  • We need a local representation of that truth in our programming language

And it almost always makes sense to generate the latter, to avoid redundancy.

Type providers and annotation processing

Noteworthy: another, more modern approach to jOOQ’s particular code generation use-case would be type providers, as implemented by F#, in which case the code is generated by the compiler while compiling. It never really exists in its source form. A similar (but less sophisticated) tool in Java is annotation processing, as used e.g. by Lombok.

In a way, this does the same thing except:

  • You don’t see the generated code (perhaps that’s less appalling to some?)
  • You must ensure the types can be provided, i.e. the “truth” must always be available. This is easy in the case of Lombok, which annotates the “truth” directly (see the sketch below). It is a bit more difficult with database models, which rely on an always-available live connection.
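For a concrete (illustrative) example of the annotation processing approach, consider Lombok:

import lombok.Data;

// The class below is the "truth". Lombok's annotation processor generates
// getters, setters, equals(), hashCode(), and toString() at compile time,
// and that generated code never exists in source form.
@Data
class Book {
    private int id;
    private String title;
}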

What’s the problem with code generation?

Apart from the tricky question whether to trigger code generation manually or automatically, some people seem to think that code must not be generated at all. The reason I hear the most is the idea that it is difficult to set up in a build pipeline. And yes, that is true. There is extra infrastructure overhead. Especially if you’re new to a certain product (like jOOQ, or JAXB, or Hibernate, etc.), setting up an environment takes time you would rather spend learning the API itself and getting value out of it.

If the overhead of learning how the code generator works is too high, then indeed, the API failed to make the code generator easy to use (and later on, to customise). That should be a high priority for any such API. But that’s the only argument against code generation. Other than that, it makes absolutely no sense at all to hand-write the local representation of the internal or external truth.

Many people argue that they don’t have time for that stuff. They need to ship their MVPs. They can finalise their build pipelines later. I say:

(“Too Busy To Improve” cartoon, licensed CC by Alan O’Rourke / Audience Stack.)

“But Hibernate / JPA makes coding Java first easy”

Yes that’s true. And it’s both a bliss and a curse for Hibernate and its users. In Hibernate, you can just write a couple of entities, such as:

@Entity
class Book {
  @Id
  int id;
  String title;
}

And you’re almost set. Let Hibernate generate the boring “details” of how to define this entity in your SQL dialect’s DDL:

CREATE TABLE book (
  id INTEGER GENERATED ALWAYS AS IDENTITY,
  title VARCHAR(50),

  CONSTRAINT pk_book PRIMARY KEY (id)
);

CREATE INDEX i_book_title ON book (title);

… and start running the application. That’s really cool to get started quickly and to try out things.

But, huh, wait. I cheated.

  • Will Hibernate really apply that named primary key definition?
  • Will it create the index on TITLE, which I know we’ll need?
  • Will it add an identity specification?

Probably not. While you’re developing your greenfield project, it is convenient to always throw away your entire database and re-generate it from scratch, once you’ve added the additional annotations. So, the Book entity would eventually look like this:

@Entity
@Table(name = "book", indexes = {
  @Index(name = "i_book_title", columnList = "title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;
  String title;
}

Cool. Re-generate. Again, this makes it really easy to get started.

But you’ll pay the price later on

At some point, you go to production. And that’s when this model no longer works. Because

Once you go live, you can no longer throw away your database, as your database has become legacy.

From now on, you have to write DDL migration scripts, e.g. using Flyway. And then, what happens to your entities? You can either adapt them manually (so you double the work), or have Hibernate re-generate them for you (how big are your chances of the generation matching your expectations?) You can only lose.

Because once you go to production, you need hotfixes. And those have to go live fast. And since you didn’t prepare for pipelining your migrations to production smoothly, you’ll patch things wildly. And then you run out of time to do it right™. And you’ll blame Hibernate, because it’s always someone else’s fault…

Instead, you could have done things entirely differently from the beginning. Like using those round wheels.

Go Database First

The real “truth” of your database schema, and the “sovereignty” over it, resides with your database. The database is the only place where the schema is defined, and all clients have a copy of the database schema, not vice versa. The data is in your database, not in your client, so it makes perfect sense to enforce the schema and its integrity in the database, right where the data is.

This is old wisdom, nothing new. Primary and unique keys are good. Foreign keys are good. Check constraints are good. Assertions (when they’re finally implemented) are good.

And that’s not where it ends. For instance, if you’re using Oracle, you may want to specify:

  • In what tablespace your table resides
  • What PCTFREE value it has
  • What the cache size of your sequence (behind the identity) is

Maybe all of this doesn’t matter in small systems, but you don’t have to go “big data” before you can profit from vendor-specific storage optimisations such as the above. None of the ORMs I’ve ever seen (including jOOQ) will allow you to use the full set of DDL options that you may want to use on your database. ORMs only offer some tools to help you write DDL.

But ultimately, a well-designed schema is hand written in DDL. All generated DDL is only an approximation of that.

What about the client model?

As mentioned before, you will need a copy of your database schema in your client: a client representation. Needless to say, this client representation needs to be in sync with the real model. How do we best do that? By using a code generator.

All databases expose their meta information through SQL. Here’s how to get all tables from your database in various SQL dialects:

-- H2, HSQLDB, MySQL, PostgreSQL, SQL Server
SELECT table_schema, table_name
FROM information_schema.tables

-- DB2
SELECT tabschema, tabname
FROM syscat.tables

-- Oracle
SELECT owner, table_name
FROM all_tables

-- SQLite
SELECT name
FROM sqlite_master

-- Teradata
SELECT databasename, tablename
FROM dbc.tables

These queries (or similar ones, e.g. depending on whether views, materialised views, table valued functions should also be considered) are also run by JDBC’s DatabaseMetaData.getTables() call, or by the jOOQ-meta module.

From the result of such queries, it’s relatively easy to generate any client representation of your database model, regardless what your client technology is.

  • If you’re using JDBC or Spring, you can create a bunch of String constants
  • If you’re using JPA, you can generate the entities themselves
  • If you’re using jOOQ, you can generate the jOOQ meta model

Depending on the amount of features your client API offers (e.g. jOOQ or JPA), the generated meta model can be really rich and complete. Consider, for instance, jOOQ 3.11’s implicit join feature, which relies on generated meta information about the foreign key relationships between your tables.
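As a toy illustration of the first bullet above, a few lines of JDBC suffice to emit String constants for all tables (a sketch, not production code; connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class GenerateTableConstants {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:h2:mem:db", "sa", "");
             ResultSet rs = con.getMetaData()
                 .getTables(null, null, "%", new String[] { "TABLE" })) {

            // One String constant per table, e.g. for use with JDBC or Spring
            while (rs.next())
                System.out.printf(
                    "public static final String %s = \"%1$s\";%n",
                    rs.getString("TABLE_NAME"));
        }
    }
}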

Now, any database increment will automatically lead to updated client code. For instance, imagine:

ALTER TABLE book RENAME COLUMN title TO book_title;

Would you really want to do this work twice? No way. Just commit the DDL, run it through your build pipeline, and have an updated entity:

@Entity
@Table(name = "book", indexes = {

  // Would you have thought of this?
  @Index(name = "i_book_title", columnList = "book_title")
})
class Book {
  @Id
  @GeneratedValue(strategy = IDENTITY)
  int id;

  @Column("book_title")
  String bookTitle;
}

Or an updated jOOQ class. Plus: Your client code might no longer compile, which can be a good thing! Most DDL changes are also semantic changes, not just syntactic ones. So, it’s great to be able to see in compiled client source code, what code is (or may be) affected by your database increment.

A single truth

Regardless what technology you’re using, there’s always one model that contains the single truth for a subsystem – or at least, we should aim for this goal and avoid the enterprisey mess where “truth” is everywhere and nowhere. It just makes everything much simpler. If you exchange XML files with some other system, you’re going to use XSD. Like jOOQ’s INFORMATION_SCHEMA meta model in XML form:
https://www.jooq.org/xsd/jooq-meta-3.10.0.xsd

  • XSD is well understood
  • XSD specifies XML content very well, and allows for validation in all client languages
  • XSD can be versioned easily, and evolved backwards compatibly
  • XSD can be translated to Java code using XJC

The last bullet is important. When communicating with an external system through XML messages, we want to be sure our messages are valid. That’s really really easy to do with JAXB, XJC, and XSD. It would be outright nuts to think that a Java-first approach where we design our messages as Java objects could somehow be reasonably mapped to XML for someone else to consume. That generated XML would be of very poor quality, undocumented, and hard to evolve. If there’s an SLA on such an interface, we’d be screwed.
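For instance, generating the Java classes from an XSD is a single command with the XJC CLI (paths and package names are illustrative):

xjc -d target/generated-sources/xjc -p org.example.messages messages.xsd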

Frankly, that’s what happens to JSON APIs all the time, but that’s another story, another rant…

Databases: Same thing

When you’re using databases, it’s the same thing. The database owns its data and it should be the master of the schema. All modifications to the schema should be implemented using DDL directly, to update the single truth.

Once that truth is updated, all clients need to update their copies of the model as well. Some clients may be written in Java, using either (or both of) jOOQ and Hibernate, or JDBC. Other clients may be written in Perl (good luck to them). Even other clients may be written in C#. It doesn’t matter. The main model is in the database. ORM-generated models are of poor quality, not well documented, and hard to evolve.

So, don’t do it. And, don’t do it from the very beginning. Instead, go database first. Build a deployment pipeline that can be automated. Include code generators to copy your database model back into the clients. And stop worrying about code generation. It’s a good thing. You’ll be productive. All it takes is a bit of initial effort to set it up, and you’ll get years of improved productivity for the rest of your project.

Thank me later.

Clarification

Just to be sure: This article in no way asserts that your database model should be imposed on your entire system (e.g. your domain, your business logic, etc. etc.). The claim I made here is that client code interacting with the database should act upon the database model, and not have its own first class model of the database instead. This logic typically resides in the data access layer of your client.

In 2-tier architectures, which still have their place sometimes, that may be the only model of your system. In most systems, however, I consider the data access layer a “subsystem” that encapsulates the database model. So, there.

Exceptions

There are always exceptions, and I promised that the database first and code generation approach may not always be the right choice. These exceptions are (probably not exhaustive):

  • When the schema is unknown and must be discovered. E.g. you’re a tool vendor helping users navigate any schema. Duh… No code generation. But still database first.
  • When the schema needs to be generated on the fly for some task. This sounds a lot like a more or less sophisticated version of the entity attribute value pattern, i.e. you don’t really have a well-defined schema. In that case, it’s often not even sure if an RDBMS will be the right choice.

The nature of exceptions is that they’re exceptional. In the majority of RDBMS usage, the schema is known in advance, placed inside the RDBMS as the single source of “truth”, and clients will have derived copies from it – ideally generated using a code generator.

jOOQ Tuesdays: Axel Fontaine Predicts Exciting Times as our Industry is Maturing

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


We have the pleasure of talking to Axel Fontaine in this fourth edition, who will be telling us about the exciting times that are ahead as our industry is maturing.

Hi Axel – Citing your website: From “Software is what I do” to “Architecting for Continuous Delivery & Zero Downtime”. Where did your personal passion for devops come from?

Over the last 18 years in our industry, I have seen a lot of projects for numerous clients in many different sectors. And one thing always struck me: the intersection between development and operations was almost invariably the place where the largest efficiency gains could be realized. Over time, this became the place where I naturally started to focus my attention, initially for consultancy and training, and later also for public speaking and the product business.

Very interesting! I suspect that the Flyway download rates also confirm the increasing need for devops tooling, right?

I am glad you mentioned it. Yes, Flyway download numbers have been going through the roof. In March we had one download every 42 seconds and we are well on track to break the 1 million mark in 2015! This picture actually tells us a number of things. First of all it confirms that the need for database migrations is a very widespread itch and that Flyway’s no-nonsense pragmatic and lightweight approach to scratch it resonates with the industry. The other thing this tells us, is that our industry as a whole is maturing and continuous delivery and deployment in particular is becoming more mainstream. This increased pace of delivery comes with a pressing need for a higher level of automation. And modern lightweight tools like Flyway are a great fit for this.

Now, you are converting that passion into a product called Boxfuse. Can you tell us a little more about it?

I have always been a strong believer in lightweight and open solutions that are a pleasure to use and minimize lock-in. When it comes to deploying an application to the cloud, so far you had to choose one or the other. You could either have something that is a joy to use (PaaS), but that leaves you stuck in a walled garden. Or you could have something open (IaaS), but that leaves you with a lot of operational concerns to deal with. Boxfuse aims for a fresh new middle ground: the convenience of PaaS with the freedom of IaaS. To achieve that, Boxfuse does away with a bunch of cargo cult. No more general purpose operating system, no more traditional provisioning. Instead, Boxfuse analyses your application and, in just a few seconds, fuses it into a tiny bootable image that is about 1% of the size of a regular system. Boxfuse then offers you a sophisticated blue/green deployment process for zero downtime updates. All of this, out of the box and with a single command.

We’re entrepreneurs ourselves. Marketing a product is a long road. What are your own key takeaways from your experience as both a consultant and entrepreneur?

I couldn’t agree more. What marketing really is, is competing for attention, and turning that attention into mind share through a series of arguments, both technical and emotional. It is an exercise that is often overlooked in our technical circles, where the code is front and center. But the best technology in the world isn’t worth a dime if you can’t convince someone else to use it. And that takes perseverance, drive, and dedication. What seems easy on the outside is often the fruit of lots of hard labor behind the scenes. Eventually you will succeed, but give it a little time. Overnight success takes years.

Where do you see the future of our industry, from your perspective?

Believe me when I say these are exciting times. While it still feels like a big adventure with great new frontiers to explore, our industry is maturing rapidly. Overly complex tools are going the way of the dodo. User expectations are rising fast. And the only way to meet them is through modern lightweight tools and services that integrate seamlessly with our workflow and perform the undifferentiated heavy lifting for us. We need to eliminate all unnecessary complexity, so that we can focus on what matters most: delighting our customers.

Flyway and jOOQ for Unbeatable SQL Development Productivity

When performing database migrations, we at Data Geekery recommend using jOOQ with Flyway – Database Migrations Made Easy. In this post, we’re going to look into a simple way to get started with the two frameworks.

Philosophy

There are a variety of ways in which jOOQ and Flyway could interact with each other in various development setups. In this tutorial, we’re going to show just one variant of such framework team play – a variant that we find particularly compelling for most use cases.

The general philosophy and workflow behind the following approach can be summarised as this:

  • 1. Database increment
  • 2. Database migration
  • 3. Code re-generation
  • 4. Development

The four steps above can be repeated time and again, every time you need to modify something in your database. More concretely, let’s consider:

  • 1. Database increment – You need a new column in your database, so you write the necessary DDL in a Flyway script
  • 2. Database migration – This Flyway script is now part of your deliverable, which you can share with all developers who can migrate their databases with it, the next time they check out your change
  • 3. Code re-generation – Once the database is migrated, you regenerate all jOOQ artefacts (see code generation), locally
  • 4. Development – You continue developing your business logic, writing code against the updated, generated database schema

0.1. Maven Project Configuration – Properties

The following properties are defined in our pom.xml, to be able to reuse them between plugin configurations:

<properties>
    <db.url>jdbc:h2:~/flyway-test</db.url>
    <db.username>sa</db.username>
</properties>

0.2. Maven Project Configuration – Dependencies

While jOOQ and Flyway could be used in standalone migration scripts, in this tutorial, we’ll be using Maven for the standard project setup. You will also find the source code of this tutorial on GitHub, and the full pom.xml file here.

These are the dependencies that we’re using in our Maven configuration:

<!-- We'll add the latest version of jOOQ 
     and our JDBC driver - in this case H2 -->
<dependency>
    <groupId>org.jooq</groupId>
    <artifactId>jooq</artifactId>
    <version>3.4.0</version>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.4.177</version>
</dependency>

<!-- For improved logging, we'll be using 
     log4j via slf4j to see what's going
     on during migration and code generation -->
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.16</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.5</version>
</dependency>

<!-- To ensure our code is working, we're
     using JUnit -->
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>

0.3. Maven Project Configuration – Plugins

After the dependencies, let’s simply add the Flyway and jOOQ Maven plugins like so. The Flyway plugin:

<plugin>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-maven-plugin</artifactId>
    <version>3.0</version>

    <!-- Note that we're executing the Flyway
         plugin in the "generate-sources" phase -->
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>migrate</goal>
            </goals>
        </execution>
    </executions>

    <!-- Note that we need to prefix the db/migration
         path with filesystem: to prevent Flyway
         from looking for our migration scripts
         only on the classpath -->
    <configuration>
        <url>${db.url}</url>
        <user>${db.username}</user>
        <locations>
            <location>filesystem:src/main/resources/db/migration</location>
        </locations>
    </configuration>
</plugin>

The above Flyway Maven plugin configuration will read and execute all database migration scripts from src/main/resources/db/migration prior to compiling Java source code. While the official Flyway documentation suggests that migrations be done in the compile phase, the jOOQ code generator relies on such migrations having been done prior to code generation.

After the Flyway plugin, we’ll add the jOOQ Maven Plugin. For more details, please refer to the manual’s section about the code generation configuration.

<plugin>
    <groupId>org.jooq</groupId>
    <artifactId>jooq-codegen-maven</artifactId>
    <version>${org.jooq.version}</version>

    <!-- The jOOQ code generation plugin is also 
         executed in the generate-sources phase,
         prior to compilation -->
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>generate</goal>
            </goals>
        </execution>
    </executions>

    <!-- This is a minimal working configuration. 
         See the manual's section about the code
         generator for more details -->
    <configuration>
        <jdbc>
            <url>${db.url}</url>
            <user>${db.username}</user>
        </jdbc>
        <generator>
            <database>
                <includes>.*</includes>
                <inputSchema>FLYWAY_TEST</inputSchema>
            </database>
            <target>
                <packageName>org.jooq.example.flyway.db.h2</packageName>
                <directory>target/generated-sources/jooq-h2</directory>
            </target>
        </generator>
    </configuration>
</plugin>

This configuration will now read the FLYWAY_TEST schema and reverse-engineer it into the target/generated-sources/jooq-h2 directory, and within that, into the org.jooq.example.flyway.db.h2 package.

1. Database increments

Now, we can start developing our database. For that, we’ll create database increment scripts, which we put into the src/main/resources/db/migration directory, as previously configured for the Flyway plugin. We’ll add these files:

  • V1__initialise_database.sql
  • V2__create_author_table.sql
  • V3__create_book_table_and_records.sql

These three scripts model our schema versions 1-3 (note the capital V!). Here are the scripts’ contents:

-- V1__initialise_database.sql
DROP SCHEMA flyway_test IF EXISTS;

CREATE SCHEMA flyway_test;
-- V2__create_author_table.sql
CREATE SEQUENCE flyway_test.s_author_id START WITH 1;

CREATE TABLE flyway_test.author (
  id INT NOT NULL,
  first_name VARCHAR(50),
  last_name VARCHAR(50) NOT NULL,
  date_of_birth DATE,
  year_of_birth INT,
  address VARCHAR(50),

  CONSTRAINT pk_t_author PRIMARY KEY (ID)
);
-- V3__create_book_table_and_records.sql
CREATE TABLE flyway_test.book (
  id INT NOT NULL,
  author_id INT NOT NULL,
  title VARCHAR(400) NOT NULL,

  CONSTRAINT pk_t_book PRIMARY KEY (id),
  CONSTRAINT fk_t_book_author_id FOREIGN KEY (author_id) REFERENCES flyway_test.author(id)
);


INSERT INTO flyway_test.author VALUES (next value for flyway_test.s_author_id, 'George', 'Orwell', '1903-06-25', 1903, null);
INSERT INTO flyway_test.author VALUES (next value for flyway_test.s_author_id, 'Paulo', 'Coelho', '1947-08-24', 1947, null);

INSERT INTO flyway_test.book VALUES (1, 1, '1984');
INSERT INTO flyway_test.book VALUES (2, 1, 'Animal Farm');
INSERT INTO flyway_test.book VALUES (3, 2, 'O Alquimista');
INSERT INTO flyway_test.book VALUES (4, 2, 'Brida');

2. Database migration and 3. Code regeneration

The above three scripts are picked up by Flyway and executed in the order of the versions. This can be seen very simply by executing:

mvn clean install

And then observing the log output from Flyway…

[INFO] --- flyway-maven-plugin:3.0:migrate (default) @ jooq-flyway-example ---
[INFO] Database: jdbc:h2:~/flyway-test (H2 1.4)
[INFO] Validated 3 migrations (execution time 00:00.004s)
[INFO] Creating Metadata table: "PUBLIC"."schema_version"
[INFO] Current version of schema "PUBLIC": <>
[INFO] Migrating schema "PUBLIC" to version 1
[INFO] Migrating schema "PUBLIC" to version 2
[INFO] Migrating schema "PUBLIC" to version 3
[INFO] Successfully applied 3 migrations to schema "PUBLIC" (execution time 00:00.073s).

… and from jOOQ on the console:

[INFO] --- jooq-codegen-maven:3.5.0-SNAPSHOT:generate (default) @ jooq-flyway-example ---
[INFO] Using this configuration:
...
[INFO] Generating schemata      : Total: 1
[INFO] Generating schema        : FlywayTest.java
[INFO] ----------------------------------------------------------
[....]
[INFO] GENERATION FINISHED!     : Total: 337.576ms, +4.299ms

4. Development

Note that all of the previous steps are executed automatically, every time someone adds new migration scripts to the Maven module. For instance, a team member might have committed a new migration script; you check it out, rebuild, and get the latest jOOQ-generated sources for your own development or integration-test database.

Now that these steps are done, you can proceed to writing your database queries. Imagine the following test case:

import org.jooq.Result;
import org.jooq.impl.DSL;
import org.junit.Test;

import java.sql.Connection;
import java.sql.DriverManager;

import static java.util.Arrays.asList;
import static org.jooq.example.flyway.db.h2.Tables.*;
import static org.junit.Assert.assertEquals;

public class AfterMigrationTest {

    @Test
    public void testQueryingAfterMigration() throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:h2:~/flyway-test", "sa", "")) {
            Result<?> result =
            DSL.using(c)
               .select(
                   AUTHOR.FIRST_NAME,
                   AUTHOR.LAST_NAME,
                   BOOK.ID,
                   BOOK.TITLE
               )
               .from(AUTHOR)
               .join(BOOK)
               .on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
               .orderBy(BOOK.ID.asc())
               .fetch();

            assertEquals(4, result.size());
            assertEquals(asList(1, 2, 3, 4), result.getValues(BOOK.ID));
        }
    }
}

If you run mvn clean install again, the above integration test will now compile and pass!

Reiterate

The power of this approach becomes clear once you start performing database modifications this way. Let’s assume that the French guy on our team prefers to have things his way (no offense intended ;-) ):

-- V4__le_french.sql
ALTER TABLE flyway_test.book 
  ALTER COLUMN title RENAME TO le_titre;

They check it in, you check out the new database migration script, run

mvn clean install

And then observe the log output:

[INFO] --- flyway-maven-plugin:3.0:migrate (default) @ jooq-flyway-example ---
[INFO] Database: jdbc:h2:~/flyway-test (H2 1.4)
[INFO] Validated 4 migrations (execution time 00:00.005s)
[INFO] Current version of schema "PUBLIC": 3
[INFO] Migrating schema "PUBLIC" to version 4
[INFO] Successfully applied 1 migration to schema "PUBLIC" (execution time 00:00.016s).

So far so good, but later on:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] C:\...\AfterMigrationTest.java:[24,19] error: cannot find symbol
[INFO] 1 error

When we go back to our Java integration test, we can immediately see that the TITLE column is still being referenced, but it no longer exists:

public class AfterMigrationTest {

    @Test
    public void testQueryingAfterMigration() throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:h2:~/flyway-test", "sa", "")) {
            Result<?> result =
            DSL.using(c)
               .select(
                   AUTHOR.FIRST_NAME,
                   AUTHOR.LAST_NAME,
                   BOOK.ID,
                   BOOK.TITLE
                   //   ^^^^^ This column no longer exists. 
                   //   We'll have to rename it to LE_TITRE
               )
               .from(AUTHOR)
               .join(BOOK)
               .on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
               .orderBy(BOOK.ID.asc())
               .fetch();

            assertEquals(4, result.size());
            assertEquals(asList(1, 2, 3, 4), result.getValues(BOOK.ID));
        }
    }
}

Conclusion

This tutorial shows how easily you can build a rock-solid development process using Flyway and jOOQ, to prevent SQL-related errors very early in your development lifecycle – immediately at compile time, rather than in production!

Visit the Flyway website and the jOOQ website.

Liquibase for DB Migrations

I have just now discovered a very nice-looking tool for database migrations: Liquibase

http://www.liquibase.org/

With Liquibase, you can model your DB increments as XML files that will translate to as many as 13 different databases. A sample DB increment (taken from the Liquibase manual):

<!--
  ALTER TABLE PERSON MODIFY COLUMN firstname VARCHAR(5000);
  -->
<modifyDataType tableName="person"
    columnName="firstname"
    newDataType="varchar(5000)"/>

<!--
  ALTER TABLE ADDRESS ADD CONSTRAINT fk_address_person
  FOREIGN KEY (person_id) REFERENCES person (id);
  -->
<addForeignKeyConstraint constraintName="fk_address_person"
    baseTableName="address" baseColumnNames="person_id"
    referencedTableName="person" referencedColumnNames="id"/>

<!--
  UPDATE ProductSettings SET property = 'vatCategory'
  WHERE property = 'vat';
  -->
<update tableName="ProductSettings">
    <column name="property" value="vatCategory"/>
    <where>property='vat'</where>
</update>

…and so on. I guess it’s about time to contact the Liquibase folks and ask for cooperation! A fully integrated solution involving database schema management, database schema migration, and jOOQ’s source code generation seems to be the perfect toolset for a Java database developer.