jOOQ’s fluent API in BNF notation

I have recently posted an article about how to generally design a fluent API in Java. By fluent API, I don’t mean simple constructs such as

new Builder().withSomething(x)
             .withSomethingElse(y)
             .withSomething(z)
             .withAnotherThing(xx);

The above is just simple method-chaining, without any sophisticated formal language definition. Most often in method-chaining contexts, the order of method calls is irrelevant to the API and may be chosen freely (i.e. you can call “withAnotherThing()” first, then “withSomethingElse()”. Instead, I mean a full-fledged domain specific language, formally defined using BNF notation (or anything equivalent).

jOOQ’s fluent API

Today, I’d like to give some insight about how jOOQ can be formally expressed as a true domain specific language using BNF notation. jOOQ in itself has actually evolved to become a SQL dialect of its own. It knows most of the standard DML SQL statements and clauses, as well as many vendor-specific ones. Let’s have a look at jOOQ 2.0’s understanding of a SELECT statement:

SELECT ::=
   ( ( 'select' | 'selectDistinct' ) FIELD* |
       'selectOne' |
       'selectZero' |
       'selectCount' )
     ( 'select' FIELD* )*
     ( 'hint' SQL )*
     ( 'from' ( TABLE+ | SQL )
       ( ( 'join' | 'leftOuterJoin' | 'rightOuterJoin' | 'fullOuterJoin' )
         ( TABLE | SQL )
         ( 'on' ( CONDITION+ | SQL ) MORE_CONDITIONS? |
           'using' FIELD+ ) |
         ( 'crossJoin' | 'naturalJoin' |
           'naturalLeftOuterJoin' | 'naturalRightOuterJoin' )
       ( TABLE | SQL ) )* )?
     ( ( 'where' ( CONDITION+ | SQL ) |
         ( 'whereExists' | 'whereNotExists' ) SELECT ) MORE_CONDITIONS? )?
     ( ( 'connectBy' | 'connectByNoCycle' )
       ( CONDITION | SQL )
       ( 'and' ( CONDITION | SQL ) )*
       ( 'startWith' ( CONDITION | SQL ) )? )?
     ( 'groupBy' GROUP_FIELD+ )?
     ( 'having' ( CONDITION+ | SQL ) MORE_CONDITIONS? )?
     ( 'orderBy' ( FIELD+ | SORT_FIELD+ | INT+ ) )?
     ( ( 'limit' INT ( 'offset' INT | INT )? ) |
       ( ( 'forUpdate'
           ( 'of' ( FIELD+ | TABLE+ ) )?
           ( 'wait' INT |
             'noWait' |
             'skipLocked' )? ) |
         'forShare' ) )?
   ( ( 'union' | 'unionAll' | 'except' | 'intersect' ) SELECT )*

Or in the much more readable graphical representation:

Unlike simpler querying API’s such as the JPA Criteria Query API or QueryDSL, for example, jOOQ really emphasises on SQL syntax correctness. In that way, it is impossible to form complex SQL queries in a nonsensical way such as providing “JOIN” clauses without previously declaring at least one table source, or providing “ON” clauses without previously defining a “JOIN” clause, or providing an “ON” clause on a “CROSS JOIN”, etc, etc.

The full jOOQ BNF representation can be seen here:

If you compare this BNF and its output (jOOQ) to what I’ve written in my previous blog post about fluent API design, you can see that there is potential for complete formalisation of this process. In principle, any arbitrary BNF can be formally transformed into a set of Java interfaces modelling your domain specific language directly in Java. I will soon create a prototype source code generator for this, so stay tuned!

if – else coding style best practices

The following post is going to be an advanced curly-braces discussion with no right or wrong answer, just more “matter of taste”. It is about whether to put “else” (and other keywords, such as “catch”, “finally”) on a new line or not.

Some may write

if (something) {
  doIt();
} else {
  dontDoIt();
}

I, however, prefer

if (something) {
  doIt();
} 
else {
  dontDoIt();
}

That looks silly, maybe. But what about comments? Where do they go? This somehow looks wrong to me:

// This is the case when something happens and blah
// blah blah, and then, etc...
if (something) {
  doIt();
} else {
  // This happens only 10% of the time, and then you
  // better think twice about not doing it
  dontDoIt();
}

Isn’t the following much better?

// This is the case when something happens and blah
// blah blah, and then, etc...
if (something) {
  doIt();
}

// This happens only 10% of the time, and then you
// better think twice about not doing it
else {
  dontDoIt();
}

In the second case, I’m really documenting the “if” and the “else” case separately. I’m not documenting the call to “dontDoIt()”. This can go further:

// This is the case when something happens and blah
// blah blah, and then, etc...
if (something) {
  doIt();
}

// Just in case
else if (somethingElse) {
  doSomethingElse();
}

// This happens only 10% of the time, and then you
// better think twice about not doing it
else {
  dontDoIt();
}

Or with try-catch-finally:

// Let's try doing some business
try {
  doIt();
}

// IOExceptions don't really occur
catch (IOException ignore) {}

// SQLExceptions need to be propagated
catch (SQLException e) {
  throw new RuntimeException(e);
}

// Clean up some resources
finally {
  cleanup();
}

It looks tidy, doesn’t it? As opposed to this:

// Let's try doing some business
try {
  doIt();
} catch (IOException ignore) {
  // IOExceptions don't really occur
} catch (SQLException e) {
  // SQLExceptions need to be propagated
  throw new RuntimeException(e);
} finally {
  // Clean up some resources
  cleanup();
}

I’m curious to hear your thoughts…

The good API design

I’ve stumbled upon a nice checklist wrapping up API design guidelines. An extract:

  1. Favor placing API and implementation into separate packages
  2. Favor placing APIs into high-level packages and implementation into lower-level packages
  3. Consider breaking up large APIs into several packages
  4. Consider putting API and implementation packages into separate Java archives
  5. Avoid (minimize) internal dependencies on implementation classes in APIs
  6. Avoid unnecessary API fragmentation
  7. Do not place public implementation classes in the API package
  8. Do not create dependencies between callers and implementation classes
  9. Do not place unrelated APIs into the same package
  10. Do not place API and SPI into the same package
  11. Do not move or rename the package of an already released public API

See the full checklist here:

http://theamiableapi.com/2012/01/16/java-api-design-checklist/

Let’s revise the SQL FROM clause

Intuitive SQL

SQL is extremely simple and yet at times, tricky. Most SQL developers have an intuitive (as opposed to formal) understanding of how the language works, for two reasons:

  1. It is designed “intuitively”, like a natural language. Maybe that keeps us from studying it more formally
  2. The formal language specification is not really freely available (except SQL-92), nor easy to read, nor necessary to get started

Of course, there are also those to whom SQL is a mystery. Let’s ignore those and have a look at how a simple SELECT statement works:

SELECT *
FROM my_table
JOIN my_other_table ON id1 = id2
JOIN my_third_table ON id2 = id3
WHERE ...

This looks like we’re using 4 of the SELECT statement’s many clauses:

  • SELECT: specifying the projection
  • FROM: specifying the “main” table
  • JOIN: specifying several “joined” tables
  • WHERE: specifying the selection / filtering

Misinterpretation of the “JOIN clause”

But in fact, our intuition is slightly wrong. “Our intuition…” Of course, you might already know this but I feel it’s something we don’t think about every day… How are we wrong? By the fact that we tend to think there is a formal “JOIN” clause in a SELECT statement. There isn’t. Here’s a simplified T-SQL definition:

<query_specification> ::= 
SELECT [ ALL | DISTINCT ] < select_list > 
    [ FROM { <table_source> } [ ,...n ] ] 
    [ WHERE <search_condition> ] 
    [ <GROUP BY> ] 
    [ HAVING < search_condition > ] 

As you can see, there is no explicit JOIN clause in a SELECT statement, even if we tend to phrase our SQL in a way that would indicate so. Let’s have a closer look at the (simplified) definition of <table_source>

<table_source> ::= 
{
    table_or_view_name [ [ AS ] table_alias ]
    | user_defined_function [ [ AS ] table_alias ]
    | derived_table [ AS ] table_alias [ ( column_alias [ ,...n ] ) ] 
    | <joined_table> 
}
<joined_table> ::= 
{
    <table_source> <join_type> <table_source> ON <search_condition> 
    | <table_source> CROSS JOIN <table_source> 
    | [ ( ] <joined_table> [ ) ] 
}
<join_type> ::= 
    [ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
    JOIN

So in fact, the true reading of our above SQL statement goes like this:

SELECT *
FROM ((my_table JOIN my_other_table ON id1 = id2)
                JOIN my_third_table ON id2 = id3)
WHERE ...

We’re actually creating a tree of table sources, first by joining “my_other_table” to “my_table” and then by joining “my_third_table” to the previously created table source. When this is understood, the following statement will no longer be surprising:

SELECT *
FROM t1 JOIN t2 ON id1 = id2,
     another_table,
     t3 JOIN t4 ON id3 = id4,
     yet_another_table,
     t5 JOIN t6 ON id5 = id6
WHERE ...

Here, we select from an overall cross product of 5 table sources, some of which are actual tables, some of which are created from joined tables…

Implications for jOOQ

In jOOQ, this is about to be supported for release 2.0.3. The org.jooq.Table API will be enhanced by several join methods. You will be able to write things like this:

// Create a specific type of table source
Table<Record> tableSource =
  MY_TABLE.join(MY_OTHER_TABLE).on(ID1.equal(ID2))
          .join(MY_THIRD_TABLE).on(ID2.equal(ID3));

// Use the above table source in a select statement
create.select()
      .from(tableSource)
      .where(...);

// Use several "complex" table sources
create.select()
      .from(T1.join(T2).on(ID1.equal(ID2)),
            ANOTHER_TABLE,
            T3.join(T4).on(ID3.equal(ID4)),
            YET_ANOTHER_TABLE,
            T5.join(T6).on(ID5.equal(ID6)))
      .where(...);

The existing syntax will be maintained as a fully compatible convenience API. These additions are an important step forward for creating more complex table sources, also involving Oracle (and SQL Server) PIVOT and UNPIVOT tables, about which I will blog in one of the next posts. Stay tuned!

The Java Fluent API Designer Crash Course

Ever since Martin Fowler’s talks about fluent interfaces, people have started chaining methods all over the place, creating fluent API’s (or DSLs) for every possible use case. In principle, almost every type of DSL can be mapped to Java. Let’s have a look at how this can be done

DSL rules

DSLs (Domain Specific Languages) are usually built up from rules that roughly look like these


1. SINGLE-WORD
2. PARAMETERISED-WORD parameter
3. WORD1 [ OPTIONAL-WORD ]
4. WORD2 { WORD-CHOICE-A | WORD-CHOICE-B }
5. WORD3 [ , WORD3 ... ]

Alternatively, you could also declare your grammar like this (as supported by this nice Railroad Diagrams site):


Grammar ::= ( 
  'SINGLE-WORD' | 
  'PARAMETERISED-WORD' '('[A-Z]+')' |
  'WORD1' 'OPTIONAL-WORD'? | 
  'WORD2' ( 'WORD-CHOICE-A' | 'WORD-CHOICE-B' ) | 
  'WORD3'+ 
)

Put in words, you have a start condition or state, from which you can choose some of your languages’ words before reaching an end condition or state. It’s like a state-machine, and can thus be drawn in a picture like this:

Simple Grammar

A simple grammar created with http://www.bottlecaps.de/rr/ui

Java implementation of those rules

With Java interfaces, it is quite simple to model the above DSL. In essence, you have to follow these transformation rules:

  • Every DSL “keyword” becomes a Java method
  • Every DSL “connection” becomes an interface
  • When you have a “mandatory” choice (you can’t skip the next keyword), every keyword of that choice is a method in the current interface. If only one keyword is possible, then there is only one method
  • When you have an “optional” keyword, the current interface extends the next one (with all its keywords / methods)
  • When you have a “repetition” of keywords, the method representing the repeatable keyword returns the interface itself, instead of the next interface
  • Every DSL subdefinition becomes a parameter. This will allow for recursiveness

Note, it is possible to model the above DSL with classes instead of interfaces, as well. But as soon as you want to reuse similar keywords, multiple inheritance of methods may come in very handy and you might just be better off with interfaces.

With these rules set up, you can repeat them at will to create DSLs of arbitrary complexity, like jOOQ. Of course, you’ll have to somehow implement all the interfaces, but that’s another story.

Here’s how the above rules are translated to Java:

// Initial interface, entry point of the DSL
// Depending on your DSL's nature, this can also be a class with static
// methods which can be static imported making your DSL even more fluent
interface Start {
  End singleWord();
  End parameterisedWord(String parameter);
  Intermediate1 word1();
  Intermediate2 word2();
  Intermediate3 word3();
}

// Terminating interface, might also contain methods like execute();
interface End {
  void end();
}

// Intermediate DSL "step" extending the interface that is returned
// by optionalWord(), to make that method "optional"
interface Intermediate1 extends End {
  End optionalWord();
}

// Intermediate DSL "step" providing several choices (similar to Start)
interface Intermediate2 {
  End wordChoiceA();
  End wordChoiceB();
}

// Intermediate interface returning itself on word3(), in order to allow
// for repetitions. Repetitions can be ended any time because this 
// interface extends End
interface Intermediate3 extends End {
  Intermediate3 word3();
}

With the above grammar defined, we can now use this DSL directly in Java. Here are all the possible constructs:

Start start = // ...

start.singleWord().end();
start.parameterisedWord("abc").end();

start.word1().end();
start.word1().optionalWord().end();

start.word2().wordChoiceA().end();
start.word2().wordChoiceB().end();

start.word3().end();
start.word3().word3().end();
start.word3().word3().word3().end();

And the best thing is, your DSL compiles directly in Java! You get a free parser. You can also re-use this DSL in Scala (or Groovy) using the same notation, or a slightly different one in Scala, omitting dots “.” and parentheses “()”:

 val start = // ...

 (start singleWord) end;
 (start parameterisedWord "abc") end;

 (start word1) end;
 ((start word1) optionalWord) end;

 ((start word2) wordChoiceA) end;
 ((start word2) wordChoiceB) end;

 (start word3) end;
 ((start word3) word3) end;
 (((start word3) word3) word3) end;

Real world examples

Some real world examples can be seen all across the jOOQ documentation and code base. Here’s an extract from a previous post of a rather complex SQL query created with jOOQ:

create().select(
    r1.ROUTINE_NAME,
    r1.SPECIFIC_NAME,
    decode()
        .when(exists(create()
            .selectOne()
            .from(PARAMETERS)
            .where(PARAMETERS.SPECIFIC_SCHEMA.equal(r1.SPECIFIC_SCHEMA))
            .and(PARAMETERS.SPECIFIC_NAME.equal(r1.SPECIFIC_NAME))
            .and(upper(PARAMETERS.PARAMETER_MODE).notEqual("IN"))),
                val("void"))
        .otherwise(r1.DATA_TYPE).as("data_type"),
    r1.NUMERIC_PRECISION,
    r1.NUMERIC_SCALE,
    r1.TYPE_UDT_NAME,
    decode().when(
    exists(
        create().selectOne()
            .from(r2)
            .where(r2.ROUTINE_SCHEMA.equal(getSchemaName()))
            .and(r2.ROUTINE_NAME.equal(r1.ROUTINE_NAME))
            .and(r2.SPECIFIC_NAME.notEqual(r1.SPECIFIC_NAME))),
        create().select(count())
            .from(r2)
            .where(r2.ROUTINE_SCHEMA.equal(getSchemaName()))
            .and(r2.ROUTINE_NAME.equal(r1.ROUTINE_NAME))
            .and(r2.SPECIFIC_NAME.lessOrEqual(r1.SPECIFIC_NAME)).asField())
    .as("overload"))
.from(r1)
.where(r1.ROUTINE_SCHEMA.equal(getSchemaName()))
.orderBy(r1.ROUTINE_NAME.asc())
.fetch()

Here’s another example from a library that looks quite appealing to me. It’s called jRTF and it’s used to create RTF documents in Java in a fluent style:

rtf()
  .header(
    color( 0xff, 0, 0 ).at( 0 ),
    color( 0, 0xff, 0 ).at( 1 ),
    color( 0, 0, 0xff ).at( 2 ),
    font( "Calibri" ).at( 0 ) )
  .section(
        p( font( 1, "Second paragraph" ) ),
        p( color( 1, "green" ) )
  )
).out( out );

Summary

Fluent APIs have been a hype for the last 7 years. Martin Fowler has become a heavily-cited man and gets most of the credits, even if fluent APIs were there before. One of Java’s oldest “fluent APIs” can be seen in java.lang.StringBuffer, which allows for appending arbitrary objects to a String. But the biggest benefit of a fluent API is its ability to easily map “external DSLs” into Java and implement them as “internal DSLs” of arbitrary complexity.