
The 10 Most Annoying Things Coming Back to Java After Some Days of Scala


So, I’m experimenting with Scala because I want to write a parser, and the Scala Parsers API seems like a really good fit. After all, I can implement the parser in Scala and wrap it behind a Java interface, so apart from an additional runtime dependency, there shouldn’t be any interoperability issues.

After a few days of getting really really used to the awesomeness of Scala syntax, here are the top 10 things I’m missing the most when going back to writing Java:

1. Multiline strings

That is my personal favourite, and a really awesome feature that should be in any language. Even PHP has it: multiline strings. It’s as easy as writing:

println ("""Dear reader,

If we had this feature in Java,
wouldn't that be great?

Yours Sincerely,
Lukas""")

Where is this useful? With SQL, of course! Here’s how you can run a plain SQL statement with jOOQ and Scala:

println(
  DSL.using(configuration)
     .fetch("""
            SELECT a.first_name, a.last_name, b.title
            FROM author a
            JOIN book b ON a.id = b.author_id
            ORDER BY a.id, b.id
            """)
)

And this isn’t only good for static strings. With string interpolation, you can easily inject variables into such strings:

val predicate =
  if (someCondition)
    "AND a.id = 1"
  else
    ""

println(
  DSL.using(configuration)
      // Observe this little "s"
     .fetch(s"""
            SELECT a.first_name, a.last_name, b.title
            FROM author a
            JOIN book b ON a.id = b.author_id
            -- This predicate is referencing the
            -- above Scala local variable. Neat!
            WHERE 1 = 1 $predicate
            ORDER BY a.id, b.id
            """)
)

That’s pretty awesome, isn’t it? For SQL, there is a lot of potential in Scala.

jOOQ: The Best Way to Write SQL in Scala

2. Semicolons

I sincerely haven’t missed them one bit. The way I structure code (and probably the way most people structure code), Scala doesn’t seem to need semicolons at all. In JavaScript, I wouldn’t say the same thing. The interpreted and non-typesafe nature of JavaScript suggests that leaving out optional syntax elements is a guarantee to shoot yourself in the foot. But not with Scala.

val a = thisIs.soMuchBetter()
val b = no.semiColons()
val c = at.theEndOfALine()

This is probably due to Scala’s type safety, which would make the compiler complain in one of those rare ambiguous situations, but that’s just an educated guess.

3. Parentheses

This is a minefield, and leaving out parentheses seems dangerous in many cases. In fact, you can also leave out the dots when calling a method:

myObject method myArgument

Because of the amount of ambiguity this can generate, especially when chaining method calls, I think this technique is best avoided. But in some situations, it’s just convenient to “forget” about the parens, e.g.

val s = myObject.toString

4. Type inference

This one is really annoying in Java, and it seems that many other languages have gotten it right in the meantime. Java has only limited type inference capabilities, and things aren’t as bright as they could be.

In Scala, I could simply write

val s = myObject.toString

… and not care about the fact that s is of type String. Sometimes, but only sometimes I like to explicitly specify the type of my reference. In that case, I can still do it:

val s : String = myObject.toString
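
In Java, the diamond operator is about as far as type inference goes. A quick sketch of the difference (java.util imports omitted):

// Java 7 and up: the diamond operator infers the
// constructor's type arguments...
Map<String, List<Integer>> map = new HashMap<>();

// ... but the left-hand side still has to be spelled out
// in full. There is no equivalent of Scala's val.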

5. Case classes

I think I’d fancy writing another POJO with 40 attributes, constructors, getters, setters, equals, hashCode, and toString

— Said no one. Ever

Scala has case classes. Simple, immutable POJOs written as one-liners. Take the Person case class for instance:

case class Person(firstName: String, lastName: String)

I do have to write down the attributes once, agreed. But everything else should be automatic.

And how do you create an instance of such a case class? Easily, you don’t even need the new operator (in fact, it completely escapes my imagination why new is really needed in the first place):

Person("George", "Orwell")

That’s it. What else do you want to write to be Enterprise-compliant?
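
For comparison, here is roughly what those two attributes cost in Java, and this is still without the equals(), hashCode(), and toString() that the case class generates for free:

public final class Person {
    private final String firstName;
    private final String lastName;

    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }

    // ... plus equals(), hashCode(), and toString(),
    // which your IDE generates and you maintain forever.
}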

Side-note

OK, some people will now argue to use Project Lombok. Annotation-based code generation is nonsense and should be best avoided. In fact, many annotations in the Java ecosystem are simple proof of the fact that the Java language is – and will forever be – very limited in its evolution capabilities. Take @Override for instance. This should be a keyword, not an annotation. You may think it’s a cosmetic difference, but I say that Scala has proven that annotations are pretty much always the wrong tool. Or have you seen heavily annotated Scala code recently?

6. Methods (functions!) everywhere

This one is really one of the most useful features in any language, in my opinion. Why do we always have to tie a method to a specific class? Why can’t we simply have methods at any scope level? Because we can, with Scala:

// "Top-level", i.e. associated with the package
def m1(i : Int) = i + 1

object Test {

    // "Static" method in the Test instance
    def m2(i : Int) = i + 2
    
    def main(args: Array[String]): Unit = {

        // Local method in the main method
        def m3(i : Int) = i + 3
        
        println(m1(1))
        println(m2(1))
        println(m3(1))
    }
}

Right? Why shouldn’t I be able to define a local method in another method? I can do that with classes in Java:

public void method() {
    class LocalClass {}

    System.out.println(new LocalClass());
}

A local class is an inner class that is local to a method. This is hardly ever useful, but what would be really useful are local methods.

These are also supported in JavaScript and Ceylon, by the way.
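
With Java 8, a lambda assigned to a local variable is arguably the closest workaround for a local method. A sketch:

import java.util.function.IntUnaryOperator;

public class Test {
    public static void main(String[] args) {
        // Not a real local method, but close enough for simple cases:
        IntUnaryOperator m3 = i -> i + 3;
        System.out.println(m3.applyAsInt(1)); // prints 4
    }
}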

7. The REPL

Because of various language features (such as 6. Methods everywhere), Scala is a language that can easily run in a REPL. This is awesome for testing out a small algorithm or concept outside of the scope of your application.

In Java, we usually tend to do this:

public class SomeRandomClass {

    // [...]
  
    public static void main(String[] args) {
        System.out.println(SomeOtherClass.testMethod());
    }

    // [...]
}

In Scala, I would’ve just written this in the REPL:

println(SomeOtherClass.testMethod)

Notice also the always-available println method. Pure gold in terms of efficient debugging.

8. Arrays are NOT (that much of) a special case

In Java, apart from primitive types, there are also those weird things we call arrays. Arrays originate from an entirely separate universe, where we have to remember quirky rules dating back to the ages of Capt Kirk (or so).


Yes, rules like:

// Compiles but fails at runtime
Object[] arrrrr = new String[1];
arrrrr[0] = new Object();

// This works
Object[] arrrr2 = new Integer[1];
arrrr2[0] = 1; // Autoboxing

// This doesn't work (an int[] is not an Object[])
Object[] arrrr3 = new int[1];

// This works
Object[] arr4[] = new Object[1][];

// So does this (initialisation):
Object[][] arr5 = { { } };

// Or this (puzzle: Why does it work?):
Object[][] arr6 = { { new int[1] } };

// But this doesn't work (assignment)
arr5 = { { } };

Yes, the list could go on. With Scala, arrays are less of a special case, syntactically speaking:

val a = new Array[String](3)
a(0) = "A"
a(1) = "B"
a(2) = "C"
a.map(v => v + ":")

// output Array(A:, B:, C:)

As you can see, arrays behave much like other collections including all the useful methods that can be used on them.
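
In Java 8, the closest equivalent needs a detour through the Streams API. A rough sketch:

import java.util.Arrays;

String[] a = { "A", "B", "C" };
String[] mapped = Arrays.stream(a)
                        .map(v -> v + ":")
                        .toArray(String[]::new);

// output [A:, B:, C:]
System.out.println(Arrays.toString(mapped));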

9. Symbolic method names

Now, this topic is one that is more controversial, as it reminds us of the perils of operator overloading. But every once in a while, we’d wish to have something similar. Something that allows us to write

val x = BigDecimal(3)
val y = BigDecimal(4)
val z = x * y

Very intuitively, the value of z should be BigDecimal(12). That cannot be too hard, can it? I don’t care if the implementation of * is really a method called multiply() or whatever. When writing down the method, I’d like to use what looks like a very common operator for multiplication.
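
In Java, by contrast, the method name has to be spelled out:

import java.math.BigDecimal;

BigDecimal x = new BigDecimal(3);
BigDecimal y = new BigDecimal(4);
BigDecimal z = x.multiply(y); // no operator for you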

By the way, I’d also like to do that with SQL. Here’s an example:

select ( 
  AUTHOR.FIRST_NAME || " " || AUTHOR.LAST_NAME,
  AUTHOR.AGE - 10
)
from AUTHOR
where AUTHOR.ID > 10
fetch

Doesn’t that make sense? We know that || means concat (in some databases). We know what the meaning of - (minus) and > (greater than) is. Why not just write it?

The above is a compiling example of jOOQ in Scala, btw.

jOOQ: The Best Way to Write SQL in Scala

Attention: Caveat

There’s always a flip side to allowing something like operator overloading or symbolic method names. It can (and will) be abused, by libraries as much as by the Scala language itself.

10. Tuples

Being a SQL person, this is again one of the features I miss most in other languages. In SQL, everything is either a TABLE or a ROW. Few people actually know that, and few databases actually support this way of thinking.

Scala doesn’t have ROW types (which are really records), but at least, there are anonymous tuple types. Think of rows as tuples with named attributes, whereas case classes would be named rows:

  • Tuple: Anonymous type with typed and indexed elements
  • Row: Anonymous type with typed, named, and indexed elements
  • case class: Named type with typed and named elements

In Scala, I can just write:

// A tuple with two values
val t1 = (1, "A")

// A nested tuple
val t2 = (1, "A", (2, "B"))

In Java, a similar thing can be done, but you’ll have to write the library yourself, and you have no language support:

class Tuple2<T1, T2> {
    // Lots of bloat, see missing case classes
}

class Tuple3<T1, T2, T3> {
    // Bloat bloat bloat
}

And then:

// Yikes, no type inference...
Tuple2<Integer, String> t1 = new Tuple2<>(1, "A");

// OK, this will certainly not look nice
Tuple3<Integer, String, Tuple2<Integer, String>> t2 =
    new Tuple3<>(1, "A", new Tuple2<>(2, "B"));
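
To make the “bloat” concrete, here’s a minimal sketch of what a hand-written, immutable Tuple2 would have to contain:

import java.util.Objects;

class Tuple2<T1, T2> {
    final T1 v1;
    final T2 v2;

    Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Tuple2))
            return false;
        Tuple2<?, ?> t = (Tuple2<?, ?>) o;
        return Objects.equals(v1, t.v1)
            && Objects.equals(v2, t.v2);
    }

    @Override
    public int hashCode() {
        return Objects.hash(v1, v2);
    }

    @Override
    public String toString() {
        return "(" + v1 + ", " + v2 + ")";
    }
}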

jOOQ makes extensive use of the above technique to bring you SQL’s row value expressions to Java, and surprisingly, in most cases, you can do without the missing type inference as jOOQ is a fluent API where you never really assign values to local variables… An example:

DSL.using(configuration)
   .select(T1.SOME_VALUE)
   .from(T1)
   .where(
      // This ROW constructor is completely type safe
      row(T1.COL1, T1.COL2)
      .in(select(T2.A, T2.B).from(T2))
   )
   .fetch();

Conclusion

This was certainly a pro-Scala and slightly contra-Java article. Don’t get me wrong. By no means would I want to migrate entirely to Scala. I think that the Scala language goes way beyond what is reasonable in any useful software. There are lots of little features and gimmicks that seem nice to have, but will inevitably blow up in your face, such as:

  • implicit conversions. These are not only very hard to manage, they also slow down compilation horribly. Besides, it’s probably utterly impossible to implement semantic versioning reasonably using implicits, as it is probably not possible to foresee all client code breakage caused by accidental backwards-incompatibilities.
  • local imports seem great at first, but their power quickly makes code unintelligible when people start partially importing or renaming types for a local scope.
  • symbolic method names are most often abused. Take the parser API for instance, which features method names like ^^, ^^^, ^?, or ~!

Nonetheless, I think that the advantages of Scala over Java listed in this article could all be implemented in Java as well:

  • with little risk of breaking backwards-compatibility
  • with (probably) not too big of an effort, JLS-wise
  • with a huge impact on developer productivity
  • with a huge impact on Java’s competitiveness

In any case, Java 9 will be another promising release, with hot topics like value types, declaration-site variance, specialisation (very interesting!), or ClassDynamic.

With these huge changes, let’s hope there’s also some room for the above little improvements that would add more immediate value to everyday work.

jOOQ Tip of the Day: Reuse Bind Values


jOOQ models your SQL statements as an AST (Abstract Syntax Tree). This means that your SQL statement is represented in a non-text form prior to being serialised as a textual SQL statement to your JDBC driver.

One advantage of this is that you can freely manipulate this AST any way you want. This can be done using jOOQ’s SQL transformation capabilities, or in some cases much more simply directly in your client code.

Imagine, for instance, that you have a SQL statement where several bind values should be identical. This is a frequent problem in SQL, as SQL is verbose and repetitive. For instance:

-- Both "1" should in fact be the same value
SELECT 1
FROM   TABLE
WHERE  TABLE.COL < 1

-- Both "2" should in fact be the same value
SELECT 2
FROM   TABLE
WHERE  TABLE.COL < 2

With jOOQ, what you can do is this:

// Create a single bind value reference
Param<Integer> value = val(1);

// And use that reference several times in your query:
Select<?> query =
DSL.using(configuration)
   .select(value.as("a"))
   .from(TABLE)
   .where(TABLE.COL.lt(value));

assertEquals(1, (int) query.fetchOne("a"));

// Now, for the second query, simply change the value
value.setValue(2);

// ... and execute the query again
assertEquals(2, (int) query.fetchOne("a"));

As a jOOQ developer, you’re directly manipulating your SQL statement’s AST. Nothing keeps you from turning that AST into a directed graph (beware of cycles, of course), to improve your SQL expressiveness.

jOOQ: The best way to write SQL in Java

Keeping things DRY: Method overloading


A good clean application design requires discipline in keeping things DRY:

Everything has to be done once.
Having to do it twice is a coincidence.
Having to do it three times is a pattern.

— An unknown wise man

Now, if you’re following the Extreme Programming rules, you know what needs to be done when you encounter a pattern:

refactor mercilessly

Because we all know what happens when you don’t:

Not DRY: Method overloading

One of the least DRY things you can do that is still acceptable is method overloading – in those languages that allow it (unlike Ceylon or JavaScript). Being an internal domain-specific language, the jOOQ API makes heavy use of overloading. Consider the type Field (modelling a database column):

public interface Field<T> {

    // [...]

    Condition eq(T value);
    Condition eq(Field<T> field);
    Condition eq(Select<? extends Record1<T>> query);
    Condition eq(QuantifiedSelect<? extends Record1<T>> query);

    Condition in(Collection<?> values);
    Condition in(T... values);
    Condition in(Field<?>... values);
    Condition in(Select<? extends Record1<T>> query);

    // [...]

}

So, in certain cases, non-DRY-ness is inevitable, to a certain extent also in the implementation of the above API. The key rule of thumb here, however, is to always have as few implementations as possible, also for overloaded methods. Try calling one method from another. For instance, these two methods are very similar:

Condition eq(T value);
Condition eq(Field<T> field);

The first method is a special case of the second one, where jOOQ users do not want to explicitly declare a bind variable. It is literally implemented as such:

@Override
public final Condition eq(T value) {
    return equal(value);
}

@Override
public final Condition equal(T value) {
    return equal(Utils.field(value, this));
}

@Override
public final Condition equal(Field<T> field) {
    return compare(EQUALS, nullSafe(field));
}

@Override
public final Condition compare(Comparator comparator, Field<T> field) {
    switch (comparator) {
        case IS_DISTINCT_FROM:
        case IS_NOT_DISTINCT_FROM:
            return new IsDistinctFrom<T>(this, nullSafe(field), comparator);

        default:
            return new CompareCondition(this, nullSafe(field), comparator);
    }
}

As you can see:

  • eq() is just a synonym for the legacy equal() method
  • equal(T) is a more specialised, convenience form of equal(Field<T>)
  • equal(Field<T>) is a more specialised, convenience form of compare(Comparator, Field<T>)
  • compare() finally provides access to the implementation of this API

All of these methods are also part of the public API and can be called by the API consumer, directly, which is why the nullSafe() check is repeated in each method.

Why all the trouble?

The answer is simple.

  • There is very little possibility of copy-paste errors throughout the API.
  • … because the same API has to be offered for ne, gt, ge, lt, le
  • No matter what part of the API happens to be integration-tested, the implementation itself is certainly covered by some test.
  • This way, it is extremely easy to provide users with a very rich API with lots of convenience methods, as users do not want to remember how these more general-purpose methods (like compare()) really work.

The last point is particularly important, and because of risks related to backwards-compatibility, not always followed by the JDK, for instance. In order to create a Java 8 Stream from an Iterator, you have to go through all this hassle, for instance:

// Aagh, my fingers hurt...
   StreamSupport.stream(
       Spliterators.spliteratorUnknownSize(iterator, 0), false);
// ^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^              ^^^^^
//       |                    |                            |
// Not Stream!                |                            |
//                            |                            |
// Hmm, Spliterators. Sounds  |                            |
// like Iterator.             |                            |
// But what is it? -----------+                            |
//                                                         |
// What's this 0 and false?                                |
// And do I need to care? ---------------------------------+

When, intuitively, you’d like to have:

// Not Enterprise enough
iterator.stream();

In other words, subtle Java 8 Streams implementation details will soon leak into a lot of client code, and many new utility functions will wrap these things again and again.
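
A sketch of one such wrapper, assuming a hypothetical Streams utility class of your own:

import java.util.Iterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class Streams {

    // Wrap an Iterator in a sequential Stream:
    static <T> Stream<T> stream(Iterator<T> iterator) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(iterator, 0), false);
    }
}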

See Brian Goetz’s explanation on Stack Overflow for details.

On the flip side of delegating overload implementations, it is of course harder (i.e. more work) to implement such an API. This is particularly cumbersome if an API vendor also allows users to implement the API themselves (e.g. JDBC). Another issue is the length of stack traces generated by such implementations. But we’ve shown before on this blog that deep stack traces can be a sign of good quality.

Now you know why.

Takeaway

The takeaway is simple. Whenever you encounter a pattern, refactor. Find the common denominator, factor it out into a single implementation, and reach that implementation only by delegating single-responsibility steps from method to method.

By following these rules, you will:

  • Have less bugs
  • Have a more convenient API

Happy refactoring!

Java 8 Friday: More Functional Relational Transformation


In the past, we’ve been providing you with a new article every Friday about what’s new in Java 8. It has been a very exciting blog series, but we would like to focus again more on our core content, which is Java and SQL. We will still be blogging occasionally about Java 8, but no longer every Friday (as some of you have already noticed).

In this last, short post of the Java 8 Friday series, we’d like to reiterate the fact that we believe that the future belongs to functional relational data transformation (as opposed to ORM). We’ve spent about 20 years now using the object-oriented software development paradigm. Many of us have been very dogmatic about it. In the last 10 years, however, a “new” paradigm has started to get increasing traction in programming communities: functional programming.

Functional programming is not that new, however. Lisp was a very early functional programming language. XSLT and SQL are also somewhat functional (and declarative!). As we’re big fans of SQL’s functional (and declarative!) nature, we’re quite excited about the fact that we now have sophisticated tools in Java to transform tabular data that has been extracted from SQL databases. Streams!

SQL ResultSets are very similar to Streams

As we’ve pointed out before, JDBC ResultSets and Java 8 Streams are quite similar. This is even more true when you’re using jOOQ, which replaces the JDBC ResultSet with an org.jooq.Result, which extends java.util.List and thus automatically inherits all Streams functionality. Consider the following query that allows fetching a one-to-many relationship between BOOK and AUTHOR records:

// Assuming the static imports:
// import static java.util.stream.Collectors.*;

Map<Record2<String, String>, 
    List<Record2<Integer, String>>> booksByAuthor =

// This work is performed in the database
// --------------------------------------
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()

// This work is performed in Java memory
// -------------------------------------
   .stream()

   // Group BOOKs by AUTHOR
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(AUTHOR.FIRST_NAME, 
                    AUTHOR.LAST_NAME),

        // This is the target data structure
        LinkedHashMap::new,

        // This is the value to be produced for each
        // group: A list of BOOK
        mapping(
            r -> r.into(BOOK.ID, BOOK.TITLE),
            toList()
        )
    ));

The fluency of the Java 8 Streams API feels very idiomatic to someone who is used to writing SQL with jOOQ. Obviously, you can also use something other than jOOQ, e.g. Spring’s JdbcTemplate, or Apache Commons DbUtils, or just wrap the JDBC ResultSet in an Iterator…

What’s very nice about this approach, compared to ORM is the fact that there is no magic happening at all. Every piece of mapping logic is explicit and, thanks to Java generics, fully typesafe. The type of the booksByAuthor output is complex, and a bit hard to read / write, in this example, but it is also fully descriptive and useful.

The same functional transformation with POJOs

If you aren’t too happy with using jOOQ’s Record2 tuple types, no problem. You can specify your own data transfer objects like so:

class Book {
    public int id;
    public String title;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

class Author {
    public String firstName;
    public String lastName;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

With the above DTO, you can now leverage jOOQ’s built-in POJO mapping to transform the jOOQ records into your own domain classes:

Map<Author, List<Book>> booksByAuthor =
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()
   .stream()
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(Author.class),
        LinkedHashMap::new,

        // This is the grouping value list
        mapping(
            r -> r.into(Book.class),
            toList()
        )
    ));

Explicitness vs. implicitness

At Data Geekery, we believe that a new time has started for Java developers. A time where Annotatiomania™ (finally!) ends and people stop assuming all that implicit behaviour through annotation magic. ORMs depend on a huge amount of specification to explain how each annotation works with each other annotation. It is hard to reverse-engineer (or debug!) this kind of not-so-well-understood annotation-language that JPA has brought to us.

On the flip side, SQL is pretty well understood. Tables are an easy-to-handle data structure, and if you need to transform those tables into something more object-oriented, or more hierarchically structured, you can simply apply functions to those tables and group values yourself! By grouping those values explicitly, you stay in full control of your mapping, just as with jOOQ, you stay in full control of your SQL.

This is why we believe that in the next 5 years, ORMs will lose relevance, and people will start embracing explicit, stateless, and magicless data transformation techniques again, using Java 8 Streams.

PostgreSQL’s Table-Valued Functions


Table-valued functions are an awesome thing. Many databases support them in one way or another and so does PostgreSQL. In PostgreSQL, (almost) everything is a table. For instance, we can write:

CREATE OR REPLACE FUNCTION 
    f_1 (v1 INTEGER, v2 OUT INTEGER)
AS $$
BEGIN
    v2 := v1;
END
$$ LANGUAGE plpgsql;

… and believe it or not, this is a table! We can write:

select * from f_1(1);

And the above will return:

+----+
| v2 |
+----+
|  1 |
+----+

It’s kind of intuitive if you think about it. We’re just pushing out a single record with a single column. If we wanted two columns, we could’ve written:

CREATE OR REPLACE FUNCTION 
    f_2 (v1 INTEGER, v2 OUT INTEGER, v3 OUT INTEGER)
AS $$
BEGIN
    v2 := v1;
    v3 := v1 + 1;
END
$$ LANGUAGE plpgsql;

… and then:

select * from f_2(1);

And the above will return:

+----+----+
| v2 | v3 |
+----+----+
|  1 |  2 |
+----+----+

That’s useful, but those are just single records. What if we wanted to produce a whole table? It’s easy, just change your functions to actually return TABLE types, instead of using OUT parameters:

CREATE OR REPLACE FUNCTION f_3 (v1 INTEGER)
RETURNS TABLE(v2 INTEGER, v3 INTEGER)
AS $$
BEGIN
    RETURN QUERY
    SELECT *
    FROM (
        VALUES(v1, v1 + 1), 
              (v1 * 2, (v1 + 1) * 2)
    ) t(a, b);
END
$$ LANGUAGE plpgsql;

When selecting from the above very useful function, we’ll get a table like so:

select * from f_3(1);

And the above will return:

+----+----+
| v2 | v3 |
+----+----+
|  1 |  2 |
|  2 |  4 |
+----+----+

And we can LATERAL join that function to other tables if we want:

select *
from book, lateral f_3(book.id)

… which might yield, for example:

+----+--------------+----+----+
| id | title        | v2 | v3 |
+----+--------------+----+----+
|  1 | 1984         |  1 |  2 |
|  1 | 1984         |  2 |  4 |
|  2 | Animal Farm  |  2 |  4 |
|  2 | Animal Farm  |  4 |  6 |
+----+--------------+----+----+

In fact, it appears that the keyword LATERAL is optional in this case, at least for PostgreSQL.

Table-valued functions are very powerful!

Discovering table-valued functions

From jOOQ’s schema reverse-engineering perspective, things might get a bit tricky as can be seen in this Stack Overflow question. PostgreSQL deals with OUT parameters in a very similar way as with TABLE return types. This can be seen in the following query against the INFORMATION_SCHEMA:

SELECT r.routine_name, r.data_type, p.parameter_name, p.data_type
FROM   information_schema.routines r
JOIN   information_schema.parameters p
USING (specific_catalog, specific_schema, specific_name);

… and the output:

routine_name | data_type | parameter_name | data_type
-------------+-----------+----------------+----------
f_1          | integer   | v1             | integer
f_1          | integer   | v2             | integer
f_2          | record    | v1             | integer
f_2          | record    | v2             | integer
f_2          | record    | v3             | integer
f_3          | record    | v1             | integer
f_3          | record    | v2             | integer
f_3          | record    | v3             | integer

As you can see, the output is really indistinguishable from that perspective. Luckily, we can also join the pg_catalog.pg_proc table, which contains the relevant flag to indicate if a function returns a set or not:

SELECT   r.routine_name, 
         r.data_type, 
         p.parameter_name, 
         p.data_type, 
         pg_p.proretset
FROM     information_schema.routines r
JOIN     information_schema.parameters p
USING   (specific_catalog, specific_schema, specific_name)
JOIN     pg_namespace pg_n
ON       r.specific_schema = pg_n.nspname
JOIN     pg_proc pg_p
ON       pg_p.pronamespace = pg_n.oid
AND      pg_p.proname = r.routine_name
ORDER BY routine_name, parameter_name;

Now, we’re getting:

routine_name | data_type | parameter_name | data_type | proretset
-------------+-----------+----------------+-----------+----------
f_1          | integer   | v1             | integer   | f
f_1          | integer   | v2             | integer   | f
f_2          | record    | v1             | integer   | f
f_2          | record    | v2             | integer   | f
f_2          | record    | v3             | integer   | f
f_3          | record    | v1             | integer   | t
f_3          | record    | v2             | integer   | t
f_3          | record    | v3             | integer   | t

We can see that f_3 is the only function actually returning a set of records, unlike f_1 and f_2, which only return a single record.

Now, remove all those parameters that are not OUT parameters, and you have your table type:

SELECT   r.routine_name, 
         p.parameter_name,
         p.data_type,
         row_number() OVER (
           PARTITION BY r.specific_name 
           ORDER BY p.ordinal_position
         ) AS ordinal_position
FROM     information_schema.routines r
JOIN     information_schema.parameters p
USING   (specific_catalog, specific_schema, specific_name)
JOIN     pg_namespace pg_n
ON       r.specific_schema = pg_n.nspname
JOIN     pg_proc pg_p
ON       pg_p.pronamespace = pg_n.oid
AND      pg_p.proname = r.routine_name
WHERE    pg_p.proretset
AND      p.parameter_mode = 'OUT'
ORDER BY routine_name, parameter_name;

Which will give us:

routine_name | parameter_name | data_type | ordinal_position
-------------+----------------+-----------+-----------------
f_3          | v2             | integer   |                1
f_3          | v3             | integer   |                2

How to run such queries in jOOQ?

Once the above code is generated, you can easily call the table-valued function in any jOOQ query. Consider again the BOOK example (in SQL):

select *
from book, lateral f_3(book.id)

… and with jOOQ:

DSL.using(configuration)
   .select()
   .from(BOOK, lateral(F_3.call(BOOK.ID)))
   .fetch();

The returned records then contain values for:

record.getValue(F_3.V2);
record.getValue(F_3.V3);

All that typesafety is only available in the upcoming jOOQ 3.5, for free! (SQL Server, Oracle, and HSQLDB table-valued functions are already supported!)

jOOQ is the best way to write SQL in Java

Stop Unit Testing Database Code


Writing tests that use an actual database is hard.

Period.

Now that this has been established, let’s have a look at a blog post by Marco Behler, in which he elaborates on various options for testing database code with respect to transactionality. Testing database transactions is even harder than just testing database code. Marco lists a couple of options for how to tweak these tests to make them “easier” to write.

One of the options is:

3. Set the flush-mode to FlushMode.ALWAYS for your tests

(note, this is about testing Hibernate code).

Marco puts this option in parentheses, because he’s not 100% convinced that it’s a good idea to test behaviour that differs from the productive one. Here’s our take on that:

Stop Unit Testing Database Code

By the time you start thinking about tweaking your test setup to achieve “simpler” transaction behaviour (or worse, use mocks for your database), you’re pretty much doomed. You start creating an alternative system that heavily deviates from your productive system. This essentially means:

  • that results from tests against your test system have (almost) no meaning
  • that your tests will not cover some of the most complex aspects of your productive system
  • that you will start spending way too much time on tweaking tests rather than implementing useful business logic

Instead, focus on writing integration tests that test your business logic on a very high level, i.e. on a “service layer” level, if your architecture has such a thing. If you’re using EJB, this would probably be on a session bean level. If you’re using REST or SOAP, this would be on a REST or SOAP service level. If you’re using HTTP (the retro name for REST), it would be on an HTTP service level.

Here are the advantages of this approach:

  • You might not get 100% coverage, but you will get the 80% coverage for those parts that really matter (with 20% of the effort). Your UI (or external system) doesn’t call the database in the quirkiest of forms either. It calls your services. Why would you test anything other than your services?
  • Your tests are very easy to maintain. You have a public API to your UI (or external system). It’s formal and easy to understand. It’s a black box with well-defined input and output parameters, which makes it easy to read / write tests.

Databases are stateful. Obviously

What you have to do is let go of the idea that your database will ever participate in a “unit” (as in “unit” test). Units are pretty much stateless, and thus it is very easy to write mutually independent unit tests for functional algorithms, for instance.

Databases couldn’t be any less stateless. The whole idea of a database is to manage state. And that’s very complicated and completely opposite to what any unit test can ever model. Many of the most meaningful state transitions span several database interactions, or even transactions, or maybe even services. For instance, it may be important that the CREATE_USER service invocation be immediately followed by an invocation of CHANGE_PASSWORD. You can only integration-test that on a service layer. Don’t believe it? What if CREATE_USER depends on an external LDAP system? Or complex security logic in stored procedures? Your integration test’s got that covered.

Takeaway

Writing tests that use an actual database is hard.

Yes. That won’t change. But your perception may. Don’t try to tweak things around this fact. Create a well-known test database. Reset it between tests. And write integration tests on a very high level. The 20/80 cost/benefit ratio will leave you no better choice.

Stay tuned for another blog post on this blog about how we integration-test the jOOQ API against 16 actual RDBMS.

Flyway and jOOQ for Unbeatable SQL Development Productivity


When performing database migrations, we at Data Geekery recommend using jOOQ with Flyway – Database Migrations Made Easy. In this post, we’re going to look into a simple way to get started with the two frameworks.

Philosophy

There are a variety of ways in which jOOQ and Flyway could interact with each other in various development setups. In this tutorial, we’re going to show just one variant of such framework interplay – a variant that we find particularly compelling for most use cases.

The general philosophy and workflow behind the following approach can be summarised as this:

  • 1. Database increment
  • 2. Database migration
  • 3. Code re-generation
  • 4. Development

The four steps above can be repeated time and again, every time you need to modify something in your database. More concretely, let’s consider:

  • 1. Database increment – You need a new column in your database, so you write the necessary DDL in a Flyway script
  • 2. Database migration – This Flyway script is now part of your deliverable, which you can share with all developers who can migrate their databases with it, the next time they check out your change
  • 3. Code re-generation – Once the database is migrated, you regenerate all jOOQ artefacts (see code generation), locally
  • 4. Development – You continue developing your business logic, writing code against the updated, generated database schema

0.1. Maven Project Configuration – Properties

The following properties are defined in our pom.xml, to be able to reuse them between plugin configurations:

<properties>
    <db.url>jdbc:h2:~/flyway-test</db.url>
    <db.username>sa</db.username>

    <!-- Reused by the jOOQ plugin configuration below -->
    <org.jooq.version>3.4.0</org.jooq.version>
</properties>

0.2. Maven Project Configuration – Dependencies

While jOOQ and Flyway could be used in standalone migration scripts, in this tutorial, we’ll be using Maven for the standard project setup. You will also find the source code of this tutorial on GitHub, and the full pom.xml file here.

These are the dependencies that we’re using in our Maven configuration:

<!-- We'll add the latest version of jOOQ 
     and our JDBC driver - in this case H2 -->
<dependency>
    <groupId>org.jooq</groupId>
    <artifactId>jooq</artifactId>
    <version>3.4.0</version>
</dependency>
<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.4.177</version>
</dependency>

<!-- For improved logging, we'll be using 
     log4j via slf4j to see what's going
     on during migration and code generation -->
<dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.16</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-log4j12</artifactId>
    <version>1.7.5</version>
</dependency>

<!-- To ensure our code is working, we're
     using JUnit -->
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>

0.3. Maven Project Configuration – Plugins

After the dependencies, let’s simply add the Flyway and jOOQ Maven plugins like so. The Flyway plugin:

<plugin>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-maven-plugin</artifactId>
    <version>3.0</version>

    <!-- Note that we're executing the Flyway
         plugin in the "generate-sources" phase -->
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>migrate</goal>
            </goals>
        </execution>
    </executions>

    <!-- Note that we need to prefix the db/migration
         path with filesystem: to prevent Flyway
         from looking for our migration scripts
         only on the classpath -->
    <configuration>
        <url>${db.url}</url>
        <user>${db.username}</user>
        <locations>
            <location>filesystem:src/main/resources/db/migration</location>
        </locations>
    </configuration>
</plugin>

The above Flyway Maven plugin configuration will read and execute all database migration scripts from src/main/resources/db/migration prior to compiling Java source code. While the official Flyway documentation suggests that migrations be done in the compile phase, the jOOQ code generator relies on such migrations having been done prior to code generation.

After the Flyway plugin, we’ll add the jOOQ Maven Plugin. For more details, please refer to the manual’s section about the code generation configuration.

<plugin>
    <groupId>org.jooq</groupId>
    <artifactId>jooq-codegen-maven</artifactId>
    <version>${org.jooq.version}</version>

    <!-- The jOOQ code generation plugin is also 
         executed in the generate-sources phase,
         prior to compilation -->
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>generate</goal>
            </goals>
        </execution>
    </executions>

    <!-- This is a minimal working configuration. 
         See the manual's section about the code
         generator for more details -->
    <configuration>
        <jdbc>
            <url>${db.url}</url>
            <user>${db.username}</user>
        </jdbc>
        <generator>
            <database>
                <includes>.*</includes>
                <inputSchema>FLYWAY_TEST</inputSchema>
            </database>
            <target>
                <packageName>org.jooq.example.flyway.db.h2</packageName>
                <directory>target/generated-sources/jooq-h2</directory>
            </target>
        </generator>
    </configuration>
</plugin>

This configuration will now read the FLYWAY_TEST schema and reverse-engineer it into the target/generated-sources/jooq-h2 directory, and within that, into the org.jooq.example.flyway.db.h2 package.

1. Database increments

Now we can start developing our database. For that, we’ll create database increment scripts, which we put into the src/main/resources/db/migration directory, as previously configured for the Flyway plugin. We’ll add these files:

  • V1__initialise_database.sql
  • V2__create_author_table.sql
  • V3__create_book_table_and_records.sql

These three scripts model our schema versions 1-3 (note the capital V!). Here are the scripts’ contents:

-- V1__initialise_database.sql
DROP SCHEMA flyway_test IF EXISTS;

CREATE SCHEMA flyway_test;
-- V2__create_author_table.sql
CREATE SEQUENCE flyway_test.s_author_id START WITH 1;

CREATE TABLE flyway_test.author (
  id INT NOT NULL,
  first_name VARCHAR(50),
  last_name VARCHAR(50) NOT NULL,
  date_of_birth DATE,
  year_of_birth INT,
  address VARCHAR(50),

  CONSTRAINT pk_t_author PRIMARY KEY (ID)
);
-- V3__create_book_table_and_records.sql
CREATE TABLE flyway_test.book (
  id INT NOT NULL,
  author_id INT NOT NULL,
  title VARCHAR(400) NOT NULL,

  CONSTRAINT pk_t_book PRIMARY KEY (id),
  CONSTRAINT fk_t_book_author_id FOREIGN KEY (author_id) REFERENCES flyway_test.author(id)
);


INSERT INTO flyway_test.author VALUES (next value for flyway_test.s_author_id, 'George', 'Orwell', '1903-06-25', 1903, null);
INSERT INTO flyway_test.author VALUES (next value for flyway_test.s_author_id, 'Paulo', 'Coelho', '1947-08-24', 1947, null);

INSERT INTO flyway_test.book VALUES (1, 1, '1984');
INSERT INTO flyway_test.book VALUES (2, 1, 'Animal Farm');
INSERT INTO flyway_test.book VALUES (3, 2, 'O Alquimista');
INSERT INTO flyway_test.book VALUES (4, 2, 'Brida');

2. Database migration and 3. Code regeneration

The above three scripts are picked up by Flyway and executed in the order of the versions. This can be seen very simply by executing:

mvn clean install

And then observing the log output from Flyway…

[INFO] --- flyway-maven-plugin:3.0:migrate (default) @ jooq-flyway-example ---
[INFO] Database: jdbc:h2:~/flyway-test (H2 1.4)
[INFO] Validated 3 migrations (execution time 00:00.004s)
[INFO] Creating Metadata table: "PUBLIC"."schema_version"
[INFO] Current version of schema "PUBLIC": <>
[INFO] Migrating schema "PUBLIC" to version 1
[INFO] Migrating schema "PUBLIC" to version 2
[INFO] Migrating schema "PUBLIC" to version 3
[INFO] Successfully applied 3 migrations to schema "PUBLIC" (execution time 00:00.073s).

… and from jOOQ on the console:

[INFO] --- jooq-codegen-maven:3.5.0-SNAPSHOT:generate (default) @ jooq-flyway-example ---
[INFO] Using this configuration:
...
[INFO] Generating schemata      : Total: 1
[INFO] Generating schema        : FlywayTest.java
[INFO] ----------------------------------------------------------
[....]
[INFO] GENERATION FINISHED!     : Total: 337.576ms, +4.299ms

4. Development

Note that all of the previous steps are executed automatically, every time someone adds new migration scripts to the Maven module. For instance, a team member might have committed a new migration script; you check it out, rebuild, and get the latest jOOQ-generated sources for your own development or integration-test database.

Now that these steps are done, you can proceed to writing your database queries. Imagine the following test case:

import org.jooq.Result;
import org.jooq.impl.DSL;
import org.junit.Test;

import java.sql.Connection;
import java.sql.DriverManager;

import static java.util.Arrays.asList;
import static org.jooq.example.flyway.db.h2.Tables.*;
import static org.junit.Assert.assertEquals;

public class AfterMigrationTest {

    @Test
    public void testQueryingAfterMigration() throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:h2:~/flyway-test", "sa", "")) {
            Result<?> result =
            DSL.using(c)
               .select(
                   AUTHOR.FIRST_NAME,
                   AUTHOR.LAST_NAME,
                   BOOK.ID,
                   BOOK.TITLE
               )
               .from(AUTHOR)
               .join(BOOK)
               .on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
               .orderBy(BOOK.ID.asc())
               .fetch();

            assertEquals(4, result.size());
            assertEquals(asList(1, 2, 3, 4), result.getValues(BOOK.ID));
        }
    }
}

If you run mvn clean install again, the above integration test will now compile and pass!

Reiterate

The power of this approach becomes clear once you start performing database modifications this way. Let’s assume that the French guy on our team prefers to have things his way (no offense intended ;-) ):

-- V4__le_french.sql
ALTER TABLE flyway_test.book 
  ALTER COLUMN title RENAME TO le_titre;

They check it in, you check out the new database migration script, run

mvn clean install

And then observe the log output:

[INFO] --- flyway-maven-plugin:3.0:migrate (default) @ jooq-flyway-example ---
[INFO] Database: jdbc:h2:~/flyway-test (H2 1.4)
[INFO] Validated 4 migrations (execution time 00:00.005s)
[INFO] Current version of schema "PUBLIC": 3
[INFO] Migrating schema "PUBLIC" to version 4
[INFO] Successfully applied 1 migration to schema "PUBLIC" (execution time 00:00.016s).

So far so good, but later on:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] C:\...\AfterMigrationTest.java:[24,19] error: cannot find symbol
[INFO] 1 error

When we go back to our Java integration test, we can immediately see that the TITLE column is still being referenced, but it no longer exists:

public class AfterMigrationTest {

    @Test
    public void testQueryingAfterMigration() throws Exception {
        try (Connection c = DriverManager.getConnection("jdbc:h2:~/flyway-test", "sa", "")) {
            Result<?> result =
            DSL.using(c)
               .select(
                   AUTHOR.FIRST_NAME,
                   AUTHOR.LAST_NAME,
                   BOOK.ID,
                   BOOK.TITLE
                   //   ^^^^^ This column no longer exists. 
                   //   We'll have to rename it to LE_TITRE
               )
               .from(AUTHOR)
               .join(BOOK)
               .on(AUTHOR.ID.eq(BOOK.AUTHOR_ID))
               .orderBy(BOOK.ID.asc())
               .fetch();

            assertEquals(4, result.size());
            assertEquals(asList(1, 2, 3, 4), result.getValues(BOOK.ID));
        }
    }
}

Conclusion

This tutorial shows very easily how you can build a rock-solid development process using Flyway and jOOQ to prevent SQL-related errors very early in your development lifecycle – immediately at compile time, rather than in production!

Visit the Flyway website and the jOOQ website.

Java 8 Friday: The Best Java 8 Resources – Your Weekend is Booked


At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, method references, default methods, the Streams API, and other great stuff. You’ll find the source code on GitHub.

The Best Java 8 Resources – Your Weekend is Booked

We’re obviously not the only ones writing about Java 8. Ever since this great language update went live, blogs all around the world have been appearing with great content and different perspectives on the subject. In this edition of the Java 8 Friday series, we’d like to summarise some of the best content that has been published on the subject.

1. Brian Goetz’s Answers on Stack Overflow

Brian Goetz was the spec lead for JSR 335. Together with his Expert Group team, he has worked very hard to help Java 8 succeed. However, now that JSR 335 has shipped, his work is far from being over. Brian has had the courtesy of giving authoritative answers to questions from the Java community on Stack Overflow. Here are some of the most interesting questions:

Thumbs up to this great community effort. It cannot get any better than hearing authoritative answers from the spec lead himself.

2. Baeldung.com’s Collection of Java 8 Resources

This list of resources wouldn’t be complete without the very useful list of Java 8 resources (mostly authoritative links to specifications) from the guys over at Baeldung.com. Here is:

http://www.baeldung.com/java8

3. The jOOQ Blog’s Java 8 Friday Series

Yay, that’s us! :-)

Yes, we’ve worked hard to bring you the latest from our experience when integrating jOOQ with Java 8. Here are some of our most popular articles from the recent months:

4. ZeroTurnaround’s RebelLabs Blog

As part of its content marketing strategy, ZeroTurnaround launched RebelLabs quite a while ago, where various writers publish interesting articles around the topic of Java that aren’t necessarily related to JRebel and other ZT products. Some great Java 8 related content has been published over there. Here are our favourite gems:

5. The Takipi Blog

Just like ZeroTurnaround and ourselves, our friends over at Takipi provide you with some awesome Java 8 content on their blog.

6. Benji Weber’s Fun Experiments with Java 8

This blog series we found particularly fun to read. Benji Weber really thinks outside of the box and does some crazy things with default methods, method references and all that. Things that Java developers could only dream of, so far. Here are:

7. The Geeks from Paradise Blog’s Java 8 Musings

Edwin Dalorzo from Informatech has been treating us with a variety of well-founded comparisons between Java 8 and .NET. This is particularly interesting when comparing Streams with LINQ. Here are some of his best writings:

Is this list complete?

No, it is missing many other, very interesting blog series. Do you have a series to share? We’re more than happy to update this post, just let us know (in the comments section).

Java 8 Friday: 10 Subtle Mistakes When Using the Streams API


At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

10 Subtle Mistakes When Using the Streams API

We’ve done all the SQL mistakes lists:

But we haven’t done a top 10 mistakes list with Java 8 yet! For today’s occasion (it’s Friday the 13th), we’ll catch up with what will go wrong in YOUR application when you’re working with Java 8. (It won’t happen to us, as we’re stuck with Java 6 for another while.)

1. Accidentally reusing streams

Wanna bet, this will happen to everyone at least once. Like the existing “streams” (e.g. InputStream), you can consume streams only once. The following code won’t work:

IntStream stream = IntStream.of(1, 2);
stream.forEach(System.out::println);

// That was fun! Let's do it again!
stream.forEach(System.out::println);

You’ll get a

java.lang.IllegalStateException: 
  stream has already been operated upon or closed

So be careful when consuming your stream. It can be done only once.
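
If you do need to traverse the same data twice, one common workaround is to re-create the stream from a Supplier. A sketch:

import java.util.function.Supplier;
import java.util.stream.IntStream;

Supplier<IntStream> streams = () -> IntStream.of(1, 2);

// Each call to get() produces a fresh stream:
streams.get().forEach(System.out::println);
streams.get().forEach(System.out::println);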

2. Accidentally creating “infinite” streams

You can create infinite streams quite easily without noticing. Take the following example:

// Will run indefinitely
IntStream.iterate(0, i -> i + 1)
         .forEach(System.out::println);

The whole point of streams is the fact that they can be infinite, if you design them to be. The only problem is that you might not have wanted that. So, be sure to always put proper limits:

// That's better
IntStream.iterate(0, i -> i + 1)
         .limit(10)
         .forEach(System.out::println);

3. Accidentally creating “subtle” infinite streams

We can’t say this enough. You WILL eventually create an infinite stream, accidentally. Take the following stream, for instance:

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .distinct()
         .limit(10)
         .forEach(System.out::println);

So…

  • we generate alternating 0’s and 1’s
  • then we keep only distinct values, i.e. a single 0 and a single 1
  • then we limit the stream to a size of 10
  • then we consume it

Well… the distinct() operation doesn’t know that the function supplied to the iterate() method will produce only two distinct values. It might expect more than that. So it’ll forever consume new values from the stream, and the limit(10) will never be reached. Tough luck, your application stalls.

4. Accidentally creating “subtle” parallel infinite streams

We really need to insist that you might accidentally try to consume an infinite stream. Let’s assume you believe that the distinct() operation should be performed in parallel. You might be writing this:

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .parallel()
         .distinct()
         .limit(10)
         .forEach(System.out::println);

Now, we’ve already seen that this will run forever. But previously, at least, you only consumed one CPU on your machine. Now, you’ll probably consume four of them, potentially occupying pretty much all of your system with an accidental infinite stream consumption. That’s pretty bad. You can probably hard-reboot your server / development machine after that. Have a last look at what my laptop looked like prior to exploding:

If I were a laptop, this is how I’d like to go.

5. Mixing up the order of operations

So, why did we insist that you will definitely, accidentally create infinite streams? It’s simple: because you may just accidentally do it. The above stream can be perfectly consumed if you switch the order of limit() and distinct():

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .limit(10)
         .distinct()
         .forEach(System.out::println);

This now yields:

0
1

Why? Because we first limit the infinite stream to 10 values (0 1 0 1 0 1 0 1 0 1), before we reduce the limited stream to the distinct values contained in it (0 1).

Of course, this may no longer be semantically correct, because you really wanted the first 10 distinct values from a set of data (you just happened to have “forgotten” that the data is infinite). No one really wants 10 random values, only to then reduce them to the distinct ones.

If you’re coming from a SQL background, you might not expect such differences. Take SQL Server 2012, for instance. The following two SQL statements are the same:

-- Using TOP
SELECT DISTINCT TOP 10 *
FROM i
ORDER BY ..

-- Using FETCH
SELECT *
FROM i
ORDER BY ..
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY

So, as a SQL person, you might not be as aware of the importance of the order of streams operations.

jOOQ, the best way to write SQL in Java

6. Mixing up the order of operations (again)

Speaking of SQL, if you’re a MySQL or PostgreSQL person, you might be used to the LIMIT .. OFFSET clause. SQL is full of subtle quirks, and this is one of them. The OFFSET clause is applied FIRST, as suggested in SQL Server 2012’s (i.e. the SQL:2008 standard’s) syntax.

If you translate MySQL / PostgreSQL’s dialect directly to streams, you’ll probably get it wrong:

IntStream.iterate(0, i -> i + 1)
         .limit(10) // LIMIT
         .skip(5)   // OFFSET
         .forEach(System.out::println);

The above yields

5
6
7
8
9

Yes. It doesn’t continue after 9, because the limit() is now applied first, producing (0 1 2 3 4 5 6 7 8 9). skip() is applied after, reducing the stream to (5 6 7 8 9). Not what you may have intended.

BEWARE of the LIMIT .. OFFSET vs. "OFFSET .. LIMIT" trap!
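
If SQL’s OFFSET-then-LIMIT semantics is what you actually wanted, apply skip() before limit(). A quick sketch:

// OFFSET first, then LIMIT, as in the SQL standard:
IntStream.iterate(0, i -> i + 1)
         .skip(5)    // OFFSET 5 ROWS
         .limit(10)  // FETCH NEXT 10 ROWS ONLY
         .forEach(System.out::println);

// This yields 5 6 7 8 9 10 11 12 13 14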

7. Walking the file system with filters

We’ve blogged about this before. What appears to be a good idea is to walk the file system using filters:

Files.walk(Paths.get("."))
     .filter(p -> !p.toFile().getName().startsWith("."))
     .forEach(System.out::println);

The above stream appears to be walking only through non-hidden directories, i.e. directories that do not start with a dot. Unfortunately, you’ve again made mistakes #5 and #6. walk() has already produced the whole stream of subdirectories of the current directory. Lazily, though, but logically it contains all sub-paths. Now, the filter will correctly filter out paths whose names start with a dot “.”. E.g. .git or .idea will not be part of the resulting stream. But these paths will be: .\.git\refs, or .\.idea\libraries. Not what you intended.

Now, don’t fix this by writing the following:

Files.walk(Paths.get("."))
     .filter(p -> !p.toString().contains(File.separator + "."))
     .forEach(System.out::println);

While that will produce the correct output, it will still do so by traversing the complete directory subtree, recursing into all subdirectories of “hidden” directories.

I guess you’ll have to resort to good old JDK 1.0 File.list() again. The good news is, FilenameFilter and FileFilter are both functional interfaces.
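
For instance, here’s a minimal recursive sketch (the predicate is our assumption; adapt as needed) that prunes hidden directories before ever descending into them:

// Recurse manually with File.listFiles(), so that hidden
// directories are skipped together with their entire subtrees:
static void walkNonHidden(File dir) {
    File[] children =
        dir.listFiles(f -> !f.getName().startsWith("."));

    // null means dir is not a directory, or an I/O error occurred
    if (children == null)
        return;

    for (File child : children) {
        System.out.println(child);
        walkNonHidden(child);
    }
}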

8. Modifying the backing collection of a stream

While you’re iterating a List, you must not modify that same list in the iteration body. That was true before Java 8, but it might become more tricky with Java 8 streams. Consider the following list from 0..9:

// Of course, we create this list using streams
// (assuming a static import of Collectors.toCollection):
List<Integer> list =
    IntStream.range(0, 10)
             .boxed()
             .collect(toCollection(ArrayList::new));

Now, let’s assume that we want to remove each element while consuming it:

list.stream()
    // remove(Object), not remove(int)!
    .peek(list::remove)
    .forEach(System.out::println);

Interestingly enough, this will work for some of the elements! The output you might get is this one:

0
2
4
6
8
null
null
null
null
null
java.util.ConcurrentModificationException

If we introspect the list after catching that exception, there’s a funny finding. We’ll get:

[1, 3, 5, 7, 9]

Heh, it “worked” for all the odd numbers. Is this a bug? No, it looks like a feature. (Presumably, the traversal walks the ArrayList by index: removing the element at index 0 shifts everything left, so index 1 then holds the element 2, and so on. Every other element is skipped until the traversal reads past the shrunken list, yielding the nulls, and the final modCount check throws.) If you delve into the JDK code, you’ll find this comment in ArrayList.ArrayListSpliterator:

/*
 * If ArrayLists were immutable, or structurally immutable (no
 * adds, removes, etc), we could implement their spliterators
 * with Arrays.spliterator. Instead we detect as much
 * interference during traversal as practical without
 * sacrificing much performance. We rely primarily on
 * modCounts. These are not guaranteed to detect concurrency
 * violations, and are sometimes overly conservative about
 * within-thread interference, but detect enough problems to
 * be worthwhile in practice. To carry this out, we (1) lazily
 * initialize fence and expectedModCount until the latest
 * point that we need to commit to the state we are checking
 * against; thus improving precision.  (This doesn't apply to
 * SubLists, that create spliterators with current non-lazy
 * values).  (2) We perform only a single
 * ConcurrentModificationException check at the end of forEach
 * (the most performance-sensitive method). When using forEach
 * (as opposed to iterators), we can normally only detect
 * interference after actions, not before. Further
 * CME-triggering checks apply to all other possible
 * violations of assumptions for example null or too-small
 * elementData array given its size(), that could only have
 * occurred due to interference.  This allows the inner loop
 * of forEach to run without any further checks, and
 * simplifies lambda-resolution. While this does entail a
 * number of checks, note that in the common case of
 * list.stream().forEach(a), no checks or other computation
 * occur anywhere other than inside forEach itself.  The other
 * less-often-used methods cannot take advantage of most of
 * these streamlinings.
 */

Now, check out what happens when we tell the stream to produce sorted() results:

list.stream()
    .sorted()
    .peek(list::remove)
    .forEach(System.out::println);

This will now produce the following, “expected” output:

0
1
2
3
4
5
6
7
8
9

And the list after stream consumption? It is empty:

[]

So, all elements are consumed, and removed correctly. The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state. It is now “safe” to remove elements from the list!

Well… can we really? Let’s proceed with parallel(), sorted() removal:

list.stream()
    .sorted()
    .parallel()
    .peek(list::remove)
    .forEach(System.out::println);

This now yields:

7
6
2
5
8
4
1
0
9
3

And the list contains

[8]

Eek. We didn’t remove all elements!? Free beers (and jOOQ stickers) go to anyone who solves this streams puzzler!

This all appears quite random and subtle, so we can only suggest that you never actually modify a backing collection while consuming a stream. It just doesn’t work.
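
If you do need to empty a list while printing its elements, a safe pattern (our suggestion, not part of the original puzzler) is to keep stream consumption and mutation in separate, sequential steps:

// Consume first, mutate afterwards:
list.forEach(System.out::println);
list.clear();

// Or, for conditional removal, use the bulk method designed for it:
list.removeIf(i -> i % 2 == 0);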

9. Forgetting to actually consume the stream

What do you think the following stream does?

IntStream.rangeClosed(1, 5)
         .peek(System.out::println)
         .peek(i -> {
              if (i == 5)
                  throw new RuntimeException("bang");
          });

When you read this, you might think that it will print (1 2 3 4 5) and then throw an exception. But that’s not correct. It won’t do anything. The stream just sits there, never having been consumed.
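
Only a terminal operation triggers consumption. Adding forEach(), for instance (a minimal sketch), makes the pipeline run:

IntStream.rangeClosed(1, 5)
         .peek(i -> {
              if (i == 5)
                  throw new RuntimeException("bang");
          })
         .forEach(System.out::println); // prints 1 2 3 4, then throws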

As with any fluent API or DSL, you might actually forget to call the “terminal” operation. This might be particularly true when you use peek(), as peek() looks an awful lot like forEach().

This can happen with jOOQ just the same, when you forget to call execute() or fetch():

DSL.using(configuration)
   .update(TABLE)
   .set(TABLE.COL1, 1)
   .set(TABLE.COL2, "abc")
   .where(TABLE.ID.eq(3));

Oops. No execute()

jOOQ, the best way to write SQL in Java

Yes, the “best” way – with 1-2 caveats ;-)

10. Parallel stream deadlock

This is now a real goodie for the end!

All concurrent systems can run into deadlocks, if you don’t properly synchronise things. While finding a real-world example isn’t obvious, finding a forced example is. The following parallel() stream is guaranteed to run into a deadlock:

Object[] locks = { new Object(), new Object() };

IntStream
    .range(1, 5)
    .parallel()
    .peek(Unchecked.intConsumer(i -> {
        synchronized (locks[i % locks.length]) {
            Thread.sleep(100);

            synchronized (locks[(i + 1) % locks.length]) {
                Thread.sleep(50);
            }
        }
    }))
    .forEach(System.out::println);

Note the use of Unchecked.intConsumer() from jOOλ, which wraps an org.jooq.lambda.fi.util.function.CheckedIntConsumer (a variant of the functional IntConsumer interface that is allowed to throw checked exceptions, such as Thread.sleep()’s InterruptedException) in a plain IntConsumer.
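
If you’d rather avoid the jOOλ dependency, the plain Java equivalent (a sketch) is to wrap the checked exception manually:

// An ordinary java.util.function.IntConsumer must catch
// InterruptedException itself and rethrow it unchecked:
IntConsumer sleeper = i -> {
    try {
        Thread.sleep(100);
    }
    catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
};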

Well. Tough luck for your machine. Those threads will be blocked forever: one thread acquires the locks in the order (0, 1), while another acquires them in the order (1, 0), and each ends up waiting for the lock held by the other. :-)

The good news is, it has never been easier to produce a textbook example of a deadlock in Java!

For more details, see also Brian Goetz’s answer to this question on Stack Overflow.

Conclusion

With streams and functional thinking, we’ll run into a massive amount of new, subtle bugs. Few of these bugs can be prevented, except through practice and staying focused. You have to think about how to order your operations. You have to think about whether your streams may be infinite.

Streams (and lambdas) are a very powerful tool. But a tool which we need to get a hang of, first.

Stay tuned for more exciting Java 8 articles on this blog.

Stop Trying to Emulate SQL OFFSET Pagination with Your In-House DB Framework!


I’m pretty sure you’ve gotten it wrong in numerous ways, so far. And you probably won’t get it right any time soon. So why waste your precious time on SQL tweaking, when you could be implementing business logic?

Let me explain…

Not until the recent SQL:2008 standard was what MySQL users know as LIMIT .. OFFSET standardised into the following simple statement:

SELECT * 
FROM BOOK 
OFFSET 2 ROWS 
FETCH NEXT 1 ROWS ONLY

Yes. So many keywords.

SQL is indeed a very verbose language. We really like the conciseness of MySQL’s / PostgreSQL’s LIMIT .. OFFSET clause, which is why we chose it for the jOOQ DSL API.

In SQL:

SELECT * FROM BOOK LIMIT 1 OFFSET 2

In jOOQ:

select().from(BOOK).limit(1).offset(2);

Now, when you’re a SQL framework vendor, or when you’re rolling your own, in-house SQL abstraction, you might think about standardising this neat little clause. Here’s a couple of flavours from databases that natively support offset pagination:

-- MySQL, H2, HSQLDB, Postgres, and SQLite
SELECT * FROM BOOK LIMIT 1 OFFSET 2

-- CUBRID supports a MySQL variant of the 
-- LIMIT .. OFFSET clause
SELECT * FROM BOOK LIMIT 2, 1

-- Derby, SQL Server 2012, Oracle 12, SQL:2008
SELECT * FROM BOOK 
OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY

-- Ingres. Eek, almost the standard. Almost!
SELECT * FROM BOOK 
OFFSET 2 FETCH FIRST 1 ROWS ONLY

-- Firebird
SELECT * FROM BOOK ROWS 2 TO 3

-- Sybase SQL Anywhere
SELECT TOP 1 ROWS START AT 3 * FROM BOOK

-- DB2 (without OFFSET)
SELECT * FROM BOOK FETCH FIRST 1 ROWS ONLY

-- Sybase ASE, SQL Server 2008 (without OFFSET)
SELECT TOP 1 * FROM BOOK

So far, so good. These can all be handled. Some databases put offsets before limits, others put limits before offsets, and the T-SQL family puts the whole TOP clause before the SELECT list. This is easy to emulate. Now what about:

  • Oracle 11g and earlier
  • SQL Server 2008 and earlier
  • DB2 with OFFSET

(note that you can enable various alternative syntaxes in DB2)

When you google for this, you will find millions of ways to emulate OFFSET .. FETCH in those older databases. The optimal solutions always involve:

  • Using doubly-nested derived tables with ROWNUM filtering in Oracle
  • Using singly-nested derived tables with ROW_NUMBER() filtering in SQL Server and DB2

So you’re emulating it.

Do you think you will get it right?

;-)

Let us go through a couple of issues that you may not have thought about.

First off, Oracle. Oracle obviously wanted to create maximum vendor lock-in, which is only exceeded by Apple’s recent introduction of Swift. This is why ROWNUM solutions perform best, even better than SQL:2003 standard window function based solutions. Don’t believe it? Read this very interesting article on Oracle offset pagination performance.

So, the optimal solution in Oracle is:

-- PostgreSQL syntax:
SELECT ID, TITLE 
FROM BOOK 
LIMIT 1 OFFSET 2

-- Oracle equivalent:
SELECT *
FROM (
  SELECT b.*, ROWNUM rn
  FROM (
    SELECT ID, TITLE
    FROM BOOK
  ) b
  WHERE ROWNUM <= 3 -- (1 + 2)
)
WHERE rn > 2

So that’s really the equivalent?

Of course not. You’re selecting an additional column, the rn column. You might not care in most cases, but what if you wanted to use such a limited subquery in an IN predicate?

-- PostgreSQL syntax:
SELECT *
FROM BOOK
WHERE AUTHOR_ID IN (
  SELECT ID
  FROM AUTHOR
  LIMIT 1 OFFSET 2
)

-- Oracle equivalent:
SELECT *
FROM BOOK
WHERE AUTHOR_ID IN (
  SELECT * -- Ouch. These are two columns!
  FROM (
    SELECT b.*, ROWNUM rn
    FROM (
      SELECT ID
      FROM AUTHOR
    ) b
    WHERE ROWNUM <= 3
  )
  WHERE rn > 2
)

So, as you can see, you’ll have to do some more sophisticated SQL transformation. If you’re manually emulating LIMIT .. OFFSET, then you might just patch the ID column into the subquery:

SELECT *
FROM BOOK
WHERE AUTHOR_ID IN (
  SELECT ID -- better
  FROM (
    SELECT b.ID, ROWNUM rn -- better
    FROM (
      SELECT ID
      FROM AUTHOR
    ) b
    WHERE ROWNUM <= 3
  )
  WHERE rn > 2
)

So, that’s more like it, right? But since you’re not writing this manually every time, you’re about to start creating your own nifty in-house SQL framework covering the 2-3 use cases that you’ve encountered so far, right?

You can do it. So you’ll regex-search-replace column names automagically to produce the above.
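
A naive sketch of such an in-house transformation (entirely hypothetical, and exactly the kind of code this article warns against):

// Patch a single column through the ROWNUM wrapper by plain
// string building. Works for the one query you tested it on:
static String emulateLimitOffset(
        String subquery, String column, int limit, int offset) {
    return "SELECT " + column + " FROM ("
         + "SELECT b." + column + ", ROWNUM rn FROM ("
         + subquery + ") b WHERE ROWNUM <= " + (limit + offset)
         + ") WHERE rn > " + offset;
}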

So now, it is correct?

Of course not! Because you can have ambiguous column names in top-level SELECTs, but not in nested selects. What if you want to do this:

-- PostgreSQL syntax:
-- Perfectly valid repetition of two ID columns
SELECT BOOK.ID, AUTHOR.ID
FROM BOOK
JOIN AUTHOR
ON BOOK.AUTHOR_ID = AUTHOR.ID
LIMIT 1 OFFSET 2

-- Oracle equivalent:
SELECT *
FROM (
  SELECT b.*, ROWNUM rn
  FROM (
    -- Ouch! ORA-00918: column ambiguously defined
    SELECT BOOK.ID, AUTHOR.ID
    FROM BOOK
    JOIN AUTHOR
    ON BOOK.AUTHOR_ID = AUTHOR.ID
  ) b
  WHERE ROWNUM <= 3
)
WHERE rn > 2

Nope. And the trick of manually patching ID columns from the previous example doesn’t work, because you have multiple ID instances. And renaming the columns to random values is nasty, because the user of your home-grown in-house database framework wants to receive well-defined column names. I.e. ID and… ID.

So, the solution is to rename the columns twice. Once in each derived table:

-- Oracle equivalent:
-- Rename synthetic column names back to original
SELECT c1 ID, c2 ID
FROM (
  SELECT b.c1, b.c2, ROWNUM rn
  FROM (
    -- synthetic column names here
    SELECT BOOK.ID c1, AUTHOR.ID c2
    FROM BOOK
    JOIN AUTHOR
    ON BOOK.AUTHOR_ID = AUTHOR.ID
  ) b
  WHERE ROWNUM <= 3
)
WHERE rn > 2

But now, we’re done?

Of course not! What if you doubly nest such a query? Will you think about doubly renaming ID columns to synthetic names, and back? … ;-) Let’s leave it here and talk about something entirely different:

Does the same thing work for SQL Server 2008?

Of course not! In SQL Server 2008, the most popular approach is to use window functions. Namely, ROW_NUMBER(). So, let’s consider:

-- PostgreSQL syntax:
SELECT ID, TITLE 
FROM BOOK 
LIMIT 1 OFFSET 2

-- SQL Server equivalent:
SELECT b.*
FROM (
  SELECT ID, TITLE, 
    ROW_NUMBER() OVER (ORDER BY ID) rn
  FROM BOOK
) b
WHERE rn > 2 AND rn <= 3

So that’s it, right?

Of course not! ;-)

OK, we’ve already had this issue. We should not select *, because that would generate too many columns in the case that we’re using this as a subquery for an IN predicate. So let’s consider the correct solution with synthetic column names:

-- SQL Server equivalent:
SELECT b.c1 ID, b.c2 TITLE
FROM (
  SELECT ID c1, TITLE c2,
    ROW_NUMBER() OVER (ORDER BY ID) rn
  FROM BOOK
) b
WHERE rn > 2 AND rn <= 3

But now we got it, right?

Make an educated guess: Nope!

What happens, if you add an ORDER BY clause to the original query?

-- PostgreSQL syntax:
SELECT ID, TITLE 
FROM BOOK 
ORDER BY SOME_COLUMN
LIMIT 1 OFFSET 2

-- Naive SQL Server equivalent:
SELECT b.c1 ID, b.c2 TITLE
FROM (
  SELECT ID c1, TITLE c2,
    ROW_NUMBER() OVER (ORDER BY ID) rn
  FROM BOOK
  ORDER BY SOME_COLUMN
) b
WHERE rn > 2 AND rn <= 3

Now, that doesn’t work in SQL Server. Subqueries are not allowed to have an ORDER BY clause, unless they also have a TOP clause (or an OFFSET .. FETCH clause in SQL Server 2012).

OK, we can probably tweak this using TOP 100 PERCENT to make SQL Server happy.

-- Better SQL Server equivalent:
SELECT b.c1 ID, b.c2 TITLE
FROM (
  SELECT TOP 100 PERCENT
    ID c1, TITLE c2,
    ROW_NUMBER() OVER (ORDER BY ID) rn
  FROM BOOK
  ORDER BY SOME_COLUMN
) b
WHERE rn > 2 AND rn <= 3

Now, that’s correct SQL according to SQL Server, although you have no guarantee that the ordering of the derived table will survive query execution. Some later step in the execution plan may well change the ordering again.

If you wanted to order by SOME_COLUMN in the outer query, you’d have to again transform the SQL statement to add another synthetic column:

-- Better SQL Server equivalent:
SELECT b.c1 ID, b.c2 TITLE
FROM (
  SELECT TOP 100 PERCENT
    ID c1, TITLE c2,
    SOME_COLUMN c99,
    ROW_NUMBER() OVER (ORDER BY ID) rn
  FROM BOOK
) b
WHERE rn > 2 AND rn <= 3
ORDER BY b.c99

That does start getting a bit nasty. And let’s guess whether:

This is the correct solution!

Of course not! What if the original query had DISTINCT in it?

-- PostgreSQL syntax:
SELECT DISTINCT AUTHOR_ID
FROM BOOK 
LIMIT 1 OFFSET 2

-- Naive SQL Server equivalent:
SELECT b.c1 AUTHOR_ID
FROM (
  SELECT DISTINCT AUTHOR_ID c1,
    ROW_NUMBER() OVER (ORDER BY AUTHOR_ID) rn
  FROM BOOK
) b
WHERE rn > 2 AND rn <= 3

Now, what happens if an author has written several books? Yes, the DISTINCT keyword should remove such duplicates, and effectively, the PostgreSQL query will correctly remove duplicates first, and then apply LIMIT and OFFSET.

However, the ROW_NUMBER() window function generates distinct row numbers before DISTINCT can remove duplicates again. For hypothetical AUTHOR_ID values (1, 1, 2, 3), for instance, ROW_NUMBER() yields (1, 2, 3, 4), so every (AUTHOR_ID, rn) row is unique. In other words, DISTINCT has no effect.

Luckily, we can tweak this SQL again, using this neat little trick:

-- Better SQL Server equivalent:
SELECT b.c1 AUTHOR_ID
FROM (
  SELECT DISTINCT AUTHOR_ID c1,
    DENSE_RANK() OVER (ORDER BY AUTHOR_ID) rn
  FROM BOOK
) b
WHERE rn > 2 AND rn <= 3

Read more about this trick here:

SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT.

Watch out: the window function’s ORDER BY clause must contain all columns from the SELECT DISTINCT field list. Obviously, this limits the acceptable columns in the SELECT DISTINCT field list to columns that are allowed in a window function’s ORDER BY clause (e.g. no other window functions).

We could of course try to fix that as well using common table expressions, or we could consider…

Yet another issue??

Yes, of course!

Do you even know what the column(s) in the window function’s ORDER BY clause should be? Have you just picked any column, at random? What if that column doesn’t have an index on it? Will your window function still perform?

The answer is easy when your original SELECT statement also has an ORDER BY clause: you should probably take that one (plus all the columns from the SELECT DISTINCT field list, if applicable).

But what if you don’t have any ORDER BY clause?

Yet another trick! Use a “constant” variable:

-- Better SQL Server equivalent:
SELECT b.c1 AUTHOR_ID
FROM (
  SELECT AUTHOR_ID c1,
    ROW_NUMBER() OVER (ORDER BY @@version) rn
  FROM BOOK
) b
WHERE rn > 2 AND rn <= 3

Yes, you need to use a variable, because constants are not allowed in those ORDER BY clauses in SQL Server. Painful, I know.

Read more about this @@version trick here.

Are we done yet!?!?

Probably not ;-) But we have covered around 99% of the common and edge cases. We can sleep nicely, now.

Note that all of these SQL transformations are implemented in jOOQ. jOOQ is the only SQL abstraction framework that takes SQL seriously (with all its warts and caveats), standardising over all of this madness.

As mentioned in the beginning, with jOOQ, you just write:

// Don't worry about general emulation
select().from(BOOK).limit(1).offset(2);

// Don't worry about duplicate column names
// in subselects
select(BOOK.ID, AUTHOR.ID)
.from(BOOK)
.join(AUTHOR)
.on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
.limit(1).offset(2);

// Don't worry about invalid IN predicates
select()
.from(BOOK)
.where(BOOK.AUTHOR_ID.in(
    select(AUTHOR.ID)
    .from(AUTHOR)
    .limit(1).offset(2)
));

// Don't worry about the ROW_NUMBER() vs.
// DENSE_RANK() distinction
selectDistinct(BOOK.AUTHOR_ID)
    .from(BOOK).limit(1).offset(2);

With jOOQ, you can just write your Oracle SQL or Transact SQL as if it were as awesome as PostgreSQL! … without jumping the SQL ship entirely, and moving on to JPA.

jOOQ, the best way to write SQL in Java

Keyset paging

Now, of course, if you have been reading our blog, or our partner blog SQL Performance Explained, you should know by now that OFFSET pagination is often a bad choice in the first place. You should know that keyset pagination almost always outperforms OFFSET pagination.

Read about how jOOQ natively supports keyset pagination using the SEEK clause, here.
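
As a teaser, here’s what that looks like in jOOQ (a sketch, assuming an ordering by BOOK.ID and a hypothetical last-seen ID of 30 from the previous page):

// Fetch the next 10 books "after" the last ID of the previous
// page, instead of skipping rows with OFFSET:
select(BOOK.ID, BOOK.TITLE)
.from(BOOK)
.orderBy(BOOK.ID)
.seek(30)
.limit(10);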
