When the Java 8 Streams API is not Enough

Java 8 was – as always – a release of compromises and backwards-compatibility. A release where the JSR-335 expert group might not have agreed upon scope or feasibility of certain features with some of the audience. See some concrete explanations by Brian Goetz about why …

But today we’re going to focus on the Streams API’s “short-comings”, or as Brian Goetz would probably put it: things out of scope given the design goals.

Parallel Streams?

Parallel computing is hard, and it used to be a pain. People didn’t exactly love the new (now old) Fork / Join API, when it was first shipped with Java 7. Conversely, and clearly, the conciseness of calling Stream.parallel() is unbeatable.

But many people don’t actually need parallel computing (not to be confused with multi-threading!). In 95% of all cases, people would have probably preferred a more powerful Streams API, or perhaps a generally more powerful Collections API with lots of awesome methods on various Iterable subtypes.

Changing Iterable is dangerous, though. Even a no-brainer as transforming an Iterable into a Stream via a potential Iterable.stream() method seems to risk opening pandora’s box!.

Sequential Streams!

So if the JDK doesn’t ship it, we create it ourselves!

Streams are quite awesome per se. They’re potentially infinite, and that’s a cool feature. Mostly – and especially with functional programming – the size of a collection doesn’t really matter that much, as we transform element by element using functions.

If we admit Streams to be purely sequential, then we could have any of these pretty cool methods as well (some of which would also be possible with parallel Streams):

  • cycle() – a guaranteed way to make every stream infinite
  • duplicate() – duplicate a stream into two equivalent streams
  • foldLeft() – a sequential and non-associative alternative to reduce()
  • foldRight() – a sequential and non-associative alternative to reduce()
  • limitUntil() – limit the stream to those records before the first one to satisfy a predicate
  • limitWhile() – limit the stream to those records before the first one not to satisfy a predicate
  • maxBy() – reduce the stream to the maximum mapped value
  • minBy() – reduce the stream to the minimum mapped value
  • partition() – partition a stream into two streams, one satisfying a predicate and the other not satisfying the same predicate
  • reverse() – produce a new stream in inverse order
  • skipUntil() – skip records until a predicate is satisified
  • skipWhile() – skip records as long as a predicate is satisfied
  • slice() – take a slice of the stream, i.e. combine skip() and limit()
  • splitAt() – split a stream into two streams at a given position
  • unzip() – split a stream of pairs into two streams
  • zip() – merge two streams into a single stream of pairs
  • zipWithIndex() – merge a stream with its corresponding stream of indexes into a single stream of pairs

jOOλ’s new Seq type does all that

All of the above is part of jOOλ. jOOλ (pronounced “jewel”, or “dju-lambda”, also written jOOL in URLs and such) is an ASL 2.0 licensed library that emerged from our own development needs when implementing jOOQ integration tests with Java 8. Java 8 is exceptionally well-suited for writing tests that reason about sets, tuples, records, and all things SQL.

But the Streams API just slightly feels insufficient, so we have wrapped JDK’s Streams into our own Seq type (Seq for sequence / sequential Stream):

// Wrap a stream in a sequence
Seq<Integer> seq1 = seq(Stream.of(1, 2, 3));

// Or create a sequence directly from values
Seq<Integer> seq2 = Seq.of(1, 2, 3);

We’ve made Seq a new interface that extends the JDK Stream interface, so you can use Seq fully interoperably with other Java APIs – leaving the existing methods unchanged:

public interface Seq<T> extends Stream<T> {

    /**
     * The underlying {@link Stream} implementation.
     */
    Stream<T> stream();
	
	// [...]
}

Now, functional programming is only half the fun if you don’t have tuples. Unfortunately, Java doesn’t have built-in tuples and while it is easy to create a tuple library using generics, tuples are still second-class syntactic citizens when comparing Java to Scala, for instance, or C# and even VB.NET.

Nonetheless…

jOOλ also has tuples

We’ve run a code-generator to produce tuples of degree 1-8 (we might add more in the future, e.g. to match Scala’s and jOOQ’s “magical” degree 22).

And if a library has such tuples, the library also needs corresponding functions. The essence of these TupleN and FunctionN types is summarised as follows:

public class Tuple3<T1, T2, T3>
implements 
    Tuple, 
	Comparable<Tuple3<T1, T2, T3>>, 
	Serializable, Cloneable {
    
    public final T1 v1;
    public final T2 v2;
    public final T3 v3;
	
	// [...]
}

and

@FunctionalInterface
public interface Function3<T1, T2, T3, R> {

    default R apply(Tuple3<T1, T2, T3> args) {
        return apply(args.v1, args.v2, args.v3);
    }

    R apply(T1 v1, T2 v2, T3 v3);
}

There are many more features in Tuple types, but let’s leave them out for today.

On a side note, I’ve recently had an interesting discussion with Gavin King (the creator of Hibernate) on reddit. From an ORM perspective, Java classes seem like a suitable implementation for SQL / relational tuples, and they are indeed. From an ORM perspective.

But classes and tuples are fundamentally different, which is a very subtle issue with most ORMs – e.g. as explained here by Vlad Mihalcea.

Besides, SQL’s notion of row value expressions (i.e. tuples) is quite different from what can be modelled with Java classes. This topic will be covered in a subsequent blog post.

Some jOOλ examples

With the aforementioned goals in mind, let’s see how the above API can be put to work by example:

zipping

// (tuple(1, "a"), tuple(2, "b"), tuple(3, "c"))
Seq.of(1, 2, 3).zip(Seq.of("a", "b", "c"));

// ("1:a", "2:b", "3:c")
Seq.of(1, 2, 3).zip(
    Seq.of("a", "b", "c"), 
    (x, y) -> x + ":" + y
);

// (tuple("a", 0), tuple("b", 1), tuple("c", 2))
Seq.of("a", "b", "c").zipWithIndex();

// tuple((1, 2, 3), (a, b, c))
Seq.unzip(Seq.of(
    tuple(1, "a"),
    tuple(2, "b"),
    tuple(3, "c")
));

This is already a case where tuples have become very handy. When we “zip” two streams into one, we want a wrapper value type that combines both values. Classically, people might’ve used Object[] for quick-and-dirty solutions, but an array doesn’t indicate attribute types or degree.

Unfortunately, the Java compiler cannot reason about the effective bound of the <T> type in Seq<T>. This is why we can only have a static unzip() method (instead of an instance one), whose signature looks like this:

// This works
static <T1, T2> Tuple2<Seq<T1>, Seq<T2>> 
    unzip(Stream<Tuple2<T1, T2>> stream) { ... }
	
// This doesn't work:
interface Seq<T> extends Stream<T> {
    Tuple2<Seq<???>, Seq<???>> unzip();
}

Skipping and limiting

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipWhile(i -> i < 3);

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipUntil(i -> i == 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitWhile(i -> i < 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitUntil(i -> i == 3);

Other functional libraries probably use different terms than skip (e.g. drop) and limit (e.g. take). It doesn’t really matter in the end. We opted for the terms that are already present in the existing Stream API: Stream.skip() and Stream.limit()

Folding

// "abc"
Seq.of("a", "b", "c").foldLeft("", (u, t) -> t + u);

// "cba"
Seq.of("a", "b", "c").foldRight("", (t, u) -> t + u);

The Stream.reduce() operations are designed for parallelisation. This means that the functions passed to it must have these important attributes:

But sometimes, you really want to “reduce” a stream with functions that do not have the above attributes, and consequently, you probably don’t care about the reduction being parallelisable. This is where “folding” comes in.

A nice explanation about the various differences between reducing and folding (in Scala) can be seen here.

Splitting

// tuple((1, 2, 3), (1, 2, 3))
Seq.of(1, 2, 3).duplicate();

// tuple((1, 3, 5), (2, 4, 6))
Seq.of(1, 2, 3, 4, 5, 6).partition(i -> i % 2 != 0)

// tuple((1, 2), (3, 4, 5))
Seq.of(1, 2, 3, 4, 5).splitAt(2);

The above functions all have one thing in common: They operate on a single stream in order to produce two new streams, that can be consumed independently.

Obviously, this means that internally, some memory must be consumed to keep buffers of partially consumed streams. E.g.

  • duplication needs to keep track of all values that have been consumed in one stream, but not in the other
  • partitioning needs to fast forward to the next value that satisfies (or doesn’t satisfy) the predicate, without losing all the dropped values
  • splitting might need to fast forward to the split index

For some real functional fun, let’s have a look at a possible splitAt() implementation:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    return seq(stream)
          .zipWithIndex()
          .partition(t -> t.v2 < position)
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

… or with comments:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    // Add jOOλ functionality to the stream
    // -> local Type: Seq<T>
    return seq(stream)
	
    // Keep track of stream positions
    // with each element in the stream
    // -> local Type: Seq<Tuple2<T, Long>>
          .zipWithIndex()
	  
    // Split the streams at position
    // -> local Type: Tuple2<Seq<Tuple2<T, Long>>,
    //                       Seq<Tuple2<T, Long>>>
          .partition(t -> t.v2 < position)
		  
    // Remove the indexes from zipWithIndex again
    // -> local Type: Tuple2<Seq<T>, Seq<T>>
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

Nice, isn’t it? A possible implementation for partition(), on the other hand, is a bit more complex. Here trivially with Iterator instead of the new Spliterator:

static <T> Tuple2<Seq<T>, Seq<T>> partition(
        Stream<T> stream, 
        Predicate<? super T> predicate
) {
    final Iterator<T> it = stream.iterator();
    final LinkedList<T> buffer1 = new LinkedList<>();
    final LinkedList<T> buffer2 = new LinkedList<>();

    class Partition implements Iterator<T> {

        final boolean b;

        Partition(boolean b) {
            this.b = b;
        }

        void fetch() {
            while (buffer(b).isEmpty() && it.hasNext()) {
                T next = it.next();
                buffer(predicate.test(next)).offer(next);
            }
        }

        LinkedList<T> buffer(boolean test) {
            return test ? buffer1 : buffer2;
        }

        @Override
        public boolean hasNext() {
            fetch();
            return !buffer(b).isEmpty();
        }

        @Override
        public T next() {
            return buffer(b).poll();
        }
    }

    return tuple(
        seq(new Partition(true)), 
        seq(new Partition(false))
    );
}

I’ll let you do the exercise and verify the above code.

Get and contribute to jOOλ, now!

All of the above is part of jOOλ, available for free from GitHub. There is already a partially Java-8-ready, full-blown library called functionaljava, which goes much further than jOOλ.

Yet, we believe that all what’s missing from Java 8’s Streams API is really just a couple of methods that are very useful for sequential streams.

In a previous post, we’ve shown how we can bring lambdas to String-based SQL using a simple wrapper for JDBC (of course, we still believe that you should use jOOQ instead).

Today, we’ve shown how we can write awesome functional and sequential Stream processing very easily, with jOOλ.

Stay tuned for even more jOOλ goodness in the near future (and pull requests are very welcome, of course!)

Top 10 Very Very VERY Important Topics to Discuss

Some things are just very very very VERY very important. Such as John Cleese.

The same is true for Whitespace:

Yes. 1080 Reddit Karma points (so urgently needed!) in only 23 hours. That’s several orders of magnitudes better than any of our – what we wrongfully thought to be – very deep and interesting technical insight about Java and SQL has ever produced.

The topic of interest was a humourous treatise about whether this:

for (int i=0; i<LENGTH; i++)

… or this:

for (int i = 0; i < LENGTH; i++)

… should be preferred. Obviously both options are completely wrong. The right answer is:

for 
(   int i = 0
;   i < LENGTH
;   i++
)

Read the full treatise here.

But at some point, the whitespace discussion is getting stale. We need new very very very important topics to discuss instead of fixing them bugs. After all, the weekend is imminent, and we don’t know what else to talk about.

This is why we are now publishing…

Top 10 Very Very VERY Important Topics to Discuss

Here we go…

0. Whitespace

OK, that was a no-brainer. We’ve already had that. Want to participate? The very interesting Reddit discussion is still hot.

1. The Vietnam of Computer Science

In case you haven’t heard of this highly interesting discussion, there are some people who believe that ORMs are outdated, because ORMs don’t work as promised. And they’re totally right. And the best thing is, all the others are totally right as well.

Why is that great? Because that means we get to discuss it. Endlessly!

While everyone keeps talking about ORMs like that, no one cares what Gavin King (creator of Hibernate) had said from the beginning:

Why should we care about his opinion? We have our own, far superior opinion! Let’s have another discussion about why ORMs are evil!

2. Case-sensitivity

Unfortunately, us Java folks cannot have any of those very very very very very important discussions about casing, because unfortunately, Java is a case-sensitive language.

But take SQL (or PL/SQL, T-SQL for that sake). When writing SQL, we can have awesome discussions about whether we should

-- Upper case it all
SELECT TAB.COL1, TAB.COL2 FROM TAB

-- Upper case keywords, lower case identifiers
SELECT tab.col1, tab.col2 FROM tab

-- Lower case keywords, upper case identifiers
select TAB.COL1, TAB.COL2 from TAB

-- Lower case it all
select tab.col1, tab.col2 from tab

-- Add some PascalCase (awesome SQL Server!)
SELECT Tab.Col1, Tab.Col2 FROM Tab

-- Mix case-sensitivity with case-insensitivity
-- (Protip to piss off your coworkers: Name your
-- table "FROM" or "INTO" and let them figure out
-- how to query that table)
SELECT TAB."COL1", TAB."col2" FROM "FROM"

-- PascalCase keywords (wow, MS Access)
Select TAB.COL1, TAB.COL2 From TAB

Now that is really incredibly interesting. And because this is so interesting and important, you can only imagine the number of interesting discussions we’ve had on the jOOQ User Group, for instance, about how to best generate meta data from the database. With jOOQ, we promise that you can extend these enticing discussions from the SQL area to the Java area by overriding the code generator’s default behaviour:

  • Should classes be PascalCased and literals be UPPER_CASED?
  • Should everything be PascalCased and camelCased as in Java?
  • Should everything be generated as named in the database?

Endless interesting discussions!

jOOQ: The Best Way to Write SQL in Java

We have so many options to SQL casing, which brings us to

3. SQL formatting

Unlike C-style general-purpose languages such as C, Java, Scala, C#, or even keyword-heavy ones Delphi, Pascal, Ada, SQL has one more awesome grounds for numerous discussions. It is not only keyword-heavy, but it also has a very complex and highly irregular syntax. So we’re lucky enough to get to choose (after long discussions and settlements) between:

-- All on one line. Don't tell me about print margins,
-- Or I'll telefax you my SQL!
SELECT table1.col1, table1.col2 FROM table1 JOIN table2 ON table1.id = table2.id WHERE id IN (SELECT x FROM other_table)

-- "Main" keywords on new line
SELECT table1.col1, table1.col2 
FROM table1 JOIN table2 ON table1.id = table2.id 
WHERE id IN (SELECT x FROM other_table)

-- (almost) all keywords on new line
SELECT table1.col1, table1.col2 
FROM table1 
JOIN table2 
ON table1.id = table2.id 
WHERE id IN (SELECT x FROM other_table)

-- "Main" keywords on new line, others indented
SELECT table1.col1, table1.col2 
FROM table1 
  JOIN table2 
  ON table1.id = table2.id 
WHERE id IN (
  SELECT x 
  FROM other_table
)

-- "Main" keywords on new line, others heavily indented
SELECT table1.col1, table1.col2 
FROM table1 JOIN table2 
              ON table1.id = table2.id 
WHERE id IN (SELECT x 
             FROM other_table)

-- Doge formatting
SUCH table1.col1,
                 table1.col2
    MUCH table1
JOIN table2 WOW table1.id
            = table2.id
WHICH              id IN
   (SUCH x

WOW other_table
            )
Doge SQL Formatting

Doge SQL Formatting

And so on and so forth. Now any project manager should reserve at least 10 man-weeks in every project to agree on rules about SQL formatting.

4. The end of the DBA

Now THAT is a very interesting topic that is not only interesting for developers who are so knowledgeable about productive systems, no it’s also very interesting for operations teams. Because as we all know, the DBA is dead (again).

For those of you who have been missing out on this highly interesting topic, do know that all of this started (again) when the great NoSQL vs. SQL debate was initiated by brilliant minds and vendors of truly alternative systems. Which are now starting to implement SQL, because apparently, well… SQL isn’t all that bad:

Please, do engage in some more discussions about the best and only true way to tackle database problems. Because your opinion counts!

5. New lines and comments

Remember our own blog post about putting some keywords on new lines? Yes, we prefer:

// If this
if (something) {
    ...
}

// Else something else
else {
    ...
}

Exactly. Because this allows comments to be written where they belong: Next to the appropriate keyword, and always aligned at the same column. This leads us to the next very interesting question: Should we put comments in code at all? Or is clean code self-documenting?

And we say, why yes, of course we should comment. How on earth will anyone ever remember the rationale behind something like this??

// [#2744] DB2 knows the SELECT .. FROM FINAL 
// TABLE (INSERT ..) syntax
case DB2:

// Firebird and Postgres can execute the INSERT 
// .. RETURNING clause like a select clause. JDBC
// support is not implemented in the Postgres JDBC
// driver
case FIREBIRD:
case POSTGRES: {
    try {
        listener.executeStart(ctx);
        rs = ctx.statement().executeQuery();
        listener.executeEnd(ctx);
    }
    finally {
        consumeWarnings(ctx, listener);
    }

    break;
}

Taken from our “hacking JDBC” page.

6. JSON is totally better than XML

Of course it is! Because… because… errr. Because it allows me to structure data hierarchically. Waaaait a second…

Dayum.

You’re saying, JSON and XML are the SAME THING!?

But MongoDB and PostgreSQL allow me to store JSON. Oh wait. They tried to store XML in databases, back in the 90s, too!? And it failed? Well, of course it failed, because XML sucks, right? (which is essentially another way of saying that I’ve never understood XSLT or XQuery or XPath or didn’t even hear about XProc, and I’m just ranting about angle brackets and namespaces)

Let’s further discuss this matter. I feel that we’re close to the very ultimate solution on that topic.

Speaking of JSON…

7. Curly braces

OMG! This is the most interesting of all topics. Should we put the opening brace:

  • On the same line?
  • On a NEW line??
  • NO BRACE AT ALL???

The right answers are 1) and 3). 1) only if we absolutely have to, as in try or switch statements. We’re not paid by the number of lines of code, so we don’t add meaningless lines with only opening braces. And if we can omit the braces entirely, fine. Here’s an awesome statement, if you ask me:

if (something)
    outer:
    for (String thing : list)
        inner:
        do
            if (somethingElse)
                break inner;
            else
                continue outer;
        while (true);

That ought to teach them juniors not to touch my code. Which brings us to:

8. Labels

Nothing wrong with them. I’ll break out of my loops any time I want. Don’t tell me labels are Java’s way of saying GOTO, they’re much more sophisticated than that. (Besides, goto is a reserved word in Java, and it is an actual bytecode instruction). So I’ll happily do my jumping forward:

label: {
  // do stuff
  if (check) break label;
  // do more stuff
}

Or my jumping backward:

label: do {
  // do stuff
  if (check) continue label;
  // do more stuff
  break label;
} while(true);

(observe how the above example used two spaces for indentation instead of four (or tabs). Another great topic for further discussion)

9. emacs vs. vim vs. vi vs. Eclipse vs. IntelliJ vs. Netbeans

Can we please, PLEASE, have another very interesting discussion about which one of these is better? Please!

10. Last but not Least: Is Haskell better than [your language]?

According to TIOBE, Haskell ranks 38.

And as we all know, the actual market share (absolutely none in the case of Haskell) of any programming language is inversely proportional to the amount of time spent on reddit discussing the importance of said language, and how said language is totally superior to the one ranking 1-2 above on TIOBE, for instance. Which would be Lua.

So, I would love to invite you to join our blogging friends below to a very very interesting discussion about…

Now, of course, we could enlargen the debate and compare functional programming with OO programming in general before delving into why Scala is NOT a functional programming language, let alone Java 8.

Oh, and you think your dialect of Haskell or Lisp is not good enough, so you should roll your own? Go ahead (right after checking this checklist!)

Such great topics. So little time.

Conclusion

The great thing about these social networks like Reddit, Hackernews, and all the others is the fact that we can finally spend all day to discuss really really intersting topics instead of fixing them boring bugs our boss wants us to fix. After all, this is IMPORTANT.

Or as Randall Munroe would say: “Duty calls!”

Further reading

If you’re now all hot and ready to discuss things, please consider also reading these very interesting and insightful articles on how to best format and style code:

Or add your own. There’s still much much important writing to do!

Java 8 Friday: More Functional Relational Transformation

In the past, we’ve been providing you with a new article every Friday about what’s new in Java 8. It has been a very exciting blog series, but we would like to focus again more on our core content, which is Java and SQL. We will still be occasionally blogging about Java 8, but no longer every Friday (as some of you have already notice).

In this last, short post of the Java 8 Friday series, we’d like to re-iterate the fact that we believe that the future belongs to functional relational data transformation (as opposed to ORM). We’ve spent about 20 years now using the object-oriented software development paradigm. Many of us have been very dogmatic about it. In the last 10 years, however, a “new” paradigm has started to get increasing traction in programming communities: Functional programming.

Functional programming is not that new, however. Lisp has been a very early functional programming language. XSLT and SQL are also somewhat functional (and declarative!). As we’re big fans of SQL’s functional (and declarative!) nature, we’re quite excited about the fact that we now have sophisticated tools in Java to transform tabular data that has been extracted from SQL databases. Streams!

SQL ResultSets are very similar to Streams

As we’ve pointed out before, JDBC ResultSets and Java 8 Streams are quite similar. This is even more true when you’re using jOOQ, which replaces the JDBC ResultSet by an org.jooq.Result, which extends java.util.List, and thus automatically inherits all Streams functionality. Consider the following query that allows fetching a one-to-many relationship between BOOK and AUTHOR records:

Map<Record2<String, String>, 
    List<Record2<Integer, String>>> booksByAuthor =

// This work is performed in the database
// --------------------------------------
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()

// This work is performed in Java memory
// -------------------------------------
   .stream()

   // Group BOOKs by AUTHOR
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(AUTHOR.FIRST_NAME, 
                    AUTHOR.LAST_NAME),

        // This is the target data structure
        LinkedHashMap::new,

        // This is the value to be produced for each
        // group: A list of BOOK
        mapping(
            r -> r.into(BOOK.ID, BOOK.TITLE),
            toList()
        )
    ));

The fluency of the Java 8 Streams API is very idiomatic to someone who has been used to writing SQL with jOOQ. Obviously, you can also use something other than jOOQ, e.g. Spring’s JdbcTemplate, or Apache Commons DbUtils, or just wrap the JDBC ResultSet in an Iterator…

What’s very nice about this approach, compared to ORM is the fact that there is no magic happening at all. Every piece of mapping logic is explicit and, thanks to Java generics, fully typesafe. The type of the booksByAuthor output is complex, and a bit hard to read / write, in this example, but it is also fully descriptive and useful.

The same functional transformation with POJOs

If you aren’t too happy with using jOOQ’s Record2 tuple types, no problem. You can specify your own data transfer objects like so:

class Book {
    public int id;
    public String title;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

static class Author {
    public String firstName;
    public String lastName;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

With the above DTO, you can now leverage jOOQ’s built-in POJO mapping to transform the jOOQ records into your own domain classes:

Map<Author, List<Book>> booksByAuthor =
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()
   .stream()
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(Author.class),
        LinkedHashMap::new,

        // This is the grouping value list
        mapping(
            r -> r.into(Book.class),
            toList()
        )
    ));

Explicitness vs. implicitness

At Data Geekery, we believe that a new time has started for Java developers. A time where Annotatiomania™ (finally!) ends and people stop assuming all that implicit behaviour through annotation magic. ORMs depend on a huge amount of specification to explain how each annotation works with each other annotation. It is hard to reverse-engineer (or debug!) this kind of not-so-well-understood annotation-language that JPA has brought to us.

On the flip side, SQL is pretty well understood. Tables are an easy-to-handle data structure, and if you need to transform those tables into something more object-oriented, or more hierarchically structured, you can simply apply functions to those tables and group values yourself! By grouping those values explicitly, you stay in full control of your mapping, just as with jOOQ, you stay in full control of your SQL.

This is why we believe that in the next 5 years, ORMs will lose relevance and people start embracing explicit, stateless and magicless data transformation techniques again, using Java 8 Streams.

Java 8 Friday: No More Need for ORMs

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

No More Need for ORMs

Debates about the usefulness of ORM (Object-Relational Mapping) have been going on for the last decade. While many people would agree that Hibernate and JPA solve a lot of problems very well (mostly the persistence of complex object graphs), others may claim that the mapping complexity is mostly overkill for data-centric applications.

JPA solves mapping problems by establishing standardised, declarative mapping rules through hard-wired annotations on the receiving target types. We claim that many data-centric problems should not be limited by the narrow scope of these annotations, but be solved in a much more functional way. Java 8, and the new Streams API finally allow us to do this in a very concise manner!

Let’s start with a simple example, where we’re using H2’s INFORMATION_SCHEMA to collect all tables and their columns. We’ll want to produce an ad-hoc data structure of the type Map<String, List<String>> to contain this information. For simplicity of SQL interaction, we’ll use jOOQ (as always, a shocker on this blog). Here’s how we prepare this:

public static void main(String[] args)
throws Exception {
    Class.forName("org.h2.Driver");
    try (Connection c = getConnection(
            "jdbc:h2:~/sql-goodies-with-mapping", 
            "sa", "")) {

        // This SQL statement produces all table
        // names and column names in the H2 schema
        String sql =
            "select table_name, column_name " +
            "from information_schema.columns " +
            "order by " +
                "table_catalog, " +
                "table_schema, " +
                "table_name, " +
                "ordinal_position";

        // This is jOOQ's way of executing the above
        // statement. Result implements List, which
        // makes subsequent steps much easier
        Result<Record> result =
        DSL.using(c)
           .fetch(sql)
    }
}

Now that we’ve set up this query, let’s see how we can produce the Map<String, List<String>> from the jOOQ Result:

DSL.using(c)
   .fetch(sql)
   .stream()
   .collect(groupingBy(
       r -> r.getValue("TABLE_NAME"),
       mapping(
           r -> r.getValue("COLUMN_NAME"),
           toList()
       )
   ))
   .forEach(
       (table, columns) -> 
           System.out.println(table + ": " + columns)
   );

The above example produces the following output:

FUNCTION_COLUMNS: [ALIAS_CATALOG, ALIAS_SCHEMA, ...]
CONSTANTS: [CONSTANT_CATALOG, CONSTANT_SCHEMA, ...]
SEQUENCES: [SEQUENCE_CATALOG, SEQUENCE_SCHEMA, ...]

How does it work? Let’s go through it step-by-step

DSL.using(c)
   .fetch(sql)

// Here, we transform a List into a Stream
   .stream()

// We're collecting Stream elements into a new
// collection type
   .collect(

// The Collector is a grouping operation, producing
// a Map
            groupingBy(

// The grouping operation's group key is defined by
// the jOOQ Record's TABLE_NAME value
       r -> r.getValue("TABLE_NAME"),

// The grouping operation's group value is generated
// by this mapping expression...
       mapping(

// ... which is essentially mapping each grouped
// jOOQ Record to the Record's COLUMN_NAME value
           r -> r.getValue("COLUMN_NAME"),

// ... and then collecting all those values into a
// java.util.List. Whew
           toList()
       )
   ))

// Once we have this Map<String, List<String>> we can
// simply consume its entries with the following Consumer
// lambda expression
   .forEach(
       (table, columns) -> 
           System.out.println(table + ": " + columns)
   );

Got it? These things are certainly a bit tricky when playing around with it for the first time. The combination of new types, extensive generics, lambda expressions can be a bit confusing at first. The best thing is to simply practice with these things until you get a hang of it. After all, the whole Streams API is really a revolution compared to previous Java Collections APIs.

The good news is: This API is final and here to stay. Every minute you spend practicing it is an investment into your own future.

Note that the above programme used the following static import:

import static java.util.stream.Collectors.*;

Note also, that the output was no longer ordered as in the database. This is because the groupingBy collector returns a java.util.HashMap. In our case, we might prefer collecting things into a java.util.LinkedHashMap, which preserves insertion / collection order:

DSL.using(c)
   .fetch(sql)
   .stream()
   .collect(groupingBy(
       r -> r.getValue("TABLE_NAME"),

       // Add this Supplier to the groupingBy
       // method call
       LinkedHashMap::new,
       mapping(
           r -> r.getValue("COLUMN_NAME"),
           toList()
       )
   ))
   .forEach(...);

We could go on with other means of transforming results. Let’s imagine, we would like to generate simplistic DDL from the above schema. It’s very simple. First, we’ll need to select column’s data type. We’ll simply add it to our SQL query:

String sql =
    "select " +
        "table_name, " +
        "column_name, " +
        "type_name " + // Add the column type
    "from information_schema.columns " +
    "order by " +
        "table_catalog, " +
        "table_schema, " +
        "table_name, " +
        "ordinal_position";

I have also introduced a new local class for the example, to wrap name and type attributes:

class Column {
    final String name;
    final String type;

    Column(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

Now, let’s see how we’ll change our Streams API method calls:

result
    .stream()
    .collect(groupingBy(
        r -> r.getValue("TABLE_NAME"),
        LinkedHashMap::new,
        mapping(

            // We now collect this new wrapper type
            // instead of just the COLUMN_NAME
            r -> new Column(
                r.getValue("COLUMN_NAME", String.class),
                r.getValue("TYPE_NAME", String.class)
            ),
            toList()
        )
    ))
    .forEach(
        (table, columns) -> {

            // Just emit a CREATE TABLE statement
            System.out.println(
                "CREATE TABLE " + table + " (");

            // Map each "Column" type into a String
            // containing the column specification,
            // and join them using comma and
            // newline. Done!
            System.out.println(
                columns.stream()
                       .map(col -> "  " + col.name +
                                    " " + col.type)
                       .collect(Collectors.joining(",\n"))
            );

            System.out.println(");");
        }
    );

The output couldn’t be more awesome!

CREATE TABLE CATALOGS(
  CATALOG_NAME VARCHAR
);
CREATE TABLE COLLATIONS(
  NAME VARCHAR,
  KEY VARCHAR
);
CREATE TABLE COLUMNS(
  TABLE_CATALOG VARCHAR,
  TABLE_SCHEMA VARCHAR,
  TABLE_NAME VARCHAR,
  COLUMN_NAME VARCHAR,
  ORDINAL_POSITION INTEGER,
  COLUMN_DEFAULT VARCHAR,
  IS_NULLABLE VARCHAR,
  DATA_TYPE INTEGER,
  CHARACTER_MAXIMUM_LENGTH INTEGER,
  CHARACTER_OCTET_LENGTH INTEGER,
  NUMERIC_PRECISION INTEGER,
  NUMERIC_PRECISION_RADIX INTEGER,
  NUMERIC_SCALE INTEGER,
  CHARACTER_SET_NAME VARCHAR,
  COLLATION_NAME VARCHAR,
  TYPE_NAME VARCHAR,
  NULLABLE INTEGER,
  IS_COMPUTED BOOLEAN,
  SELECTIVITY INTEGER,
  CHECK_CONSTRAINT VARCHAR,
  SEQUENCE_NAME VARCHAR,
  REMARKS VARCHAR,
  SOURCE_DATA_TYPE SMALLINT
);

Excited? The ORM era may have ended just now

This is a strong statement. The ORM era may have ended. Why? Because using functional expressions to transform data sets is one of the most powerful concepts in software engineering. Functional programming is very expressive and very versatile. It is at the core of data and data streams processing. We Java developers already know existing functional languages. Everyone has used SQL before, for instance. Think about it. With SQL, you declare table sources, project / transform them onto new tuple streams, and feed them either as derived tables to other, higher-level SQL statements, or to your Java program.

If you’re using XML, you can declare XML transformation using XSLT and feed results to other XML processing entities, e.g. another XSL stylesheet, using XProc pipelining.

Java 8’s Streams are nothing else. Using SQL and the Streams API is one of the most powerful concepts for data processing. If you add jOOQ to the stack, you can profit from typesafe access to your database records and query APIs. Imagine writing the previous statement using jOOQ’s fluent API, instead of using SQL strings.

jooq-the-best-way-to-write-sql-in-java

The whole method chain could be one single fluent data transformation chain as such:

DSL.using(c)
   .select(
       COLUMNS.TABLE_NAME,
       COLUMNS.COLUMN_NAME,
       COLUMNS.TYPE_NAME
   )
   .from(COLUMNS)
   .orderBy(
       COLUMNS.TABLE_CATALOG,
       COLUMNS.TABLE_SCHEMA,
       COLUMNS.TABLE_NAME,
       COLUMNS.ORDINAL_POSITION
   )
   .fetch()  // jOOQ ends here
   .stream() // Streams start here
   .collect(groupingBy(
       r -> r.getValue(COLUMNS.TABLE_NAME),
       LinkedHashMap::new,
       mapping(
           r -> new Column(
               r.getValue(COLUMNS.COLUMN_NAME),
               r.getValue(COLUMNS.TYPE_NAME)
           ),
           toList()
       )
   ))
   .forEach(
       (table, columns) -> {
            // Just emit a CREATE TABLE statement
            System.out.println(
                "CREATE TABLE " + table + " (");

            // Map each "Column" type into a String
            // containing the column specification,
            // and join them using comma and
            // newline. Done!
            System.out.println(
                columns.stream()
                       .map(col -> "  " + col.name +
                                    " " + col.type)
                       .collect(Collectors.joining(",\n"))
            );

           System.out.println(");");
       }
   );

Java 8 is the future, and with jOOQ, Java 8, and the Streams API, you can write powerful data transformation APIs. I hope we got you as excited as we are! Stay tuned for more awesome Java 8 content on this blog.

Why Embracing Legacy is Wise

Legacy isn’t sexy. When hearing “legacy”, people think of COBOL. Those good old days when people talked to machines like they would talk to robots. But COBOL is not the only type of legacy. According to Stephen Ball, an Embarcadero salesman and evangelist whom I’ve met at the Java 2 Days conference in Sofia, Delphi has over 3 million developers world wide. Delphi! 3 millions!

Let’s take a moment to let this information seep in.

Delphi!? Three million developers!? tweet this

Yes. Delphi. I’m not sure if I should believe the actual figure, but it is ranked 20th on the TIOBE index as of February 2014. That’s much better than Go (35), Scala (39), Lisp (40) or Haskell (45), which form the “hipper” languages nowadays, which everyone seems to talk about.

Delphi, that language whose syntax looks like Pascal (17) or Ada (34). Or like PL/SQL (15), or a bit like T-SQL (11).

How does this relate to functional programming?

While many people claim functional programming to be “modern” (it’s not. Lisp surfaced in the 50ties), the bulk of popular languages are imperative, or object-oriented at best (which is just another way of structuring imperative programming). I’ve recently read an interesting article by Erkki Lindpere from the RebelLabs guys, claiming that there is a debate on object-oriented vs. functional programming and that it is about composition. His claim is that object oriented programming as any imperative programming leads to a lot of hard-to-compose state, whereas functional programming relies more heavily on immutable and thus composable values. That is quite true.

Yes, there is a debate in the functional corner. This becomes very clear when you post an article on reddit, which is dominated heavily by “functional people”. This also explained the amount of disagreement that we experienced when we published our article on LINQ vs. Java 8.

Stick with “the old ways”

But the fact is simple and clear. C (1) and Java (2) are the number one programming languages out there. Most code has been and is still being written in these languages. SQL dialects (11, 15) aren’t that far behind. While more modern languages like Scala nicely combine various paradigms into one, Java is incorporating the most popular features, slowly, with a 5-10 year lag. But then, things are done for good as Geert Bevin from RebelLabs recently stated in his article Reasons Why Java Rocks More Than Ever – Backwards Compatibility.

Jumping off the Java or SQL Ship is a bad idea, if you’re creating systems intended to last. Not all software is built for decades, true. But if yours is, you better use a technology that you know will stick around for another thirty years or so. Such as Delphi.