jOOQ Tuesdays: Mario Fusco Talks About Functional and Declarative Programming

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

mariofusco

I’m very excited to feature today Mario Fusco, author of LambdaJ, working on Red Hat’s drools, a Java Champion and frequent speaker at Java conferences on all topics functional programming.

Mario, a long time ago, I have already stumbled upon your name when looking up the author of Lambdaj – a library that went to the extreme to bring lambdas to Java 5 or earlier. How does it work? And what’s the most peculiar hack you implemented to make it work?

When I started developing Lambdaj in 2007 I thought to it just as a proof-of-concept to check how far I could push Java 5. I never expected that it could become something that somebody else other than myself may actually want to use. In reality, given the limited, or I should say non-existing, capabilities of Java 5 as a functional language, Lambdaj was entirely a big hack. Despite this, people started using and somewhat loving it, and this made me (and possibly somebody else) realize that Java developers, or at least part of them, were tired of the pure imperative paradigm imposed by the language and ready to experiment with something more functional.

The main feature of Lambdaj, and what made its DSL quite nice to use, was the possibility to reference the method of a class in a static and type safe way and pass it to another method. In this way you could for example sort a list of persons by their age doing something like:

sort(persons, on(Person.class).getAge());

As anticipated what happened under the hood was a big hack: the on() method created a proxy of the Person class so you could safely call the getAge() method on it. The proxy didn’t do anything useful other than registering the method call. However it had to return something of the same type of the value returned by the actual method to avoid a ClassCastException. To this purpose it had a mechanism to generate a reasonably unique instance of that type, an int in my example. Before returning that value it also associated it, using a WeakHashMap, to the invoked method. In this way the sort() method was actually invoked with a list and the value generated by my proxy. It then retrieved from the map the Java method associated with that value and invoked it on all the items of the list performing the operation, a sorting in this case, that it was supposed to execute.

That’s crazy :) I’m sure you’re happy that a lot of Lambdaj features are now deprecated. You’re now touring the world with your functional programming talks. What makes you so excited about this topic?

The whole Lambdaj project is now deprecated and abandoned. The new functional features introduced with Java 8 just made it obsolete. Nevertheless it not only had the merit to make developers become curious and interested about functional programming, but also to experiment with new patterns and ideas that in the end also influenced the Java 8 syntax. Take for instance how you can sort a Stream of persons by age using a method reference

persons.sort(Person::getAge)

It looks evident how the method references have been at least inspired by the Lambdaj‘s on() method.

There is a number of things that I love of functional programming:

  1. The readability: a snippet of code written in functional style looks like a story while too often the equivalent code in imperative style resembles a puzzle.
  2. The declarative nature: in functional programming is enough to declare the result that you want to achieve rather than specifying the steps to obtain it. You only care about the what without getting lost in the details of the how.
  3. The possibility of treating data and behaviors uniformly: functional programming allows you to pass to a method both data (the list of persons to be sorted) and computation (the function to be applied to each person in the list). This idea is fundamental for many algorithms like for example the map/reduce: since data and computation are the same thing and the second is typically orders of magnitude smaller you are free to send them to the machine holding the data instead of the opposite.
  4. The higher level of abstraction: the possibility of encapsulating computations in functions and pass them around to other functions allows both a dramatic reduction of code duplication and the design of more generic and expressive API.
  5. Immutability and referential transparency: using immutable values and having side-effects programs makes far easier to reason on your code, test it and ensure its correctness.
  6. The parallelism friendliness: all the features listed above also enable the parallelization of your software in a simpler and more reliable way. It is not coincidence that functional programming started becoming more popular around 10 years ago that is also when multicore CPUs began to be available on commodity hardware.

Our readers love SQL (or at least, they use it frequently). How does functional programming compare to SQL?

The most evident thing that FP and SQL have in common is their declarative paradigm. To some extent SQL, or at least the data selection part, can be seen as a functional language specialized to manipulate data in tabular format.

The data modification part is a totally different story though. The biggest part of SQL users normally change data in a destructive way, overwriting or even deleting the existing data. This is clearly in contrast with the immutability mantra of functional programming. However this is only how SQL is most commonly used, but nothing dictates that it couldn’t be also employed in a non-destructive append-only way. I wish to see SQL used more often in this way in future.

In your day job, you’re working for Red Hat, on drools. Business rules sound enterprisey. How does that get along with your fondness of functional programming?

Under an user point of view a rule engine in general and drools in particular are the extreme form of declarative programming, second only to Prolog. For this reason developers who are only familiar with the imperative paradigm struggle to use it, because they also try to enforce it to work in an imperative way. Conversely programmers more used to think in functional (and then declarative) terms are more often able to use it correctly when they approach it for the first time.

For what regards me, my work as developer of both the core engine and the compiler of drools allows me to experiment every day in both fields of language design and algorithmic invention and optimization. To cut it short it’s a challenging job and there’s lot’s of fun in it: don’t tell this to my employer but I cannot stop being surprised that they allow me to play with this everyday and they also pay me for that.

You’re also on the board of VoxxedDays Ticino, Zurich, and CERN (wow, how geeky is that? A large hadron collider Java conference!). Why is Voxxed such a big success for you?

I must admit that, before being involved in this, I didn’t imagine the amount of work that organizing a conference requires. However this effort is totally rewarded. In particular the great advantage of VoxxedDays is the fact of being local 1-day events made by developers for developers that practically anybody can afford.

I remember that the most common feedback I received after the first VoxxedDays Ticino that we did 2 years ago was some like: “This has been the very first conference I attended in my life and I didn’t imagine it could have been a so amazing experience both under a technical and even more a social point of view. Thanks a lot for that, I eagerly wait to attend even next year”. Can you imagine something more rewarding for a conference organizer?

The other important thing for me is giving the possibility to speakers that aren’t rock stars (yet) to talk in public and share their experience with a competent audience. I know that for at least some of them this is only the first step to let themselves and others discover their capabilities as public speakers and launch them toward bigger conferences like the Devoxx.

Thank you very much Mario

If you want to learn more about Mario’s insights on functional programming, please do visit his interesting talks at Devoxx from the recent past:

Using Oracle AQ via Java 8 Streams

One of the most awesome features of the Oracle database is Oracle AQ: Oracle Database Advanced Queuing. The AQ API implements a full fledged, transactional messaging system directly in the database.

In a classic architecture where the database is at the center of your system, with multiple applications (some of which written in Java, others written in Perl or PL/SQL, etc.) accessing the same database, using AQ for inter-process communication is just great. If you’re more on the Java EE side, you might purchase a Java-based MQ solution, and put that message bus / middleware at the center of your system architecture. But why not use the database instead?

How to use the PL/SQL AQ API with jOOQ

The PL/SQL API for AQ message enqueuing and dequeuing is rather simple, and it can be accessed very easily from Java using jOOQ’s OracleDSL.DBMS_AQ API.

The queue configuration used here would look something like:

CREATE OR REPLACE TYPE message_t AS OBJECT (
  ID         NUMBER(7),
  title      VARCHAR2(100 CHAR)
)
/

BEGIN
  DBMS_AQADM.CREATE_QUEUE_TABLE(
    queue_table => 'message_aq_t',
    queue_payload_type => 'message_t'
  );

  DBMS_AQADM.CREATE_QUEUE(
    queue_name => 'message_q',
    queue_table => 'message_aq_t'
  );

  DBMS_AQADM.START_QUEUE(
    queue_name => 'message_q'
  );
  COMMIT;
END;
/

And the jOOQ code generator would generate the useful classes with all the type information directly associated with them (simplified example):

class Queues {
    static final Queue<MessageTRecord> MESSAGE_Q = 
        new QueueImpl<>("NEW_AUTHOR_AQ", MESSAGE_T);
}

class MessageTRecord {
    void setId(Integer id) { ... }
    Integer getId() { ... }
    void setTitle(String title) { ... }
    String getTitle() { ... }
    MessageTRecord(
        Integer id, String title
    ) { ... }
}

These classes can then be used to enqueue and dequeue messages type safely directly on the generated queue references:

// The jOOQ configuration
Configuration c = ...

// Enqueue a message
DBMS_AQ.enqueue(c, MESSAGE_Q, 
    new MessageTRecord(1, "test"));

// Dequeue it again
MessageTRecord message = DBMS_AQ.dequeue(c, MESSAGE_Q);

Easy, isn’t it?

Now, let’s leverage Java 8 features

A message queue is nothing other than an infinite (blocking) stream of messages. Since Java 8, we have a formidable API for such message streams, the Stream API.

This is why we have added (for the upcoming jOOQ 3.8) a new API that combines the existing jOOQ AQ API with Java 8 Streams:

// The jOOQ configuration
Configuration c = ...

DBMS_AQ.dequeueStream(c, MESSAGE_Q)
       .filter(m -> "test".equals(m.getTitle()))
       .forEach(System.out::println);

The above stream pipeline will listen on the MESSAGE_Q queue, consume all messages, filter out messages that do not contain the "test", and print the remaining messages.

Blocking streams

The interesting thing is the fact that this is a blocking, infinite stream. As long as there is no new message in the queue, the stream pipeline processing will simply block on the queue, waiting for new messages. This is not an issue for sequential streams, but when calling Stream.parallel(), what happens then?

jOOQ will consume each message in a transaction. A jOOQ 3.8 transaction runs in a ForkJoinPool.ManagedBlocker:

static <T> Supplier<T> blocking(Supplier<T> supplier) {
    return new Supplier<T>() {
        volatile T result;

        @Override
        public T get() {
            try {
                ForkJoinPool.managedBlock(new ManagedBlocker() {
                    @Override
                    public boolean block() {
                        result = supplier.get();
                        return true;
                    }

                    @Override
                    public boolean isReleasable() {
                        return result != null;
                    }
                });
            }
            catch (InterruptedException e) {
                throw new RuntimeException(e);
            }

            return asyncResult;
        }
    };
}

This isn’t a lot of magic. A ManagedBlocker runs some special code when it is run by a ForkJoinWorkerThread, making sure the thread’s ForkJoinPool won’t suffer from thread exhaustion and thus from deadlocks. For more info, read this interesting article here:
http://zeroturnaround.com/rebellabs/java-parallel-streams-are-bad-for-your-health

Or this Stack Overflow answer:
http://stackoverflow.com/a/35272153/521799

So, if you want a super-fast parallel AQ dequeuing process, just run:

// The jOOQ configuration. Make sure its referenced
// ConnectionPool has enough connections
Configuration c = ...

DBMS_AQ.dequeueStream(c, MESSAGE_Q)
       .parallel()
       .filter(m -> "test".equals(m.getTitle()))
       .forEach(System.out::println);

And you’ll have several threads that will dequeue messages in parallel.

Don’t want to wait for jOOQ 3.8?

No problem. Use the current version and wrap the dequeue operation in your own Stream:

Stream<MessageTRecord> stream = Stream.generate(() ->
    DSL.using(config).transactionResult(c ->
        dequeue(c, MESSAGE_Q)
    )
);

Done.

Bonus: Asynchronous dequeuing

While we were at it, another very nice feature of queuing systems is their asynchronicity. With Java 8, a very useful type to model (and compose) asynchronous algorithms is the CompletionStage, and it’s default implementation the CompletableFuture, which executes tasks in the ForkJoinPool again.

Using jOOQ 3.8, you can again simply call

// The jOOQ configuration. Make sure its referenced
// ConnectionPool has enough connections
Configuration c = ...

CompletionStage<MessageTRecord> stage =
DBMS_AQ.dequeueAsync(c, MESSAGE_Q)
       .thenCompose(m -> ...)
       ...;

Stay tuned for another article on the jOOQ blog soon, where we look into more sophisticated use-cases for asynchronous, blocking SQL statements with jOOQ 3.8 and Java 8

The Danger of Subtype Polymorphism Applied to Tuples

Java 8 has lambdas and streams, but no tuples, which is a shame. This is why we have implemented tuples in jOOλ – Java 8’s missing parts. Tuples are really boring value type containers. Essentially, they’re just an enumeration of types like these:

public class Tuple2<T1, T2> {
    public final T1 v1;
    public final T2 v2;

    public Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }

    // [...]
}


public class Tuple3<T1, T2, T3> {
    public final T1 v1;
    public final T2 v2;
    public final T3 v3;

    public Tuple3(T1 v1, T2 v2, T3 v3) {
        this.v1 = v1;
        this.v2 = v2;
        this.v3 = v3;
    }

    // [...]
}

Writing tuple classes is a very boring task, and it’s best done using a source code generator.

Tuples in other languages and APIs

jOOλ‘s current version features tuples of degrees 0 – 16. C# and other .NET languages have tuple types between 1 – 8. There’s a special library just for tuples called Javatuples with tuples between degrees 1 and 10, and the authors went the extra mile and gave the tuples individual English names:

Unit<A> // (1 element)
Pair<A,B> // (2 elements)
Triplet<A,B,C> // (3 elements)
Quartet<A,B,C,D> // (4 elements)
Quintet<A,B,C,D,E> // (5 elements)
Sextet<A,B,C,D,E,F> // (6 elements)
Septet<A,B,C,D,E,F,G> // (7 elements)
Octet<A,B,C,D,E,F,G,H> // (8 elements)
Ennead<A,B,C,D,E,F,G,H,I> // (9 elements)
Decade<A,B,C,D,E,F,G,H,I,J> // (10 elements)

Why?

because Ennead really rings that sweet bell when I see it

Last, but not least, jOOQ also has a built-in tuple-like type, the org.jooq.Record, which serves as a base type for nice subtypes like Record7<T1, T2, T3, T4, T5, T6, T7>. jOOQ follows Scala and defines records up to a degree of 22.

Watch out when defining tuple type hierarchies

As we have seen in the previous example, Tuple3 has much code in common with Tuple2.

As we’re all massively brain-damaged by decades of object orientation and polymorphic design anti-patters, we might think that it would be a good idea to let Tuple3<T1, T2, T3> extend Tuple2<T1, T2>, as Tuple3 just adds one more attribute to the right of Tuple2, right? So…

public class Tuple3<T1, T2, T3> extends Tuple2<T1, T2> {
    public final T3 v3;

    public Tuple3(T1 v1, T2 v2, T3 v3) {
        super(v1, v2);
        this.v3 = v3;
    }

    // [...]
}

The truth is: That’s about the worst thing you could do, for several reasons. First off, yes. Both Tuple2 and Tuple3 are tuples, so they do have some common features. It’s not a bad idea to group those features in a common super type, such as:

public class Tuple2<T1, T2> implements Tuple {
    // [...]
}

But the degree is not one of those things. Here’s why:

Permutations

Think about all the possible tuples that you can form. If you let tuples extend each other, then a Tuple5 would also be assignment-compatible with a Tuple2, for instance. The following would compile perfectly:

Tuple2<String, Integer> t2 = tuple("A", 1, 2, 3, "B");

When letting Tuple3 extend Tuple2, it may have seemed like a good default choice to just drop the right-most attribute from the tuple in the extension chain.

But in the above example, why don’t I want to re-assign (v2, v4) such that the result is (1, 3), or maybe (v1, v3), such that the result is ("A", 2)?

There are a tremendous amount of permutations of possible attributes that could be of interest when “reducing” a higher degree tuple to a lower degree one. No way a default of dropping the right-most attribute will be sufficiently general for all use-cases

Type systems

Much worse than the above, there would be drastic implications for the type system, if Tuple3 extended Tuple2. Check out the jOOQ API, for instance. In jOOQ, you can safely assume the following:

// Compiles:
TABLE1.COL1.in(select(TABLE2.COL1).from(TABLE2))

// Must not compile:
TABLE1.COL1.in(select(TABLE2.COL1, TABLE2.COL2).from(TABLE2))

The first IN predicate is correct. The left hand side of the predicate has a single column (as opposed to being a row value expression). This means that the right hand side of the predicate must also operate on single-column expressions, e.g. a SELECT subquery that selects a single column (of the same type).

The second example selects too many columns, and the jOOQ API will tell the Java compiler that this is wrong.

This is guaranteed by jOOQ via the Field.in(Select) method, whose signature reads:

public interface Field<T> {
    ...
    Condition in(Select<? extends Record1<T>> select);
    ...
}

So, you can provide a SELECT statement that produces any subtype of the Record1<T> type.

Luckily, Record2 does not extend Record1

If now Record2 extended Record1, which might have seemed like a good idea at first, the second query would suddenly compile:

// This would now compile
TABLE1.COL1.in(select(TABLE2.COL1, TABLE2.COL2).from(TABLE2))

… even if it forms an invalid SQL statement. It would compile because it would generate a Select<Record2<Type1, Type2>> type, which would be a subtype of the expected Select<Record1<Type1>> from the Field.in(Select) method.

Conclusion

Tuple2 and Tuple5 types are fundamentally incompatible types. In strong type systems, you mustn’t be lured into thinking that similar types, or related types should also be compatible types.

Type hierarchies are something very object-oriented, and by object-oriented, I mean the flawed and over-engineered notion of object orientation that we’re still suffering from since the 90s. Even in “the Enterprise”, most people have learned to favour Composition over Inheritance. Composition in the case of tuples means that you can well transform a Tuple5 to a Tuple2. But you cannot assign it.

In jOOλ, such a transformation can be done very easily as follows:

// Produces (1, 3)
Tuple2<String, Integer> t2_4 = 
    tuple("A", 1, 2, 3, "B")
    .map((v1, v2, v3, v4, v5) -> tuple(v2, v4));

// Produces ("A", 2)
Tuple2<String, Integer> t1_3 = 
    tuple("A", 1, 2, 3, "B")
    .map((v1, v2, v3, v4, v5) -> tuple(v1, v3));

The idea is that you operate on immutable values, and that you can easily extract parts of those values and map / recombine them to new values.

Read more

If you’ve enjoyed reading this article, you might also like to learn why recursive generics are a terrible idea (in many situations).

Common SQL Clauses and Their Equivalents in Java 8 Streams

Functional programming allows for quasi-declarative programming in a general purpose language. By using powerful fluent APIs like Java 8’s Stream API, or jOOλ’s sequential Stream extension Seq or more sophisticated libraries like vavr or functionaljava, we can express data transformation algorithms in an extremely concise way. Compare Mario Fusco’s imperative and functional version of the same algorithm:

Using such APIs, functional programming certainly feels like true declarative programming.

The most popular true declarative programming language is SQL. When you join two tables, you don’t tell the RDBMS how to implement that join. It may decide at its discretion whether a nested loop, merge join, hash join, or some other algorithm is the most suitable in the context of the complete query and of all the available meta information. This is extremely powerful because the performance assumptions that are valid for a simple join may no longer be valid for a complex one, where a different algorithm would outperform the original one. By this abstraction, you can just easily modify a query in 30 seconds, without worrying about low-level details like algorithms or performance.

When an API allows you to combine both (e.g. jOOQ and Streams), you will get the best of both worlds – and those worlds aren’t too different.

In the following sections, we’ll compare common SQL constructs with their equivalent expressions written in Java 8 using Streams and jOOλ, in case the Stream API doesn’t offer enough functionality.

Tuples

For the sake of this article, we’re going to assume that SQL rows / records have an equivalent representation in Java. For this, we’ll be using jOOλ’s Tuple type, which is essentially:

public class Tuple2<T1, T2> {

    public final T1 v1;
    public final T2 v2;

    public Tuple2(T1 v1, T2 v2) {
        this.v1 = v1;
        this.v2 = v2;
    }
}

… plus a lot of useful gimmicks like Tuple being Comparable, etc.

Note that we’re assuming the following imports in this and all subsequent examples.

import static org.jooq.lambda.Seq.*;
import static org.jooq.lambda.tuple.Tuple.*;

import java.util.*;
import java.util.function.*;
import java.util.stream.*;

import org.jooq.lambda.*;

Much like SQL rows, a tuple is a “value-based” type, meaning that it doesn’t really have an identity. Two tuples (1, 'A') and (1, 'A') can be considered exactly equivalent. Removing identity from the game makes SQL and functional programming with immutable data structures extremely elegant.

FROM = of(), stream(), etc.

In SQL, the FROM clause logically (but not syntactically) precedes all the other clauses. It is used to produce a set of tuples from at least one table, possibly multiple joined tables. A single-table FROM clause can be trivially mapped to Stream.of(), for instance, or to any other method that simply produces a stream:

SQL

SELECT *
FROM (
  VALUES(1, 1),
        (2, 2)
) t(v1, v2)

yielding

+----+----+
| v1 | v2 |
+----+----+
|  1 |  1 |
|  2 |  2 |
+----+----+

Java

Stream.of(
  tuple(1, 1),
  tuple(2, 2)
).forEach(System.out::println);

yielding

(1, 1)
(2, 2)

CROSS JOIN = flatMap()

Selecting from multiple tables is already more interesting. The easiest way to combine two tables in SQL is by producing a cartesian product, either via a table list or using a CROSS JOIN. The following two are equivalent SQL statements:

SQL

-- Table list syntax
SELECT *
FROM (VALUES( 1 ), ( 2 )) t1(v1), 
     (VALUES('A'), ('B')) t2(v2)

-- CROSS JOIN syntax
SELECT *
FROM       (VALUES( 1 ), ( 2 )) t1(v1)
CROSS JOIN (VALUES('A'), ('B')) t2(v2)

yielding

+----+----+
| v1 | v2 |
+----+----+
|  1 |  A |
|  1 |  B |
|  2 |  A |
|  2 |  B |
+----+----+

In a cross join (or cartesian product), every value from t1 is combined with every value from t2 producing size(t1) * size(t2) rows in total.

Java

In functional programming using Java 8’s Stream, the Stream.flatMap() method corresponds to SQL CROSS JOIN as can be seen in the following example:

List<Integer> s1 = Stream.of(1, 2);
Supplier<Stream<String>> s2 = ()->Stream.of("A", "B");

s1.flatMap(v1 -> s2.get()
                   .map(v2 -> tuple(v1, v2)))
  .forEach(System.out::println);

yielding

(1, A)
(1, B)
(2, A)
(2, B)

Note how we have to wrap the second stream in a Supplier because streams can be consumed only once, but the above algorithm is really implementing a nested loop, combining all elements of stream s2 with each element from stream s1. An alternative would be not to use streams but lists (which we will do in subsequent examples, for simplicity):

List<Integer> s1 = Arrays.asList(1, 2);
List<String> s2 = Arrays.asList("A", "B");

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .map(v2 -> tuple(v1, v2)))
  .forEach(System.out::println);

In fact, CROSS JOIN can be chained easily both in SQL and in Java:

SQL

-- Table list syntax
SELECT *
FROM (VALUES( 1 ), ( 2 )) t1(v1), 
     (VALUES('A'), ('B')) t2(v2), 
     (VALUES('X'), ('Y')) t3(v3)

-- CROSS JOIN syntax
SELECT *
FROM       (VALUES( 1 ), ( 2 )) t1(v1)
CROSS JOIN (VALUES('A'), ('B')) t2(v2)
CROSS JOIN (VALUES('X'), ('Y')) t3(v3)

yielding

+----+----+----+
| v1 | v2 | v3 |
+----+----+----+
|  1 |  A |  X |
|  1 |  A |  Y |
|  1 |  B |  X |
|  1 |  B |  Y |
|  2 |  A |  X |
|  2 |  A |  Y |
|  2 |  B |  X |
|  2 |  B |  Y |
+----+----+----+

Java

List<Integer> s1 = Arrays.asList(1, 2);
List<String> s2 = Arrays.asList("A", "B");
List<String> s3 = Arrays.asList("X", "Y");

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .map(v2 -> tuple(v1, v2)))
  .flatMap(v12-> s3.stream()
                   .map(v3 -> tuple(v12.v1, v12.v2, v3)))
  .forEach(System.out::println);

yielding

(1, A, X)
(1, A, Y)
(1, B, X)
(1, B, Y)
(2, A, X)
(2, A, Y)
(2, B, X)
(2, B, Y)

Note how we explicitly unnested the tuples from the first CROSS JOIN operation to form “flat” tuples in the second operation. This is optional, of course.

Java with jOOλ’s crossJoin()

Us jOOQ developers, we’re a very SQL-oriented people, so it is only natural to have added a crossJoin() convenience method for the above use-case. So our triple-cross join can be written like this:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<String> s2 = Seq.of("A", "B");
Seq<String> s3 = Seq.of("X", "Y");

s1.crossJoin(s2)
  .crossJoin(s3)
  .forEach(System.out::println);

yielding

((1, A), X)
((1, A), Y)
((1, B), X)
((1, B), Y)
((2, A), X)
((2, A), Y)
((2, B), X)
((2, B), Y)

In this case, we didn’t unnest the tuple produced in the first cross join. From a merely relational perspective, this doesn’t matter either. Nested tuples are the same thing as flat tuples. In SQL, we just don’t see the nesting. Of course, we could still unnest as well by adding a single additional mapping:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<String> s2 = Seq.of("A", "B");
Seq<String> s3 = Seq.of("X", "Y");

s1.crossJoin(s2)
  .crossJoin(s3)
  .map(t -> tuple(t.v1.v1, t.v1.v2, t.v2))
  .forEach(System.out::println);

yielding, again

(1, A, X)
(1, A, Y)
(1, B, X)
(1, B, Y)
(2, A, X)
(2, A, Y)
(2, B, X)
(2, B, Y)

(You may have noticed that map() corresponds to SELECT as we’ll see again later on)

INNER JOIN = flatMap() with filter()

The SQL INNER JOIN is essentially just syntactic sugar for a SQL CROSS JOIN with a predicate that reduces the tuple set after cross-joining. In SQL, the following two ways of inner joining are equivalent:

SQL

-- Table list syntax
SELECT *
FROM (VALUES(1), (2)) t1(v1), 
     (VALUES(1), (3)) t2(v2)
WHERE t1.v1 = t2.v2

-- INNER JOIN syntax
SELECT *
FROM       (VALUES(1), (2)) t1(v1)
INNER JOIN (VALUES(1), (3)) t2(v2)
ON t1.v1 = t2.v2

yielding

+----+----+
| v1 | v2 |
+----+----+
|  1 |  1 |
+----+----+

(note that the keyword INNER is optional).

So, the values 2 from t1 and the values 3 from t2 are “thrown away”, as they produce any rows for which the join predicate yields true.

The same can be expressed easily, yet more verbosely in Java

Java (inefficient solution!)

List<Integer> s1 = Arrays.asList(1, 2);
List<Integer> s2 = Arrays.asList(1, 3);

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .map(v2 -> tuple(v1, v2)))
  .filter(t -> Objects.equals(t.v1, t.v2))
  .forEach(System.out::println);

The above correctly yields

(1, 1)

But beware that you’re attaining this result after producing a cartesian product, the nightmare of every DBA! As mentioned at the beginning of this article, unlike in declarative programming, in functional programming you instruct your program to do exactly the order of operations that you specify. In other words:

In functional programming, you define the exact “execution plan” of your query.

In declarative programming, an optimiser may reorganise your “program”

There is no optimiser to transform the above into the much more efficient:

Java (more efficient)

List<Integer> s1 = Arrays.asList(1, 2);
List<Integer> s2 = Arrays.asList(1, 3);

s1.stream()
  .flatMap(v1 -> s2.stream()
                   .filter(v2 -> Objects.equals(v1, v2))
                   .map(v2 -> tuple(v1, v2)))
  .forEach(System.out::println);

The above also yields

(1, 1)

Notice, how the join predicate has moved from the “outer” stream into the “inner” stream, that is produced in the function passed to flatMap().

Java (optimal)

As mentioned previously, functional programming doesn’t necessarily allow you to rewrite algorithms depending on knowledge of the actual data. The above presented implementation for joins always implement nested loop joins going from the first stream to the second. If you join more than two streams, or if the second stream is very large, this approach can be terribly inefficient. A sophisticated RDBMS would never blindly apply nested loop joins like that, but consider constraints, indexes, and histograms on actual data.

Going deeper into that topic would be out of scope for this article, though.

Java with jOOλ’s innerJoin()

Again, inspired by our work on jOOQ we’ve also added an innerJoin() convenience method for the above use-case:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.innerJoin(s2, (t, u) -> Objects.equals(t, u))
  .forEach(System.out::println);

yielding

(1, 1)

… because after all, when joining two streams, the only really interesting operation is the join Predicate. All else (flatmapping, etc.) is just boilerplate.

LEFT OUTER JOIN = flatMap() with filter() and a “default”

SQL’s OUTER JOIN works like INNER JOIN, except that additional “default” rows are produced in case the JOIN predicate yields false for a pair of tuples. In terms of set theory / relational algebra, this can be expressed as such:

dd81ee1373d922122ce1b3e0da74cb28

Or in a SQL-esque dialect:

R LEFT OUTER JOIN S ::=

R INNER JOIN S
UNION (
  (R EXCEPT (SELECT R.* FROM R INNER JOIN S))
  CROSS JOIN
  (null, null, ..., null)
)

This simply means that when left outer joining S to R, there will be at least one row in the result for each row in R, with possibly an empty value for S.

Inversely, when right outer joining S to R, there will be at least one row in the result for each row in S, with possibly an empty value for R.

And finally, when full outer joining S to R, there will be at least one row in the result for each row in R with possibly an empty value for S AND for each row in S with possibly an empty value for R.

Let us look at LEFT OUTER JOIN, which is used most often in SQL.

SQL

-- Table list, Oracle syntax (don't use this!)
SELECT *
FROM (SELECT 1 v1 FROM DUAL
      UNION ALL 
      SELECT 2 v1 FROM DUAL) t1, 
     (SELECT 1 v2 FROM DUAL
      UNION ALL
      SELECT 3 v2 FROM DUAL) t2
WHERE t1.v1 = t2.v2 (+)

-- OUTER JOIN syntax
SELECT *
FROM            (VALUES(1), (2)) t1(v1)
LEFT OUTER JOIN (VALUES(1), (3)) t2(v2)
ON t1.v1 = t2.v2

yielding

+----+------+
| v1 |   v2 |
+----+------+
|  1 |    1 |
|  2 | null |
+----+------+

(note that the keyword OUTER is optional).

Java

Unfortunately, the JDK’s Stream API doesn’t provide us with an easy way to produce “at least” one value from a stream, in case the stream is empty. We could be writing a utility function as explained by Stuart Marks on Stack Overflow:

static <T> Stream<T> defaultIfEmpty(
    Stream<T> stream, Supplier<T> supplier) {
    Iterator<T> iterator = stream.iterator();

    if (iterator.hasNext()) {
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(
                iterator, 0
            ), false);
    } else {
        return Stream.of(supplier.get());
    }
}

Or, we just use jOOλ’s Seq.onEmpty()

List<Integer> s1 = Arrays.asList(1, 2);
List<Integer> s2 = Arrays.asList(1, 3);

seq(s1)
.flatMap(v1 -> seq(s2)
              .filter(v2 -> Objects.equals(v1, v2))
              .onEmpty(null)
              .map(v2 -> tuple(v1, v2)))
.forEach(System.out::println);

(notice, we’re putting null in a stream. This might not always be a good idea. We’ll follow up with that in a future blog post)

The above also yields

(1, 1)
(2, null)

How to read the implicit left outer join?

  • We’ll take each value v1 from the left stream s1
  • For each such value v1, we flatmap the right stream s2 to produce a tuple (v1, v2) (a cartesian product, cross join)
  • We’ll apply the join predicate for each such tuple (v1, v2)
  • If the join predicate leaves no tuples for any value v2, we’ll generate a single tuple containing the value of the left stream v1 and null

Java with jOOλ

For convenience, jOOλ also supports leftOuterJoin() which works as described above:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.leftOuterJoin(s2, (t, u) -> Objects.equals(t, u))
  .forEach(System.out::println);

yielding

(1, 1)
(2, null)

RIGHT OUTER JOIN = inverse LEFT OUTER JOIN

Trivially, a RIGHT OUTER JOIN is just the inverse of the previous LEFT OUTER JOIN. The jOOλ implementation of rightOuterJoin() looks like this:

default <U> Seq<Tuple2<T, U>> rightOuterJoin(
    Stream<U> other, BiPredicate<T, U> predicate) {
    return seq(other)
          .leftOuterJoin(this, (u, t) -> predicate.test(t, u))
          .map(t -> tuple(t.v2, t.v1));
}

As you can see, the RIGHT OUTER JOIN inverses the results of a LEFT OUTER JOIN, that’s it. For example:

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.rightOuterJoin(s2, (t, u) -> Objects.equals(t, u))
  .forEach(System.out::println);

yielding

(1, 1)
(null, 3)

WHERE = filter()

The most straight-forward mapping is probably SQL’s WHERE clause having an exact equivalent in the Stream API: Stream.filter().

SQL

SELECT *
FROM (VALUES(1), (2), (3)) t(v)
WHERE v % 2 = 0

yielding

+---+
| v |
+---+
| 2 |
+---+

Java

Stream<Integer> s = Stream.of(1, 2, 3);

s.filter(v -> v % 2 == 0)
 .forEach(System.out::println);

yielding

2

The interesting thing with filter() and the Stream API in general is that the operation can apply at any place in the call chain, unlike the WHERE clause, which is limited to be placed right after the FROM clause – even if SQL’s JOIN .. ON or HAVING clauses are semantically similar.

GROUP BY = collect()

The least straight-forward mapping is GROUP BY vs. Stream.collect().

First off, SQL’s GROUP BY may be a bit tricky to fully understand. It is really part of the FROM clause, transforming the set of tuples produced by FROM .. JOIN .. WHERE into groups of tuples, where each group has an associated set of aggregatable tuples, which can be aggregated in the HAVING, SELECT, and ORDER BY clauses. Things get even more interesting when you use OLAP features like GROUPING SETS, which allow for duplicating tuples according to several grouping combinations.

In most SQL implementations that don’t support ARRAY or MULTISET, the aggregatable tuples are not available as such (i.e. as nested collections) in the SELECT. Here, the Stream API’s feature set excels. On the other hand, the Stream API can group values only as a terminal operation, where in SQL, GROUP BY is applied purely declaratively (and thus, lazily). The execution planner may choose not to execute the GROUP BY at all if it is not needed. For instance:

SELECT *
FROM some_table
WHERE EXISTS (
    SELECT x, sum(y)
    FROM other_table
    GROUP BY x
)

The above query is semantically equivalent to

SELECT *
FROM some_table
WHERE EXISTS (
    SELECT 1
    FROM other_table
)

The grouping in the subquery was unnecessary. Someone may have copy-pasted that subquery in there from somewhere else, or refactored the query as a whole. In Java, using the Stream API, each operation is always executed.

For the sake of simplicity, we’ll stick to the most simple examples here

Aggregation without GROUP BY

A special case is when we do not specify any GROUP BY clause. In that case, we can specify aggregations on all columns of the FROM clause, producing always exactly one record. For instance:

SQL

SELECT sum(v)
FROM (VALUES(1), (2), (3)) t(v)

yielding

+-----+
| sum |
+-----+
|   6 |
+-----+

Java

Stream<Integer> s = Stream.of(1, 2, 3);

int sum = s.collect(Collectors.summingInt(i -> i));
System.out.println(sum);

yielding

6

Aggregation with GROUP BY

A more common case of aggregation in SQL is to specify an explicit GROUP BY clause as explained before. For instance, we may want to group by even and odd numbers:

SQL

SELECT v % 2, count(v), sum(v)
FROM (VALUES(1), (2), (3)) t(v)
GROUP BY v % 2

yielding

+-------+-------+-----+
| v % 2 | count | sum |
+-------+-------+-----+
|     0 |     1 |   2 |
|     1 |     2 |   4 |
+-------+-------+-----+

Java

For this simple grouping / collection use-case, luckily, the JDK offers a utility method called Collectors.groupingBy(), which produces a collector that generates a Map<K, List<V>> type like this:

Stream<Integer> s = Stream.of(1, 2, 3);

Map<Integer, List<Integer>> map = s.collect(
    Collectors.groupingBy(v -> v % 2)
);

System.out.println(map);

yielding

{0=[2], 1=[1, 3]}

This certainly takes care of the grouping. Now we want to produce aggregations for each group. The slightly awkward JDK way to do this would be:

Stream<Integer> s = Stream.of(1, 2, 3);

Map<Integer, IntSummaryStatistics> map = s.collect(
    Collectors.groupingBy(
        v -> v % 2,
        Collectors.summarizingInt(i -> i)
    )
);

System.out.println(map);

we’ll now get:

{0=IntSummaryStatistics{count=1, sum=2, min=2, average=2.000000, max=2},
 1=IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}}

As you can see, the count() and sum() values have been calculated somewhere along the lines of the above.

More sophisticated GROUP BY

When doing multiple aggregations with Java 8’s Stream API, you will quickly be forced to wrestle low-level API implementing complicated collectors and accumulators yourself. This is tedious and unnecessary. Consider the following SQL statement:

SQL

CREATE TABLE t (
  w INT,
  x INT,
  y INT,
  z INT
);

SELECT
    z, w, 
    MIN(x), MAX(x), AVG(x), 
    MIN(y), MAX(y), AVG(y) 
FROM t
GROUP BY z, w;

In one go, we want to:

  • Group by several values
  • Aggregate from several values

Java

In a previous article, we’ve explained in detail how this can be achieved using convenience API from jOOλ via Seq.groupBy()

class A {
    final int w;
    final int x;
    final int y;
    final int z;
 
    A(int w, int x, int y, int z) {
        this.w = w;
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

Map<
    Tuple2<Integer, Integer>, 
    Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map =
Seq.of(
    new A(1, 1, 1, 1),
    new A(1, 2, 3, 1),
    new A(9, 8, 6, 4),
    new A(9, 9, 7, 4),
    new A(2, 3, 4, 5),
    new A(2, 4, 4, 5),
    new A(2, 5, 5, 5))
 
// Seq.groupBy() is just short for 
// Stream.collect(Collectors.groupingBy(...))
.groupBy(
    a -> tuple(a.z, a.w),
 
    // ... because once you have tuples, 
    // why not add tuple-collectors?
    Tuple.collectors(
        Collectors.summarizingInt(a -> a.x),
        Collectors.summarizingInt(a -> a.y)
    )
);

System.out.println(map);

The above yields

{(1, 1)=(IntSummaryStatistics{count=2, sum=3, min=1, average=1.500000, max=2},
         IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}),
 (4, 9)=(IntSummaryStatistics{count=2, sum=17, min=8, average=8.500000, max=9},
         IntSummaryStatistics{count=2, sum=13, min=6, average=6.500000, max=7}),
 (5, 2)=(IntSummaryStatistics{count=3, sum=12, min=3, average=4.000000, max=5},
         IntSummaryStatistics{count=3, sum=13, min=4, average=4.333333, max=5})}

For more details, read the full article here.

Notice how using Stream.collect(), or Seq.groupBy() already makes for an implicit SELECT clause, which we are no longer needed to obtain via map() (see below).

HAVING = filter(), again

As mentioned before, there aren’t really different ways of applying predicates with the Stream API, there is only Stream.filter(). In SQL, HAVING is a “special” predicate clause that is syntactically put after the GROUP BY clause. For instance:

SQL

SELECT v % 2, count(v)
FROM (VALUES(1), (2), (3)) t(v)
GROUP BY v % 2
HAVING count(v) > 1

yielding

+-------+-------+
| v % 2 | count |
+-------+-------+
|     1 |     2 |
+-------+-------+

Java

Unfortunately, as we have seen before, collect() is a terminal operation in the Stream API, which means that it eagerly produces a Map, instead of transforming the Stream<T> into a Stream<K, Stream<V>, which would compose much better in complex Stream. This means that any operation that we’d like to implement right after collecting will have to be implemented on a new stream produced from the output Map:

Stream<Integer> s = Stream.of(1, 2, 3);

s.collect(Collectors.groupingBy(
      v -> v % 2,
      Collectors.summarizingInt(i -> i)
  ))
  .entrySet()
  .stream()
  .filter(e -> e.getValue().getCount() > 1)
  .forEach(System.out::println);

yielding

1=IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3}

As you can see, the type transformation that is applied is:

  • Map<Integer, IntSummaryStatistics>
  • Set<Entry<Integer, IntSummaryStatistics>>
  • Stream<Entry<Integer, IntSummaryStatistics>>

SELECT = map()

The SELECT clause in SQL is nothing more than a tuple transformation function that takes the cartesian product of tuples produced by the FROM clause and transforms it into a new tuple expression, which is fed either to the client, or to some higher-level query if this is a nested SELECT. An illustration:

FROM output

+------+------+------+------+------+
| T1.A | T1.B | T1.C | T2.A | T2.D |
+------+------+------+------+------+
|    1 |    A |    a |    1 |    X |
|    1 |    B |    b |    1 |    Y |
|    2 |    C |    c |    2 |    X |
|    2 |    D |    d |    2 |    Y |
+------+------+------+------+------+

Applying SELECT

SELECT t1.a, t1.c, t1.b || t1.d

+------+------+--------------+
| T1.A | T1.C | T1.B || T1.D |
+------+------+--------------+
|    1 |    a |           AX |
|    1 |    b |           BY |
|    2 |    c |           CX |
|    2 |    d |           DY |
+------+------+--------------+

Using Java 8 Streams, SELECT can be achieved very simply by using Stream.map(), as we’ve already seen in previous examples, where we unnested tuples using map(). The following examples are functionally equivalent:

SQL

SELECT t.v1 * 3, t.v2 + 5
FROM (
  VALUES(1, 1),
        (2, 2)
) t(v1, v2)

yielding

+----+----+
| c1 | c2 |
+----+----+
|  3 |  6 |
|  6 |  7 |
+----+----+

Java

Stream.of(
  tuple(1, 1),
  tuple(2, 2)
).map(t -> tuple(t.v1 * 3, t.v2 + 5))
 .forEach(System.out::println);

yielding

(3, 6)
(6, 7)

DISTINCT = distinct()

The DISTINCT keyword that can be supplied with the SELECT clause simply removes duplicate tuples right after they have been produced by the SELECT clause. An illustration:

FROM output

+------+------+------+------+------+
| T1.A | T1.B | T1.C | T2.A | T2.D |
+------+------+------+------+------+
|    1 |    A |    a |    1 |    X |
|    1 |    B |    b |    1 |    Y |
|    2 |    C |    c |    2 |    X |
|    2 |    D |    d |    2 |    Y |
+------+------+------+------+------+

Applying SELECT DISTINCT

SELECT DISTINCT t1.a

+------+
| T1.A |
+------+
|    1 |
|    2 |
+------+

Using Java 8 Streams, SELECT DISTINCT can be achieved very simply by using Stream.distinct() right after Stream.map(). The following examples are functionally equivalent:

SQL

SELECT DISTINCT t.v1 * 3, t.v2 + 5
FROM (
  VALUES(1, 1),
        (2, 2),
        (2, 2)
) t(v1, v2)

yielding

+----+----+
| c1 | c2 |
+----+----+
|  3 |  6 |
|  6 |  7 |
+----+----+

Java

Stream.of(
  tuple(1, 1),
  tuple(2, 2),
  tuple(2, 2)
).map(t -> tuple(t.v1 * 3, t.v2 + 5))
 .distinct()
 .forEach(System.out::println);

yielding

(3, 6)
(6, 7)

UNION ALL = concat()

Set operations are powerful both in SQL and using the Stream API. The UNION ALL operation maps to Stream.concat(), as can be seen below:

SQL

SELECT *
FROM (VALUES(1), (2)) t(v)
UNION ALL
SELECT *
FROM (VALUES(1), (3)) t(v)

yielding

+---+
| v |
+---+
| 1 |
| 2 |
| 1 |
| 3 |
+---+

Java

Stream<Integer> s1 = Stream.of(1, 2);
Stream<Integer> s2 = Stream.of(1, 3);

Stream.concat(s1, s2)
      .forEach(System.out::println);

yielding

1
2
1
3

Java (using jOOλ)

Unfortunately, concat() exists in Stream only as a static method, while Seq.concat() also exists on instances when working with jOOλ.

Seq<Integer> s1 = Seq.of(1, 2);
Seq<Integer> s2 = Seq.of(1, 3);

s1.concat(s2)
  .forEach(System.out::println);

UNION = concat() and distinct()

In SQL, UNION is defined to remove duplicates after concatenating the two sets via UNION ALL. The following two statements are equivalent:

SELECT * FROM t
UNION
SELECT * FROM u;

-- equivalent

SELECT DISTINCT *
FROM (
  SELECT * FROM t
  UNION ALL
  SELECT * FROM u
);

Let’s put this in action:

SQL

SELECT *
FROM (VALUES(1), (2)) t(v)
UNION
SELECT *
FROM (VALUES(1), (3)) t(v)

yielding

+---+
| v |
+---+
| 1 |
| 2 |
| 3 |
+---+

Java

Stream<Integer> s1 = Stream.of(1, 2);
Stream<Integer> s2 = Stream.of(1, 3);

Stream.concat(s1, s2)
      .distinct()
      .forEach(System.out::println);

ORDER BY = sorted()

The ORDER BY mapping is trivial

SQL

SELECT *
FROM (VALUES(1), (4), (3)) t(v)
ORDER BY v

yielding

+---+
| v |
+---+
| 1 |
| 3 |
| 4 |
+---+

Java

Stream<Integer> s = Stream.of(1, 4, 3);

s.sorted()
 .forEach(System.out::println);

yielding

1
3
4

LIMIT = limit()

The LIMIT mapping is even more trivial

SQL

SELECT *
FROM (VALUES(1), (4), (3)) t(v)
LIMIT 2

yielding

+---+
| v |
+---+
| 1 |
| 4 |
+---+

Java

Stream<Integer> s = Stream.of(1, 4, 3);

s.limit(2)
 .forEach(System.out::println);

yielding

1
4

OFFSET = skip()

The OFFSET mapping is trivial as well

SQL

SELECT *
FROM (VALUES(1), (4), (3)) t(v)
OFFSET 1

yielding

+---+
| v |
+---+
| 4 |
| 3 |
+---+

Java

Stream<Integer> s = Stream.of(1, 4, 3);

s.skip(1)
 .forEach(System.out::println);

yielding

4
3

Conclusion

In the above article, we’ve seen pretty much all the useful SQL SELECT query clauses and how they can be mapped to the Java 8 Stream API, or to jOOλ’s Seq API, in case Stream doesn’t offer sufficient functionality.

The article shows that SQL’s declarative world is not that much different from Java 8’s functional world. SQL clauses can compose ad-hoc queries just as well as Stream methods can be used to compose functional transformation pipelines. But there is a fundamental difference.

While SQL is truly declarative, functional programming is still very instructive. The Stream API does not make optimisation decisions based on constraints, indexes, histograms and other meta information about the data that you’re transforming. Using the Stream API is like using all possible optimisation hints in SQL to force the SQL engine to choose one particular execution plan over another. However, while SQL is a higher level algorithm abstraction, the Stream API may allow you to implement more customisable algorithms.

When the Java 8 Streams API is not Enough

Java 8 was – as always – a release of compromises and backwards-compatibility. A release where the JSR-335 expert group might not have agreed upon scope or feasibility of certain features with some of the audience. See some concrete explanations by Brian Goetz about why …

But today we’re going to focus on the Streams API’s “short-comings”, or as Brian Goetz would probably put it: things out of scope given the design goals.

Parallel Streams?

Parallel computing is hard, and it used to be a pain. People didn’t exactly love the new (now old) Fork / Join API, when it was first shipped with Java 7. Conversely, and clearly, the conciseness of calling Stream.parallel() is unbeatable.

But many people don’t actually need parallel computing (not to be confused with multi-threading!). In 95% of all cases, people would have probably preferred a more powerful Streams API, or perhaps a generally more powerful Collections API with lots of awesome methods on various Iterable subtypes.

Changing Iterable is dangerous, though. Even a no-brainer as transforming an Iterable into a Stream via a potential Iterable.stream() method seems to risk opening pandora’s box!.

Sequential Streams!

So if the JDK doesn’t ship it, we create it ourselves!

Streams are quite awesome per se. They’re potentially infinite, and that’s a cool feature. Mostly – and especially with functional programming – the size of a collection doesn’t really matter that much, as we transform element by element using functions.

If we admit Streams to be purely sequential, then we could have any of these pretty cool methods as well (some of which would also be possible with parallel Streams):

  • cycle() – a guaranteed way to make every stream infinite
  • duplicate() – duplicate a stream into two equivalent streams
  • foldLeft() – a sequential and non-associative alternative to reduce()
  • foldRight() – a sequential and non-associative alternative to reduce()
  • limitUntil() – limit the stream to those records before the first one to satisfy a predicate
  • limitWhile() – limit the stream to those records before the first one not to satisfy a predicate
  • maxBy() – reduce the stream to the maximum mapped value
  • minBy() – reduce the stream to the minimum mapped value
  • partition() – partition a stream into two streams, one satisfying a predicate and the other not satisfying the same predicate
  • reverse() – produce a new stream in inverse order
  • skipUntil() – skip records until a predicate is satisified
  • skipWhile() – skip records as long as a predicate is satisfied
  • slice() – take a slice of the stream, i.e. combine skip() and limit()
  • splitAt() – split a stream into two streams at a given position
  • unzip() – split a stream of pairs into two streams
  • zip() – merge two streams into a single stream of pairs
  • zipWithIndex() – merge a stream with its corresponding stream of indexes into a single stream of pairs

jOOλ’s new Seq type does all that

All of the above is part of jOOλ. jOOλ (pronounced “jewel”, or “dju-lambda”, also written jOOL in URLs and such) is an ASL 2.0 licensed library that emerged from our own development needs when implementing jOOQ integration tests with Java 8. Java 8 is exceptionally well-suited for writing tests that reason about sets, tuples, records, and all things SQL.

But the Streams API just slightly feels insufficient, so we have wrapped JDK’s Streams into our own Seq type (Seq for sequence / sequential Stream):

// Wrap a stream in a sequence
Seq<Integer> seq1 = seq(Stream.of(1, 2, 3));

// Or create a sequence directly from values
Seq<Integer> seq2 = Seq.of(1, 2, 3);

We’ve made Seq a new interface that extends the JDK Stream interface, so you can use Seq fully interoperably with other Java APIs – leaving the existing methods unchanged:

public interface Seq<T> extends Stream<T> {

    /**
     * The underlying {@link Stream} implementation.
     */
    Stream<T> stream();
	
	// [...]
}

Now, functional programming is only half the fun if you don’t have tuples. Unfortunately, Java doesn’t have built-in tuples and while it is easy to create a tuple library using generics, tuples are still second-class syntactic citizens when comparing Java to Scala, for instance, or C# and even VB.NET.

Nonetheless…

jOOλ also has tuples

We’ve run a code-generator to produce tuples of degree 1-8 (we might add more in the future, e.g. to match Scala’s and jOOQ’s “magical” degree 22).

And if a library has such tuples, the library also needs corresponding functions. The essence of these TupleN and FunctionN types is summarised as follows:

public class Tuple3<T1, T2, T3>
implements 
    Tuple, 
	Comparable<Tuple3<T1, T2, T3>>, 
	Serializable, Cloneable {
    
    public final T1 v1;
    public final T2 v2;
    public final T3 v3;
	
	// [...]
}

and

@FunctionalInterface
public interface Function3<T1, T2, T3, R> {

    default R apply(Tuple3<T1, T2, T3> args) {
        return apply(args.v1, args.v2, args.v3);
    }

    R apply(T1 v1, T2 v2, T3 v3);
}

There are many more features in Tuple types, but let’s leave them out for today.

On a side note, I’ve recently had an interesting discussion with Gavin King (the creator of Hibernate) on reddit. From an ORM perspective, Java classes seem like a suitable implementation for SQL / relational tuples, and they are indeed. From an ORM perspective.

But classes and tuples are fundamentally different, which is a very subtle issue with most ORMs – e.g. as explained here by Vlad Mihalcea.

Besides, SQL’s notion of row value expressions (i.e. tuples) is quite different from what can be modelled with Java classes. This topic will be covered in a subsequent blog post.

Some jOOλ examples

With the aforementioned goals in mind, let’s see how the above API can be put to work by example:

zipping

// (tuple(1, "a"), tuple(2, "b"), tuple(3, "c"))
Seq.of(1, 2, 3).zip(Seq.of("a", "b", "c"));

// ("1:a", "2:b", "3:c")
Seq.of(1, 2, 3).zip(
    Seq.of("a", "b", "c"), 
    (x, y) -> x + ":" + y
);

// (tuple("a", 0), tuple("b", 1), tuple("c", 2))
Seq.of("a", "b", "c").zipWithIndex();

// tuple((1, 2, 3), (a, b, c))
Seq.unzip(Seq.of(
    tuple(1, "a"),
    tuple(2, "b"),
    tuple(3, "c")
));

This is already a case where tuples have become very handy. When we “zip” two streams into one, we want a wrapper value type that combines both values. Classically, people might’ve used Object[] for quick-and-dirty solutions, but an array doesn’t indicate attribute types or degree.

Unfortunately, the Java compiler cannot reason about the effective bound of the <T> type in Seq<T>. This is why we can only have a static unzip() method (instead of an instance one), whose signature looks like this:

// This works
static <T1, T2> Tuple2<Seq<T1>, Seq<T2>> 
    unzip(Stream<Tuple2<T1, T2>> stream) { ... }
	
// This doesn't work:
interface Seq<T> extends Stream<T> {
    Tuple2<Seq<???>, Seq<???>> unzip();
}

Skipping and limiting

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipWhile(i -> i < 3);

// (3, 4, 5)
Seq.of(1, 2, 3, 4, 5).skipUntil(i -> i == 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitWhile(i -> i < 3);

// (1, 2)
Seq.of(1, 2, 3, 4, 5).limitUntil(i -> i == 3);

Other functional libraries probably use different terms than skip (e.g. drop) and limit (e.g. take). It doesn’t really matter in the end. We opted for the terms that are already present in the existing Stream API: Stream.skip() and Stream.limit()

Folding

// "abc"
Seq.of("a", "b", "c").foldLeft("", (u, t) -> t + u);

// "cba"
Seq.of("a", "b", "c").foldRight("", (t, u) -> t + u);

The Stream.reduce() operations are designed for parallelisation. This means that the functions passed to it must have these important attributes:

But sometimes, you really want to “reduce” a stream with functions that do not have the above attributes, and consequently, you probably don’t care about the reduction being parallelisable. This is where “folding” comes in.

A nice explanation about the various differences between reducing and folding (in Scala) can be seen here.

Splitting

// tuple((1, 2, 3), (1, 2, 3))
Seq.of(1, 2, 3).duplicate();

// tuple((1, 3, 5), (2, 4, 6))
Seq.of(1, 2, 3, 4, 5, 6).partition(i -> i % 2 != 0)

// tuple((1, 2), (3, 4, 5))
Seq.of(1, 2, 3, 4, 5).splitAt(2);

The above functions all have one thing in common: They operate on a single stream in order to produce two new streams, that can be consumed independently.

Obviously, this means that internally, some memory must be consumed to keep buffers of partially consumed streams. E.g.

  • duplication needs to keep track of all values that have been consumed in one stream, but not in the other
  • partitioning needs to fast forward to the next value that satisfies (or doesn’t satisfy) the predicate, without losing all the dropped values
  • splitting might need to fast forward to the split index

For some real functional fun, let’s have a look at a possible splitAt() implementation:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    return seq(stream)
          .zipWithIndex()
          .partition(t -> t.v2 < position)
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

… or with comments:

static <T> Tuple2<Seq<T>, Seq<T>> 
splitAt(Stream<T> stream, long position) {
    // Add jOOλ functionality to the stream
    // -> local Type: Seq<T>
    return seq(stream)
	
    // Keep track of stream positions
    // with each element in the stream
    // -> local Type: Seq<Tuple2<T, Long>>
          .zipWithIndex()
	  
    // Split the streams at position
    // -> local Type: Tuple2<Seq<Tuple2<T, Long>>,
    //                       Seq<Tuple2<T, Long>>>
          .partition(t -> t.v2 < position)
		  
    // Remove the indexes from zipWithIndex again
    // -> local Type: Tuple2<Seq<T>, Seq<T>>
          .map((v1, v2) -> tuple(
              v1.map(t -> t.v1),
              v2.map(t -> t.v1)
          ));
}

Nice, isn’t it? A possible implementation for partition(), on the other hand, is a bit more complex. Here trivially with Iterator instead of the new Spliterator:

static <T> Tuple2<Seq<T>, Seq<T>> partition(
        Stream<T> stream, 
        Predicate<? super T> predicate
) {
    final Iterator<T> it = stream.iterator();
    final LinkedList<T> buffer1 = new LinkedList<>();
    final LinkedList<T> buffer2 = new LinkedList<>();

    class Partition implements Iterator<T> {

        final boolean b;

        Partition(boolean b) {
            this.b = b;
        }

        void fetch() {
            while (buffer(b).isEmpty() && it.hasNext()) {
                T next = it.next();
                buffer(predicate.test(next)).offer(next);
            }
        }

        LinkedList<T> buffer(boolean test) {
            return test ? buffer1 : buffer2;
        }

        @Override
        public boolean hasNext() {
            fetch();
            return !buffer(b).isEmpty();
        }

        @Override
        public T next() {
            return buffer(b).poll();
        }
    }

    return tuple(
        seq(new Partition(true)), 
        seq(new Partition(false))
    );
}

I’ll let you do the exercise and verify the above code.

Get and contribute to jOOλ, now!

All of the above is part of jOOλ, available for free from GitHub. There is already a partially Java-8-ready, full-blown library called functionaljava, which goes much further than jOOλ.

Yet, we believe that all what’s missing from Java 8’s Streams API is really just a couple of methods that are very useful for sequential streams.

In a previous post, we’ve shown how we can bring lambdas to String-based SQL using a simple wrapper for JDBC (of course, we still believe that you should use jOOQ instead).

Today, we’ve shown how we can write awesome functional and sequential Stream processing very easily, with jOOλ.

Stay tuned for even more jOOλ goodness in the near future (and pull requests are very welcome, of course!)

Java 8 Friday: More Functional Relational Transformation

In the past, we’ve been providing you with a new article every Friday about what’s new in Java 8. It has been a very exciting blog series, but we would like to focus again more on our core content, which is Java and SQL. We will still be occasionally blogging about Java 8, but no longer every Friday (as some of you have already notice).

In this last, short post of the Java 8 Friday series, we’d like to re-iterate the fact that we believe that the future belongs to functional relational data transformation (as opposed to ORM). We’ve spent about 20 years now using the object-oriented software development paradigm. Many of us have been very dogmatic about it. In the last 10 years, however, a “new” paradigm has started to get increasing traction in programming communities: Functional programming.

Functional programming is not that new, however. Lisp has been a very early functional programming language. XSLT and SQL are also somewhat functional (and declarative!). As we’re big fans of SQL’s functional (and declarative!) nature, we’re quite excited about the fact that we now have sophisticated tools in Java to transform tabular data that has been extracted from SQL databases. Streams!

SQL ResultSets are very similar to Streams

As we’ve pointed out before, JDBC ResultSets and Java 8 Streams are quite similar. This is even more true when you’re using jOOQ, which replaces the JDBC ResultSet by an org.jooq.Result, which extends java.util.List, and thus automatically inherits all Streams functionality. Consider the following query that allows fetching a one-to-many relationship between BOOK and AUTHOR records:

Map<Record2<String, String>, 
    List<Record2<Integer, String>>> booksByAuthor =

// This work is performed in the database
// --------------------------------------
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()

// This work is performed in Java memory
// -------------------------------------
   .stream()

   // Group BOOKs by AUTHOR
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(AUTHOR.FIRST_NAME, 
                    AUTHOR.LAST_NAME),

        // This is the target data structure
        LinkedHashMap::new,

        // This is the value to be produced for each
        // group: A list of BOOK
        mapping(
            r -> r.into(BOOK.ID, BOOK.TITLE),
            toList()
        )
    ));

The fluency of the Java 8 Streams API is very idiomatic to someone who has been used to writing SQL with jOOQ. Obviously, you can also use something other than jOOQ, e.g. Spring’s JdbcTemplate, or Apache Commons DbUtils, or just wrap the JDBC ResultSet in an Iterator…

What’s very nice about this approach, compared to ORM is the fact that there is no magic happening at all. Every piece of mapping logic is explicit and, thanks to Java generics, fully typesafe. The type of the booksByAuthor output is complex, and a bit hard to read / write, in this example, but it is also fully descriptive and useful.

The same functional transformation with POJOs

If you aren’t too happy with using jOOQ’s Record2 tuple types, no problem. You can specify your own data transfer objects like so:

class Book {
    public int id;
    public String title;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

static class Author {
    public String firstName;
    public String lastName;

    @Override
    public String toString() { ... }

    @Override
    public int hashCode() { ... }

    @Override
    public boolean equals(Object obj) { ... }
}

With the above DTO, you can now leverage jOOQ’s built-in POJO mapping to transform the jOOQ records into your own domain classes:

Map<Author, List<Book>> booksByAuthor =
ctx.select(
        BOOK.ID,
        BOOK.TITLE,
        AUTHOR.FIRST_NAME,
        AUTHOR.LAST_NAME
    )
   .from(BOOK)
   .join(AUTHOR)
   .on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .orderBy(BOOK.ID)
   .fetch()
   .stream()
   .collect(groupingBy(

        // This is the grouping key      
        r -> r.into(Author.class),
        LinkedHashMap::new,

        // This is the grouping value list
        mapping(
            r -> r.into(Book.class),
            toList()
        )
    ));

Explicitness vs. implicitness

At Data Geekery, we believe that a new time has started for Java developers. A time where Annotatiomania™ (finally!) ends and people stop assuming all that implicit behaviour through annotation magic. ORMs depend on a huge amount of specification to explain how each annotation works with each other annotation. It is hard to reverse-engineer (or debug!) this kind of not-so-well-understood annotation-language that JPA has brought to us.

On the flip side, SQL is pretty well understood. Tables are an easy-to-handle data structure, and if you need to transform those tables into something more object-oriented, or more hierarchically structured, you can simply apply functions to those tables and group values yourself! By grouping those values explicitly, you stay in full control of your mapping, just as with jOOQ, you stay in full control of your SQL.

This is why we believe that in the next 5 years, ORMs will lose relevance and people start embracing explicit, stateless and magicless data transformation techniques again, using Java 8 Streams.

Java 8 Friday: The Best Java 8 Resources – Your Weekend is Booked

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, method references, default methods, the Streams API, and other great stuff. You’ll find the source code on GitHub.

The Best Java 8 Resources – Your Weekend is Booked

We’re obviously not the only ones writing about Java 8. Ever since this great language update’s go live, there had been blogs all around the world appearing with great content and different perspectives on the subject. In this edition of the Java 8 Friday series, we’d like to summarise some of the best content that has been going on on that subject.

1. Brian Goetz’s Answers on Stack Overflow

Brian Goetz was the spec lead for JSR 335. Together with his Expert Group team, he has worked very hard to help Java 8 succeed. However, now that JSR 335 has shipped, his work is far from being over. Brian has had the courtesy of giving authoritative answers to questions from the Java community on Stack Overflow. Here are some of the most interesting questions:

Thumbs up to this great community effort. It cannot get any better than hearing authoritative answers from the spec lead himself.

2. Baeldung.com’s Collection of Java 8 Resources

This list of resources wouldn’t be complete without the very useful list of Java 8 resources (mostly authoritative links to specifications) from the guys over at Baeldung.com. Here is:

http://www.baeldung.com/java8

3. The jOOQ Blog’s Java 8 Friday Series

Yay, that’s us! :-)

Yes, we’ve worked hard to bring you the latest from our experience when integrating jOOQ with Java 8. Here are some of our most popular articles from the recent months:

4. ZeroTurnaround’s RebelLabs Blog

As part of the ZeroTurnaround content marketing strategy, ZeroTurnaround has launched RebelLabs quite a while ago where various writers publish interesting articles around the topic of Java, which aren’t necessarily related to JRebel and other ZT products. There is some great Java 8 related content having been published over there. Here are our favourite gems:

5. The Takipi Blog

Just like ZeroTurnaround and ourselves, our friends over at Takipi provide you with some awesome Java 8 content on their blog.

6. Benji Weber’s Fun Experiments with Java 8

This blog series we found particularly fun to read. Benji Weber really thinks outside of the box and does some crazy things with default methods, method references and all that. Things that Java developers could only dream of, so far. Here are:

7. The Geeks from Paradise Blog’s Java 8 Musings

Edwin Dalorzo from Informatech has been treating us with a variety of well-founded comparisons between Java 8 and .NET. This is particularly interesting when comparing Streams with LINQ. Here are some of his best writings:

Is this list complete?

No, it is missing many other, very interesting blog series. Do you have a series to share? We’re more than happy to update this post, just let us know (in the comments section)

Java 8 Friday: 10 Subtle Mistakes When Using the Streams API

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

10 Subtle Mistakes When Using the Streams API

We’ve done all the SQL mistakes lists:

But we haven’t done a top 10 mistakes list with Java 8 yet! For today’s occasion (it’s Friday the 13th), we’ll catch up with what will go wrong in YOUR application when you’re working with Java 8. (it won’t happen to us, as we’re stuck with Java 6 for another while)

1. Accidentally reusing streams

Wanna bet, this will happen to everyone at least once. Like the existing “streams” (e.g. InputStream), you can consume streams only once. The following code won’t work:

IntStream stream = IntStream.of(1, 2);
stream.forEach(System.out::println);

// That was fun! Let's do it again!
stream.forEach(System.out::println);

You’ll get a

java.lang.IllegalStateException: 
  stream has already been operated upon or closed

So be careful when consuming your stream. It can be done only once

2. Accidentally creating “infinite” streams

You can create infinite streams quite easily without noticing. Take the following example:

// Will run indefinitely
IntStream.iterate(0, i -> i + 1)
         .forEach(System.out::println);

The whole point of streams is the fact that they can be infinite, if you design them to be. The only problem is, that you might not have wanted that. So, be sure to always put proper limits:

// That's better
IntStream.iterate(0, i -> i + 1)
         .limit(10)
         .forEach(System.out::println);

3. Accidentally creating “subtle” infinite streams

We can’t say this enough. You WILL eventually create an infinite stream, accidentally. Take the following stream, for instance:

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .distinct()
         .limit(10)
         .forEach(System.out::println);

So…

  • we generate alternating 0’s and 1’s
  • then we keep only distinct values, i.e. a single 0 and a single 1
  • then we limit the stream to a size of 10
  • then we consume it

Well… the distinct() operation doesn’t know that the function supplied to the iterate() method will produce only two distinct values. It might expect more than that. So it’ll forever consume new values from the stream, and the limit(10) will never be reached. Tough luck, your application stalls.

4. Accidentally creating “subtle” parallel infinite streams

We really need to insist that you might accidentally try to consume an infinite stream. Let’s assume you believe that the distinct() operation should be performed in parallel. You might be writing this:

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .parallel()
         .distinct()
         .limit(10)
         .forEach(System.out::println);

Now, we’ve already seen that this will turn forever. But previously, at least, you only consumed one CPU on your machine. Now, you’ll probably consume four of them, potentially occupying pretty much all of your system with an accidental infinite stream consumption. That’s pretty bad. You can probably hard-reboot your server / development machine after that. Have a last look at what my laptop looked like prior to exploding:

If I were a laptop, this is how I'd like to go.
If I were a laptop, this is how I’d like to go.

5. Mixing up the order of operations

So, why did we insist on your definitely accidentally creating infinite streams? It’s simple. Because you may just accidentally do it. The above stream can be perfectly consumed if you switch the order of limit() and distinct():

IntStream.iterate(0, i -> ( i + 1 ) % 2)
         .limit(10)
         .distinct()
         .forEach(System.out::println);

This now yields:

0
1

Why? Because we first limit the infinite stream to 10 values (0 1 0 1 0 1 0 1 0 1), before we reduce the limited stream to the distinct values contained in it (0 1).

Of course, this may no longer be semantically correct, because you really wanted the first 10 distinct values from a set of data (you just happened to have “forgotten” that the data is infinite). No one really wants 10 random values, and only then reduce them to be distinct.

If you’re coming from a SQL background, you might not expect such differences. Take SQL Server 2012, for instance. The following two SQL statements are the same:

-- Using TOP
SELECT DISTINCT TOP 10 *
FROM i
ORDER BY ..

-- Using FETCH
SELECT *
FROM i
ORDER BY ..
OFFSET 0 ROWS
FETCH NEXT 10 ROWS ONLY

So, as a SQL person, you might not be as aware of the importance of the order of streams operations.

jOOQ, the best way to write SQL in Java

6. Mixing up the order of operations (again)

Speaking of SQL, if you’re a MySQL or PostgreSQL person, you might be used to the LIMIT .. OFFSET clause. SQL is full of subtle quirks, and this is one of them. The OFFSET clause is applied FIRST, as suggested in SQL Server 2012’s (i.e. the SQL:2008 standard’s) syntax.

If you translate MySQL / PostgreSQL’s dialect directly to streams, you’ll probably get it wrong:

IntStream.iterate(0, i -> i + 1)
         .limit(10) // LIMIT
         .skip(5)   // OFFSET
         .forEach(System.out::println);

The above yields

5
6
7
8
9

Yes. It doesn’t continue after 9, because the limit() is now applied first, producing (0 1 2 3 4 5 6 7 8 9). skip() is applied after, reducing the stream to (5 6 7 8 9). Not what you may have intended.

BEWARE of the LIMIT .. OFFSET vs. "OFFSET .. LIMIT" trap!

7. Walking the file system with filters

We’ve blogged about this before. What appears to be a good idea is to walk the file system using filters:

Files.walk(Paths.get("."))
     .filter(p -> !p.toFile().getName().startsWith("."))
     .forEach(System.out::println);

The above stream appears to be walking only through non-hidden directories, i.e. directories that do not start with a dot. Unfortunately, you’ve again made mistake #5 and #6. walk() has already produced the whole stream of subdirectories of the current directory. Lazily, though, but logically containing all sub-paths. Now, the filter will correctly filter out paths whose names start with a dot “.”. E.g. .git or .idea will not be part of the resulting stream. But these paths will be: .\.git\refs, or .\.idea\libraries. Not what you intended.

Now, don’t fix this by writing the following:

Files.walk(Paths.get("."))
     .filter(p -> !p.toString().contains(File.separator + "."))
     .forEach(System.out::println);

While that will produce the correct output, it will still do so by traversing the complete directory subtree, recursing into all subdirectories of “hidden” directories.

I guess you’ll have to resort to good old JDK 1.0 File.list() again. The good news is, FilenameFilter and FileFilter are both functional interfaces.

8. Modifying the backing collection of a stream

While you’re iterating a List, you must not modify that same list in the iteration body. That was true before Java 8, but it might become more tricky with Java 8 streams. Consider the following list from 0..9:

// Of course, we create this list using streams:
List<Integer> list = 
IntStream.range(0, 10)
         .boxed()
         .collect(toCollection(ArrayList::new));

Now, let’s assume that we want to remove each element while consuming it:

list.stream()
    // remove(Object), not remove(int)!
    .peek(list::remove)
    .forEach(System.out::println);

Interestingly enough, this will work for some of the elements! The output you might get is this one:

0
2
4
6
8
null
null
null
null
null
java.util.ConcurrentModificationException

If we introspect the list after catching that exception, there’s a funny finding. We’ll get:

[1, 3, 5, 7, 9]

Heh, it “worked” for all the odd numbers. Is this a bug? No, it looks like a feature. If you’re delving into the JDK code, you’ll find this comment in ArrayList.ArraListSpliterator:

/*
 * If ArrayLists were immutable, or structurally immutable (no
 * adds, removes, etc), we could implement their spliterators
 * with Arrays.spliterator. Instead we detect as much
 * interference during traversal as practical without
 * sacrificing much performance. We rely primarily on
 * modCounts. These are not guaranteed to detect concurrency
 * violations, and are sometimes overly conservative about
 * within-thread interference, but detect enough problems to
 * be worthwhile in practice. To carry this out, we (1) lazily
 * initialize fence and expectedModCount until the latest
 * point that we need to commit to the state we are checking
 * against; thus improving precision.  (This doesn't apply to
 * SubLists, that create spliterators with current non-lazy
 * values).  (2) We perform only a single
 * ConcurrentModificationException check at the end of forEach
 * (the most performance-sensitive method). When using forEach
 * (as opposed to iterators), we can normally only detect
 * interference after actions, not before. Further
 * CME-triggering checks apply to all other possible
 * violations of assumptions for example null or too-small
 * elementData array given its size(), that could only have
 * occurred due to interference.  This allows the inner loop
 * of forEach to run without any further checks, and
 * simplifies lambda-resolution. While this does entail a
 * number of checks, note that in the common case of
 * list.stream().forEach(a), no checks or other computation
 * occur anywhere other than inside forEach itself.  The other
 * less-often-used methods cannot take advantage of most of
 * these streamlinings.
 */

Now, check out what happens when we tell the stream to produce sorted() results:

list.stream()
    .sorted()
    .peek(list::remove)
    .forEach(System.out::println);

This will now produce the following, “expected” output

0
1
2
3
4
5
6
7
8
9

And the list after stream consumption? It is empty:

[]

So, all elements are consumed, and removed correctly. The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state. It is now “safe” to remove elements from the list!

Well… can we really? Let’s proceed with parallel(), sorted() removal:

list.stream()
    .sorted()
    .parallel()
    .peek(list::remove)
    .forEach(System.out::println);

This now yields:

7
6
2
5
8
4
1
0
9
3

And the list contains

[8]

Eek. We didn’t remove all elements!? Free beers (and jOOQ stickers) go to anyone who solves this streams puzzler!

This all appears quite random and subtle, we can only suggest that you never actually do modify a backing collection while consuming a stream. It just doesn’t work.

9. Forgetting to actually consume the stream

What do you think the following stream does?

IntStream.range(1, 5)
         .peek(System.out::println)
         .peek(i -> { 
              if (i == 5) 
                  throw new RuntimeException("bang");
          });

When you read this, you might think that it will print (1 2 3 4 5) and then throw an exception. But that’s not correct. It won’t do anything. The stream just sits there, never having been consumed.

As with any fluent API or DSL, you might actually forget to call the “terminal” operation. This might be particularly true when you use peek(), as peek() is an aweful lot similar to forEach().

This can happen with jOOQ just the same, when you forget to call execute() or fetch():

DSL.using(configuration)
   .update(TABLE)
   .set(TABLE.COL1, 1)
   .set(TABLE.COL2, "abc")
   .where(TABLE.ID.eq(3));

Oops. No execute()

jOOQ, the best way to write SQL in Java

Yes, the “best” way – with 1-2 caveats ;-)

10. Parallel stream deadlock

This is now a real goodie for the end!

All concurrent systems can run into deadlocks, if you don’t properly synchronise things. While finding a real-world example isn’t obvious, finding a forced example is. The following parallel() stream is guaranteed to run into a deadlock:

Object[] locks = { new Object(), new Object() };

IntStream
    .range(1, 5)
    .parallel()
    .peek(Unchecked.intConsumer(i -> {
        synchronized (locks[i % locks.length]) {
            Thread.sleep(100);

            synchronized (locks[(i + 1) % locks.length]) {
                Thread.sleep(50);
            }
        }
    }))
    .forEach(System.out::println);

Note the use of Unchecked.intConsumer(), which transforms the functional IntConsumer interface into a org.jooq.lambda.fi.util.function.CheckedIntConsumer, which is allowed to throw checked exceptions.

Well. Tough luck for your machine. Those threads will be blocked forever :-)

The good news is, it has never been easier to produce a schoolbook example of a deadlock in Java!

For more details, see also Brian Goetz’s answer to this question on Stack Overflow.

Conclusion

With streams and functional thinking, we’ll run into a massive amount of new, subtle bugs. Few of these bugs can be prevented, except through practice and staying focused. You have to think about how to order your operations. You have to think about whether your streams may be infinite.

Streams (and lambdas) are a very powerful tool. But a tool which we need to get a hang of, first.

Stay tuned for more exciting Java 8 articles on this blog.

Three-State Booleans in Java

Every now and then, I miss SQL’s three-valued BOOLEAN semantics in Java. In SQL, we have:

  • TRUE
  • FALSE
  • UNKNOWN (also known as NULL)

Every now and then, I find myself in a situation where I wish I could also express this UNKNOWN or UNINITIALISED semantics in Java, when plain true and false aren’t enough.

Implementing a ResultSetIterator

For instance, when implementing a ResultSetIterator for jOOλ, a simple library modelling SQL streams for Java 8:

SQL.stream(stmt, Unchecked.function(r ->
    new SQLGoodies.Schema(
        r.getString("FIELD_1"),
        r.getBoolean("FIELD_2")
    )
))
.forEach(System.out::println);

In order to implement a Java 8 Stream, we need to construct an Iterator, which we can then pass to the new Spliterators.spliteratorUnknownSize() method:

StreamSupport.stream(
  Spliterators.spliteratorUnknownSize(iterator, 0), 
  false
);

Another example for this can be seen here on Stack Overflow.

When implementing the Iterator interface, we must implement hasNext() and next(). Note that with Java 8, remove() now has a default implementation, so we don’t need to implement it any longer.

While most of the time, a call to next() is preceded by a call to hasNext() exactly once, nothing in the Iterator contract requires this. It is perfectly fine to write:

if (it.hasNext()) {
    // Some stuff

    // Double-check again to be sure
    if (it.hasNext() && it.hasNext()) {

        // Yes, we're paranoid
        if (it.hasNext())
            it.next();
    }
}

How to translate the Iterator calls to backing calls on the JDBC ResultSet? We need to call ResultSet.next().

We could make the following translation:

  • Iterator.hasNext() == !ResultSet.isLast()
  • Iterator.next() == ResultSet.next()

But that translation is:

  • Expensive
  • Not dealing correctly with empty ResultSets
  • Not implemented in all JDBC drivers (Support for the isLast method is optional for ResultSets with a result set type of TYPE_FORWARD_ONLY)

So, we’ll have to maintain a flag, internally, that tells us:

  • If we had already called ResultSet.next()
  • What the result of that call was

Instead of creating a second variable, why not just use a three-valued java.lang.Boolean. Here’s a possible implementation from jOOλ:

class ResultSetIterator<T> implements Iterator<T> {

    final Supplier<? extends ResultSet>  supplier;
    final Function<ResultSet, T>         rowFunction;
    final Consumer<? super SQLException> translator;

    /**
     * Whether the underlying {@link ResultSet} has
     * a next row. This boolean has three states:
     * <ul>
     * <li>null:  it's not known whether there 
     *            is a next row</li>
     * <li>true:  there is a next row, and it
     *            has been pre-fetched</li>
     * <li>false: there aren't any next rows</li>
     * </ul>
     */
    Boolean hasNext;
    ResultSet rs;

    ResultSetIterator(
        Supplier<? extends ResultSet> supplier, 
        Function<ResultSet, T> rowFunction, 
        Consumer<? super SQLException> translator
    ) {
        this.supplier = supplier;
        this.rowFunction = rowFunction;
        this.translator = translator;
    }

    private ResultSet rs() {
        return (rs == null) 
             ? (rs = supplier.get()) 
             :  rs;
    }

    @Override
    public boolean hasNext() {
        try {
            if (hasNext == null) {
                hasNext = rs().next();
            }

            return hasNext;
        }
        catch (SQLException e) {
            translator.accept(e);
            throw new IllegalStateException(e);
        }
    }

    @Override
    public T next() {
        try {
            if (hasNext == null) {
                rs().next();
            }

            return rowFunction.apply(rs());
        }
        catch (SQLException e) {
            translator.accept(e);
            throw new IllegalStateException(e);
        }
        finally {
            hasNext = null;
        }
    }
}

As you can see, the hasNext() method locally caches the hasNext three-valued boolean state only if it was null before. This means that calling hasNext() several times will have no effect until you call next(), which resets the hasNext cached state.

Both hasNext() and next() advance the ResultSet cursor if needed.

Readability?

Some of you may argue that this doesn’t help readability. They’d introduce a new variable like:

boolean hasNext;
boolean hasHasNextBeenCalled;

The trouble with this is the fact that you’re still implementing three-valued boolean state, but distributed to two variables, which are very hard to name in a way that is truly more readable than the actual java.lang.Boolean solution. Besides, there are actually four state values for two boolean variables, so there is a slight increase in the risk of bugs.

Every rule has its exception. Using null for the above semantics is a very good exception to the null-is-bad histeria that has been going on ever since the introduction of Option / Optional

In other words: Which approach is best? There’s no TRUE or FALSE answer, only UNKNOWN ;-)

Be careful with this

However, as we’ve discussed in a previous blog post, you should avoid returning null from API methods if possible. In this case, using null explicitly as a means to model state is fine because this model is encapsulated in our ResultSetIterator. But try to avoid leaking such state to the outside of your API.

Java 8 Friday: Optional Will Remain an Option in Java

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

Optional: A new Option in Java

So far, we’ve been pretty thrilled with all the additions to Java 8. All in all, this is a revolution more than anything before. But there are also one or two sore spots. One of them is how Java will never really get rid of

Null: The billion dollar mistake tweet this

In a previous blog post, we have explained the merits of NULL handling in the Ceylon language, which has found one of the best solutions to tackle this issue – at least on the JVM which is doomed to support the null pointer forever. In Ceylon, nullability is a flag that can be added to every type by appending a question mark to the type name. An example:

void hello() {
    String? name = process.arguments.first;
    String greeting;
    if (exists name) {
        greeting = "Hello, ``name``!";
    }
    else {
        greeting = "Hello, World!";
    }
    print(greeting);
}

That’s pretty slick. Combined with flow-sensitive typing, you will never run into the dreaded NullPointerException again:

Recently in the Operating Room. By Geek and Poke
Recently in the Operating Room. By Geek and Poke

Other languages have introduced the Option type. Most prominently: Scala. Java 8 now also introduced the Optional type (as well as the OptionalInt, OptionalLong, OptionalDouble types – more about those later on)

How does Optional work?

The main point behind Optional is to wrap an Object and to provide convenience API to handle nullability in a fluent manner. This goes well with Java 8 lambda expressions, which allow for lazy execution of operations. An example:

Optional<String> stringOrNot = Optional.of("123");

// This String reference will never be null
String alwaysAString =
    stringOrNot.orElse("");

// This Integer reference will be wrapped again
Optional<Integer> integerOrNot = 
    stringOrNot.map(Integer::parseInt);

// This int reference will never be null
int alwaysAnInt = stringOrNot
        .map(s -> Integer.parseInt(s))
        .orElse(0);

There are certain merits to the above in fluent APIs, specifically in the new Java 8 Streams API, which makes extensive use of Optional. For example:

Arrays.asList(1, 2, 3)
      .stream()
      .findAny()
      .ifPresent(System.out::println);

The above piece of code will print any number from the Stream onto the console, but only if such a number exists.

Old API is not retrofitted

For obvious backwards-compatibility reasons, the “old API” is not retrofitted. In other words, unlike Scala, Java 8 doesn’t use Optional all over the JDK. In fact, the only place where Optional is used is in the Streams API. As you can see in the Javadoc, usage is very scarce:

http://docs.oracle.com/javase/8/docs/api/java/util/class-use/Optional.html

This makes Optional a bit difficult to use. We’ve already blogged about this topic before. Concretely, the absence of an Optional type in the API is no guarantee of non-nullability. This is particularly nasty if you convert Streams into collections and collections into streams.

The Java 8 Optional type is treacherous tweet this

Parametric polymorphism

The worst implication of Optional on its “infected” API is parametric polymorphism, or simply: generics. When you reason about types, you will quickly understand that:

// This is a reference to a simple type:
Number s;

// This is a reference to a collection of
// the above simple type:
Collection<Number> c;

Generics are often used for what is generally accepted as composition. We have a Collection of String. With Optional, this compositional semantics is slightly abused (both in Scala and Java) to “wrap” a potentially nullable value. We now have:

// This is a reference to a nullable simple type:
Optional<Number> s;

// This is a reference to a collection of 
// possibly nullable simple types
Collection<Optional<Number>> c;

So far so good. We can substitute types to get the following:

// This is a reference to a simple type:
T s;

// This is a reference to a collection of
// the above simple type:
Collection<T> c;

But now enter wildcards and use-site variance. We can write

// No variance can be applied to simple types:
T s;

// Variance can be applied to collections of
// simple types:
Collection<? extends T> source;
Collection<? super T> target;

What do the above types mean in the context of Optional? Intuitively, we would like this to be about things like Optional<? extends Number> or Optional<? super Number>. In the above example we can write:

// Read a T-value from the source
T s = source.iterator().next();

// ... and put it into the target
target.add(s);

But this doesn’t work any longer with Optional

Collection<Optional<? extends T>> source;
Collection<Optional<? super T>> target;

// Read a value from the source
Optional<? extends T> s = source.iterator().next();

// ... cannot put it into the target
target.add(s); // Nope

… and there is no other way to reason about use-site variance when we have Optional and subtly more complex API.

If you add generic type erasure to the discussion, things get even worse. We no longer erase the component type of the above Collection, we also erase the type of virtually any reference. From a runtime / reflection perspective, this is almost like using Object all over the place!

Generic type systems are incredibly complex even for simple use-cases. Optional makes things only worse. It is quite hard to blend Optional with traditional collections API or other APIs. Compared to the ease of use of Ceylon’s flow-sensitive typing, or even Groovy’s elvis operator, Optional is like a sledge-hammer in your face.

Be careful when you apply it to your API!

Primitive types

One of the main reasons why Optional is still a very useful addition is the fact that the “object-stream” and the “primitive streams” have a “unified API” by the fact that we also have OptionalInt, OptionalLong, OptionalDouble types.

In other words, if you’re operating on primitive types, you can just switch the stream construction and reuse the rest of your stream API usage source code, in almost the same way. Compare these two chains:

// Stream and Optional
Optional<Integer> anyInteger = 
Arrays.asList(1, 2, 3)
      .stream()
      .filter(i -> i % 2 == 0)
      .findAny();
anyInteger.ifPresent(System.out::println);

// IntStream and OptionalInt
OptionalInt anyInt =
Arrays.stream(new int[] {1, 2, 3})
      .filter(i -> i % 2 == 0)
      .findAny();
anyInt.ifPresent(System.out::println);

In other words, given the scarce usage of these new types in JDK API, the dubious usefulness of such a type in general (if retrofitted into a very backwards-compatible environment) and the implications generics erasure have on Optional we dare say that

The only reason why this type was really added is to provide a more unified Streams API for both reference and primitive types tweet this

That’s tough. And makes us wonder, if we should finally get rid of primitive types altogether.

Oh, and…

Optional isn’t Serializable.

Nope. Not Serializable. Unlike ArrayList, for instance. For the usual reason:

Making something in the JDK serializable makes a dramatic increase in our maintenance costs, because it means that the representation is frozen for all time. This constrains our ability to evolve implementations in the future, and the number of cases where we are unable to easily fix a bug or provide an enhancement, which would otherwise be simple, is enormous. So, while it may look like a simple matter of “implements Serializable” to you, it is more than that. The amount of effort consumed by working around an earlier choice to make something serializable is staggering.

Citing Brian Goetz, from:

http://mail.openjdk.java.net/pipermail/jdk8-dev/2013-September/003276.html

Want to discuss Optional? Read these threads on reddit:

Stay tuned for more exciting Java 8 stuff published in this blog series.

More on Java 8

In the mean time, have a look at Eugen Paraschiv’s awesome Java 8 resources page