I couldn’t resist.
I have read this question by Hugo Prudente on Stack Overflow. And I knew there had to be a better way than what the JDK has to offer.
The question reads:
I’m looking for a lambda to refine the data already retrieved. I have a raw resultset, if the user do not change the date I want use java’s lambda to group by the results for then. And I’m new to lambdas with java.
The lambda I’m looking for works simliar to this query.
SELECT
z, w,
MIN(x), MAX(x), AVG(x),
MIN(y), MAX(y), AVG(y)
FROM table
GROUP BY z, w;
SQL is declarative. Functional programming is not.
Before we go on with this discussion, let’s establish a very important fact. SQL is a completely declarative language. Functional (or “functional-ish”, to keep the Haskell-aficionados at peace) programming languages like Java 8 are not declarative. While expressing data transformation algorithms using functions is much more concise than expressing them using objects, or worse, using imperative instructions, you’re still explicitly expressing the algorithm.
When you write SQL, you don’t write any algorithm. You just describe the result you want to have. The SQL engine’s optimiser will figure out the algorithm for you – e.g. based on the fact that you may have an index on
Z
but not on
W
or on
(Z, W)
.
While simple examples like these can easily be implemented using Java 8, you will quickly run into Java’s limitations, once you need to do more complex reporting.
Of course, as we’ve blogged before, the optimum is reached when you combine SQL and functional programming.
How can this be written in Java 8?
There are a variety of ways to do it. The essence is to understand all the participants in such a transformation. And no matter if you find this easy or hard, suitable for Java 8 or inadequate, thinking about the different, lesser-known parts of new Stream API is certainly worth the exercise.
The main participants here are:
- Stream: If you’re using JDK 8 libraries, then the new
java.util.stream.Stream
type will be your first choice.
- Collector: The JDK provides us with a rather low-level and thus very powerful new API for data aggregation (also known as “reduction”). This API is summarised by the new
java.util.stream.Collector
type, a new type from which we have heard only little so far in the blogosphere
Disclaimer
Some of the code displayed here might not work in your favourite IDE. Unfortunately, even if Java 7 reaches its end of life, all major IDEs (Eclipse, IntelliJ, NetBeans), and even the javac compiler still have quite a few bugs related to the
combination of generic type inference and lambda expressions. Stay tuned until those bugs are fixed! And report any bug you discover. We’ll all thank you for it!
Let’s go!
Let’s review our SQL statement:
SELECT
z, w,
MIN(x), MAX(x), AVG(x),
MIN(y), MAX(y), AVG(y)
FROM table
GROUP BY z, w;
In terms of the
Stream
API, the table itself is the
Stream
. Let’s just assume that we have a “table type”
A
as such:
class A {
final int w;
final int x;
final int y;
final int z;
A(int w, int x, int y, int z) {
this.w = w;
this.x = x;
this.y = y;
this.z = z;
}
@Override
public String toString() {
return "A{" +
"w=" + w +
", x=" + x +
", y=" + y +
", z=" + z +
'}';
}
}
You can also add
equals()
and
hashCode()
if you must.
We can now easily compose the
Stream
using
Stream.of()
, and some sample data:
Stream<A> stream =
Stream.of(
new A(1, 1, 1, 1),
new A(1, 2, 3, 1),
new A(9, 8, 6, 4),
new A(9, 9, 7, 4),
new A(2, 3, 4, 5),
new A(2, 4, 4, 5),
new A(2, 5, 5, 5));
Now, the next step is to
GROUP BY z, w
. The
Stream
API itself, unfortunately, doesn’t contain such a convenience method. We have to resort to more low-level operations by specifying the more general
Stream.collect()
operation, and passing a
Collector
to it that does the grouping. Luckily, a variety of different grouping
Collectors
are already made available from the
Collectors
helper class.
So we add that to our
stream
Stream.of(
new A(1, 1, 1, 1),
new A(1, 2, 3, 1),
new A(9, 8, 6, 4),
new A(9, 9, 7, 4),
new A(2, 3, 4, 5),
new A(2, 4, 4, 5),
new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(...));

Now the interesting part starts. How do we specify that we want to group by both
A.z
and
A.w
? We need to provide this
groupingBy
method with a function that can extract something like a SQL
tuple from the
A
type. We could write our own tuple, or we simply use that of
jOOλ, a
library that we have created and open-sourced to improve our jOOQ integration tests.
The
Tuple2
type roughly looks like this:
public class Tuple2<T1, T2> {
public final T1 v1;
public final T2 v2;
public T1 v1() {
return v1;
}
public T2 v2() {
return v2;
}
public Tuple2(T1 v1, T2 v2) {
this.v1 = v1;
this.v2 = v2;
}
}
public interface Tuple {
static <T1, T2> Tuple2<T1, T2> tuple(T1 v1, T2 v2) {
return new Tuple2<>(v1, v2);
}
}
It has many more useful features, but these ones will be sufficient for this article.
On a side-note
Why the JDK doesn’t ship with built-in tuples like
C#’s or
Scala’s escapes me.
Functional programming without tuples is like coffee without sugar: A bitter punch in your face.
Anyway… back on track
So we’re grouping by the
(A.z, A.w)
tuple, as we would in SQL
Map<Tuple2<Integer, Integer>, List<A>> map =
Stream.of(
new A(1, 1, 1, 1),
new A(1, 2, 3, 1),
new A(9, 8, 6, 4),
new A(9, 9, 7, 4),
new A(2, 3, 4, 5),
new A(2, 4, 4, 5),
new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(
a -> tuple(a.z, a.w)
));
As you can see, this produces a verbose but very descriptive type, a map containing our grouping tuple as its key, and a list of collected table records as its value.
Running the following statement
map.entrySet().forEach(System.out::println);
will yield:
(1, 1)=[A{w=1, x=1, y=1, z=1}, A{w=1, x=2, y=3, z=1}]
(4, 9)=[A{w=9, x=8, y=6, z=4}, A{w=9, x=9, y=7, z=4}]
(5, 2)=[A{w=2, x=3, y=4, z=5}, A{w=2, x=4, y=4, z=5}, A{w=2, x=5, y=5, z=5}]
That’s already quite awesome! In fact, this behaves like the SQL:2011 standard
COLLECT()
aggregate function, that is also available in Oracle 10g+
Now, instead of actually collecting the
A
records, we prefer to aggregate the individual values of
x
and
y
. The JDK provides us with a couple of interesting new types, e.g. the
java.util.IntSummaryStatistics
, which is available for convenience again from the
Collectors
type via
Collectors.summarizingInt()
.
On a side note
For my taste, this sledge-hammer data aggregation technique is a bit quirky. The JDK libraries have been left intentionally low level and verbose, perhaps to keep the library footprint small, or to prevent “horrible” consequences when in 5-10 years (after the release of JDK 9 and 10),
it becomes obvious that some features may have been added prematurely.
At the same time, there is this all-or-nothing
IntSummaryStatistics
, that blindly aggregates these popular aggregation values for your collection:
COUNT(*)
SUM()
MIN()
MAX()
and obviously, once you have
SUM()
and
COUNT(*)
, you also have
AVG() = SUM() / COUNT(*)
. So that’s going to be the Java way.
IntSummaryStatistics
.
In case you were wondering, the SQL:2011 standard specifies these aggregate functions:
AVG, MAX, MIN, SUM, EVERY, ANY, SOME, COUNT, STDDEV_POP, STDDEV_SAMP, VAR_SAMP, VAR_POP, COLLECT, FUSION, INTERSECTION, COVAR_POP, COVAR_SAMP, CORR, REGR_SLOPE, REGR_INTERCEPT, REGR_COUNT, REGR_R2, REGR_AVGX, REGR_AVGY, REGR_SXX, REGR_SYY, REGR_SXY, PERCENTILE_CONT, PERCENTILE_DISC, ARRAY_AGG
And obviously there are many other, vendor-specific
aggregate and window functions in SQL. We’ve blogged about them all:
True,
MIN, MAX, SUM, COUNT, AVG
are certainly the most popular ones. But it would’ve been nicer if they hadn’t been included in these default aggregation types, but made available in a much more composable way.
Anyway… back on track
If you want to stay low-level and use mostly JDK API, you can use the following technique to implement aggregation over two columns:
Map<
Tuple2<Integer, Integer>,
Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map = Stream.of(
new A(1, 1, 1, 1),
new A(1, 2, 3, 1),
new A(9, 8, 6, 4),
new A(9, 9, 7, 4),
new A(2, 3, 4, 5),
new A(2, 4, 4, 5),
new A(2, 5, 5, 5))
.collect(Collectors.groupingBy(
a -> tuple(a.z, a.w),
Collector.of(
// When collecting, we'll aggregate data
// into two IntSummaryStatistics for x and y
() -> tuple(new IntSummaryStatistics(),
new IntSummaryStatistics()),
// The accumulator will simply take
// new t = (x, y) values
(r, t) -> {
r.v1.accept(t.x);
r.v2.accept(t.y);
},
// The combiner will merge two partial
// aggregations, in case this is executed
// in parallel
(r1, r2) -> {
r1.v1.combine(r2.v1);
r1.v2.combine(r2.v2);
return r1;
}
)
));
map.entrySet().forEach(System.out::println);
The above would now print
(1, 1)=(IntSummaryStatistics{count=2, sum=3, min=1, average=1.500000, max=2},
IntSummaryStatistics{count=2, sum=4, min=1, average=2.000000, max=3})
(4, 9)=(IntSummaryStatistics{count=2, sum=17, min=8, average=8.500000, max=9},
IntSummaryStatistics{count=2, sum=13, min=6, average=6.500000, max=7})
(5, 2)=(IntSummaryStatistics{count=3, sum=12, min=3, average=4.000000, max=5},
IntSummaryStatistics{count=3, sum=13, min=4, average=4.333333, max=5})
But obviously, no one will want to write that much code.
The same thing can be achieved with jOOλ with much less code
Map<
Tuple2<Integer, Integer>,
Tuple2<IntSummaryStatistics, IntSummaryStatistics>
> map =
// Seq is like a Stream, but sequential only,
// and with more features
Seq.of(
new A(1, 1, 1, 1),
new A(1, 2, 3, 1),
new A(9, 8, 6, 4),
new A(9, 9, 7, 4),
new A(2, 3, 4, 5),
new A(2, 4, 4, 5),
new A(2, 5, 5, 5))
// Seq.groupBy() is just short for
// Stream.collect(Collectors.groupingBy(...))
.groupBy(
a -> tuple(a.z, a.w),
// ... because once you have tuples,
// why not add tuple-collectors?
Tuple.collectors(
Collectors.summarizingInt(a -> a.x),
Collectors.summarizingInt(a -> a.y)
)
);
What you see above is probably as close as it gets to the original, very simmple SQL statement:
SELECT
z, w,
MIN(x), MAX(x), AVG(x),
MIN(y), MAX(y), AVG(y)
FROM table
GROUP BY z, w;
The interesting part here is the fact that we have what we call “tuple-collectors”, a
Collector
that collects data into tuples of aggregated results for any degree of the tuple (up to 8). Here’s the code for
Tuple.collectors
:
// All of these generics... sheesh!
static <T, A1, A2, D1, D2>
Collector<T, Tuple2<A1, A2>, Tuple2<D1, D2>>
collectors(
Collector<T, A1, D1> collector1
, Collector<T, A2, D2> collector2
) {
return Collector.of(
() -> tuple(
collector1.supplier().get()
, collector2.supplier().get()
),
(a, t) -> {
collector1.accumulator().accept(a.v1, t);
collector2.accumulator().accept(a.v2, t);
},
(a1, a2) -> tuple(
collector1.combiner().apply(a1.v1, a2.v1)
, collector2.combiner().apply(a1.v2, a2.v2)
),
a -> tuple(
collector1.finisher().apply(a.v1)
, collector2.finisher().apply(a.v2)
)
);
}
Where the
Tuple2<D1, D2>
is the aggregation result type that we derive from
collector1
(which provides
D1
) and from
collector2
(which provides
D2
).
That’s it. We’re done!
Conclusion
Java 8 is a first step towards functional programming in Java. Using Streams and lambda expressions, we can already achieve quite a bit. The JDK APIs, however, are extremely low level and the experience when using IDEs like Eclipse, IntelliJ, or NetBeans can still be a bit frustrating. While writing this article (and adding the
Tuple.collectors()
method), I have reported around 10 bugs to the different IDEs. Some javac compiler bugs are not yet fixed, prior to JDK 1.8.0_40 ea. In other words:
I just keep throwing generic type parameters at the darn thing until the compiler stops bitching at me
But we’re on a good path. I trust that more useful API will ship with JDK 9 and especially with JDK 10, when all of the above will hopefully profit from the new
value types and
generic type specialization.

And, of course, if you haven’t already,
download and contribute to jOOλ here!
We have created jOOλ to add the missing pieces to the JDK libraries. If you want to go all in on functional programming, i.e. when your vocabulary includes hipster terms (couldn’t resist) like monads, monoids, functors, and all that, we suggest you skip the JDK’s Streams and jOOλ entirely, and go download
functionaljava by
Mark Perry or
vavr by
Daniel DietrichLike this:
Like Loading...
Java 8 streams are really powerful and versatile but I still think that doing grouping in SQL is much easier and efficient then in Java, because of memory overhead and network latency overhead it puts to your application.
Absolutely. And this is really a trivial SQL GROUP BY. The ordinary cats-and-dogs example. Actual SQL looks quite different. But still, it’s good to know that some data transformation can easily be made in Java with relatively little effort.