2016 Will be the Year Remembered as When Java Finally Had Window Functions!

You heard right. Up until now, the awesome window functions were a feature uniquely reserved to SQL. Even sophisticated functional programming languages still seem to lack this beautiful functionality (correct me if I’m wrong, Haskell folks). We’ve written tons of blog posts about window functions, evangelising them to our audience, in articles like:

One of my favourite example use-cases for window functions is the running total. I.e. to get from the following bank account transaction table:

| ID   | VALUE_DATE | AMOUNT |
|------|------------|--------|
| 9997 | 2014-03-18 |  99.17 |
| 9981 | 2014-03-16 |  71.44 |
| 9979 | 2014-03-16 | -94.60 |
| 9977 | 2014-03-16 |  -6.96 |
| 9971 | 2014-03-15 | -65.95 |

… to this one, with a calculated balance:

| ID   | VALUE_DATE | AMOUNT |  BALANCE |
|------|------------|--------|----------|
| 9997 | 2014-03-18 |  99.17 | 19985.81 |
| 9981 | 2014-03-16 |  71.44 | 19886.64 |
| 9979 | 2014-03-16 | -94.60 | 19815.20 |
| 9977 | 2014-03-16 |  -6.96 | 19909.80 |
| 9971 | 2014-03-15 | -65.95 | 19916.76 |

With SQL, this is a piece of cake. Observe the usage of SUM(t.amount) OVER(...):


SELECT
  t.*,
  t.current_balance - NVL(
    SUM(t.amount) OVER (
      PARTITION BY t.account_id
      ORDER BY     t.value_date DESC,
                   t.id         DESC
      ROWS BETWEEN UNBOUNDED PRECEDING
           AND     1         PRECEDING
    ),
  0) AS balance
FROM     v_transactions t
WHERE    t.account_id = 1
ORDER BY t.value_date DESC,
         t.id         DESC

How do window functions work?

(don’t forget to book our SQL Masterclass to learn about window functions, and much more!) Despite the sometimes a bit scary syntax, window functions are really very easy to understand. Windows are “views” of the data produced in your FROM / WHERE / GROUP BY / HAVING clauses. They allow you to access all the other rows relative to the current row, while you calculate something in your SELECT clause (or rarely, in your ORDER BY clause). What the above statement really does is this:

| ID   | VALUE_DATE |  AMOUNT |  BALANCE |
|------|------------|---------|----------|
| 9997 | 2014-03-18 | -(99.17)|+19985.81 |
| 9981 | 2014-03-16 | -(71.44)| 19886.64 |
| 9979 | 2014-03-16 |-(-94.60)| 19815.20 |
| 9977 | 2014-03-16 |   -6.96 |=19909.80 |
| 9971 | 2014-03-15 |  -65.95 | 19916.76 |

I.e. for any given balance, subtract from the current balance the SUM() “OVER()” the window of all the rows that are in the same partition as the current row (same bank account), and that are strictly “above” the current row. Or, in detail:

PARTITION BY specifies “OVER()” which rows the window spans
ORDER BY specifies how the window is ordered
ROWS specifies what ordered row indexes should be considered

Can we do this with Java collections?

Yes we can! If you’re using jOOλ: A completely free Open Source, Apache 2.0 licensed library that we designed because we thought that the JDK 8 Stream and Collector APIs just don’t do it. When Java 8 was designed, a lot of focus went into supporting parallel streams. That’s nice but certainly not the only useful area where functional programming can be applied. We’ve created jOOλ to fill this gap – without implementing an all new, alternative collections API, such as vavr or functional java have. jOOλ already provides:

Tuple types
More useful stuff for ordered, sequential-only streams

With the recently released jOOλ 0.9.9, we’ve added two main new features:

Tons of new Collectors
Window functions

The many missing collectors in the JDK

The JDK ships with a couple of collectors, but they do seem awkward and verbose, and no one really appreciates writing collectors like the ones exposed in this Stack Overflow question (and many others). But the use case exposed in the linked question is a very valid one. You want to aggregate several things from a list of person:


public class Person {
    private String firstName;
    private String lastName;
    private int age;
    private double height;
    private double weight;
    // getters / setters

Assuming you have this list:


List&lt;Person&gt; personsList = new ArrayList&lt;Person&gt;();

personsList.add(new Person(&quot;John&quot;, &quot;Doe&quot;, 25, 1.80, 80));
personsList.add(new Person(&quot;Jane&quot;, &quot;Doe&quot;, 30, 1.69, 60));
personsList.add(new Person(&quot;John&quot;, &quot;Smith&quot;, 35, 174, 70));

You now want to get the following aggregations:

Number of people
Max age
Min height
Avg weight

This is a ridiculous problem for anyone used to writing SQL:


SELECT count(*), max(age), min(height), avg(weight)
FROM person

Done. How hard can it be in Java? It turns out that a lot of glue code needs to be written with vanilla JDK 8 API. Consider the sophisticated answers given

by Tagir Valeev
by TriCore

With jOOλ 0.9.9, solving this problem becomes ridiculously trivial again, and it reads almost like SQL:


Tuple result =
Seq.seq(personsList)
   .collect(
       count(),
       max(Person::getAge),
       min(Person::getHeight),
       avg(Person::getWeight)
   );

System.out.println(result);

And the result yields:

(3, Optional[35], Optional[1.69], Optional[70.0])

Note that this isn’t running a query against a SQL database (that’s what jOOQ is for). We’re running this “query” against an in-memory Java collection.

OK ok, that’s already awesome. Now what about window functions?

Right, the title of this article didn’t promise trivial aggregation stuff. It promised the awesome window functions. Yet, window functions are nothing else than aggregations (or rankings) on a subset of your data stream. Instead of aggregating all of the stream (or table) into a single record, you want to maintain the original records, and provide the aggregation on each individual record directly. A nice introductory example for window functions is the one provided in this article that explains the difference between ROW_NUMBER(), RANK(), and DENSE_RANK(). Consider the following PostgreSQL query:


SELECT
  v, 
  ROW_NUMBER() OVER(w),
  RANK()       OVER(w),
  DENSE_RANK() OVER(w)
FROM (
  VALUES('a'),('a'),('a'),('b'),
        ('c'),('c'),('d'),('e')
) t(v)
WINDOW w AS (ORDER BY v);

It yields:

| V | ROW_NUMBER | RANK | DENSE_RANK |
|---|------------|------|------------|
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |

The same can be done in Java 8 using jOOλ 0.9.9


System.out.println(
    Seq.of(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;c&quot;, &quot;d&quot;, &quot;e&quot;)
       .window(naturalOrder())
       .map(w -&gt; tuple(
            w.value(),
            w.rowNumber(),
            w.rank(),
            w.denseRank()
       ))
       .format()
);

Yielding…

+----+----+----+----+
| v0 | v1 | v2 | v3 |
+----+----+----+----+
| a  |  0 |  0 |  0 |
| a  |  1 |  0 |  0 |
| a  |  2 |  0 |  0 |
| b  |  3 |  3 |  1 |
| c  |  4 |  4 |  2 |
| c  |  5 |  4 |  2 |
| d  |  6 |  6 |  3 |
| e  |  7 |  7 |  4 |
+----+----+----+----+

Again, do note that we’re not running any queries against a database. Everything is done in memory. Notice two things:

jOOλ’s window functions return 0-based ranks, as is expected for Java APIs, as opposed to SQL, which is all 1-based.
In Java, it is not possible to construct ad-hoc records with named columns. That’s unfortunate, and I do hope a future Java will provide support for such language features.

Let’s review what happens exactly in the code:


System.out.println(

    // This is just enumerating our values
    Seq.of(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;c&quot;, &quot;d&quot;, &quot;e&quot;)

    // Here, we specify a single window to be
    // ordered by the value T in the stream, in
    // natural order
       .window(naturalOrder())

    // The above window clause produces a Window&lt;T&gt;
    // object (the w here), which exposes...
       .map(w -&gt; tuple(

    // ... the current value itself, of type String...
            w.value(),

    // ... or various rankings or aggregations on
    // the above window.
            w.rowNumber(),
            w.rank(),
            w.denseRank()
       ))

    // Just some nice formatting to produce the table
       .format()
);

That’s it! Easy, isn’t it? We can do more! Check this out:


System.out.println(
    Seq.of(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;c&quot;, &quot;d&quot;, &quot;e&quot;)
       .window(naturalOrder())
       .map(w -&gt; tuple(
            w.value(),   // v0 
            w.count(),   // v1
            w.median(),  // v2
            w.lead(),    // v3
            w.lag(),     // v4
            w.toString() // v5
       ))
       .format()
);

What does the above yield?

+----+----+----+---------+---------+----------+
| v0 | v1 | v2 | v3      | v4      | v5       |
+----+----+----+---------+---------+----------+
| a  |  1 | a  | a       | {empty} | a        |
| a  |  2 | a  | a       | a       | aa       |
| a  |  3 | a  | b       | a       | aaa      |
| b  |  4 | a  | c       | a       | aaab     |
| c  |  5 | a  | c       | b       | aaabc    |
| c  |  6 | a  | d       | c       | aaabcc   |
| d  |  7 | b  | e       | c       | aaabccd  |
| e  |  8 | b  | {empty} | d       | aaabccde |
+----+----+----+---------+---------+----------+

Your analytics heart should be jumping, now. 4376565[1]

Wait a second. Can we do frames, too, as in SQL? Yes, we can. Just as in SQL, when we omit the frame clause on a window definition (but we do specify an ORDER BY clause), then the following is applied by default:


RANGE BETWEEN UNBOUNDED PRECEDING
  AND CURRENT ROW

We’ve done this in the previous examples. It can be seen in column v5, where we aggregate the string from the very first value up until the current value. So, let’s specify the frame then:


System.out.println(
    Seq.of(&quot;a&quot;, &quot;a&quot;, &quot;a&quot;, &quot;b&quot;, &quot;c&quot;, &quot;c&quot;, &quot;d&quot;, &quot;e&quot;)
       .window(naturalOrder(), -1, 1) // frame here
       .map(w -&gt; tuple(
            w.value(),   // v0
            w.count(),   // v1
            w.median(),  // v2
            w.lead(),    // v3
            w.lag(),     // v4
            w.toString() // v5
       ))
       .format()
);

And the result is, trivially:

+----+----+----+---------+---------+-----+
| v0 | v1 | v2 | v3      | v4      | v5  |
+----+----+----+---------+---------+-----+
| a  |  2 | a  | a       | {empty} | aa  |
| a  |  3 | a  | a       | a       | aaa |
| a  |  3 | a  | b       | a       | aab |
| b  |  3 | b  | c       | a       | abc |
| c  |  3 | c  | c       | b       | bcc |
| c  |  3 | c  | d       | c       | ccd |
| d  |  3 | d  | e       | c       | cde |
| e  |  2 | d  | {empty} | d       | de  |
+----+----+----+---------+---------+-----+

As expected, lead() and lag() are unaffected, as opposed to count(), median(), and toString()

Awesome! Now, let’s review the running total.

Often, you don’t calculate window functions on the scalar value of the stream itself, as that value is usually not a scalar value but a tuple (or a POJO in Java-speak). Instead, you extract values from the tuple (or POJO) and perform the aggregation on that. So, again, when calculating the BALANCE, we need to extract the AMOUNT first.

| ID   | VALUE_DATE |  AMOUNT |  BALANCE |
|------|------------|---------|----------|
| 9997 | 2014-03-18 | -(99.17)|+19985.81 |
| 9981 | 2014-03-16 | -(71.44)| 19886.64 |
| 9979 | 2014-03-16 |-(-94.60)| 19815.20 |
| 9977 | 2014-03-16 |   -6.96 |=19909.80 |
| 9971 | 2014-03-15 |  -65.95 | 19916.76 |

Here’s how you would write the running total with Java 8 and jOOλ 0.9.9


BigDecimal currentBalance = new BigDecimal(&quot;19985.81&quot;);

Seq.of(
    tuple(9997, &quot;2014-03-18&quot;, new BigDecimal(&quot;99.17&quot;)),
    tuple(9981, &quot;2014-03-16&quot;, new BigDecimal(&quot;71.44&quot;)),
    tuple(9979, &quot;2014-03-16&quot;, new BigDecimal(&quot;-94.60&quot;)),
    tuple(9977, &quot;2014-03-16&quot;, new BigDecimal(&quot;-6.96&quot;)),
    tuple(9971, &quot;2014-03-15&quot;, new BigDecimal(&quot;-65.95&quot;)))
.window(Comparator
    .comparing((Tuple3&lt;Integer, String, BigDecimal&gt; t) 
        -&gt; t.v1, reverseOrder())
    .thenComparing(t -&gt; t.v2), Long.MIN_VALUE, -1)
.map(w -&gt; w.value().concat(
     currentBalance.subtract(w.sum(t -&gt; t.v3)
                              .orElse(BigDecimal.ZERO))
));

Yielding

+------+------------+--------+----------+
|   v0 | v1         |     v2 |       v3 |
+------+------------+--------+----------+
| 9997 | 2014-03-18 |  99.17 | 19985.81 |
| 9981 | 2014-03-16 |  71.44 | 19886.64 |
| 9979 | 2014-03-16 | -94.60 | 19815.20 |
| 9977 | 2014-03-16 |  -6.96 | 19909.80 |
| 9971 | 2014-03-15 | -65.95 | 19916.76 |
+------+------------+--------+----------+

A couple of things have changed here:

The comparator now takes two comparisons into account. Unforunately JEP-101 wasn’t entirely implemented, which is why we need to help the compiler with type inference here.
The Window.value() is now a tuple, not a single value. So we need to extract the interesting column from it, the AMOUNT (via t -> t.v3). On the other hand, we can simply concat() that additional value to the tuple

But that’s already it. Apart from the verbosity of the comparator (which we’ll certainly address in a future jOOλ version), writing a window function is a piece of cake.

What else can we do?

This article is not a complete description of all we can do with the new API. We’ll soon write a follow-up blog post with additional examples. For instance:

The partition by clause wasn’t described, but is available too
You can specify many more windows than the single window exposed here, each with individual PARTITION BY, ORDER BY and frame specifications

Also, the current implementation is rather canonical, i.e. it doesn’t (yet) cache aggregations:

For unordered / unframed windows (same value for all the partition)
Strictly ascendingly framed windows (aggregation can be based on previous value, for associative collectors like SUM(), or toString())

That’s it from our part. Download jOOλ, play around with it and enjoy the fact that the most awesome SQL feature is now available for all of you Java 8 developers! https://github.com/jOOQ/jOOL

4 thoughts on “2016 Will be the Year Remembered as When Java Finally Had Window Functions!”

fastmark says:

January 7, 2016 at 09:19

While it might not be quite Haskell, R has had this capability through dplyr for all of the 2015…

Loading...

1. lukaseder says:
  
  January 7, 2016 at 09:35
  
  Very interesting, thanks for the pointer. Here’s a link for future visitors:
  https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
  
  Loading...
  
Javin says:

January 7, 2016 at 17:24

Great, Java is now coming closer to SQL. I was happy with GROUP BY clause but now Window function, surely a great gift to Java programmers.

Loading...

1. lukaseder says:
  
  January 7, 2016 at 17:41
  
  Thanks for your feedback. Yes, grouping is already something, although I do think that you won’t get happy with standard JDK API here. Here’s some suggestions that I’ve made previously:
  https://blog.jooq.org/2015/01/23/how-to-translate-sql-group-by-and-aggregations-to-java-8/
  
  And the collectors and convenience API available in jOOλ 0.9.9 are even better. Have you played with jOOλ yet?
  
  Loading...

2016 Will be the Year Remembered as When Java Finally Had Window Functions!

How do window functions work?

Can we do this with Java collections?

The many missing collectors in the JDK

OK ok, that’s already awesome. Now what about window functions?

Awesome! Now, let’s review the running total.

What else can we do?

Like this:

Published by lukaseder

4 thoughts on “2016 Will be the Year Remembered as When Java Finally Had Window Functions!”

Leave a ReplyCancel reply

How do window functions work?

Can we do this with Java collections?

The many missing collectors in the JDK

OK ok, that’s already awesome. Now what about window functions?

Awesome! Now, let’s review the running total.

What else can we do?

Like this:

Published by lukaseder

4 thoughts on “2016 Will be the Year Remembered as When Java Finally Had Window Functions!”

Leave a ReplyCancel reply

Discover more from Java, SQL and jOOQ.