3 Reasons why You Shouldn’t Replace Your for-loops by Stream.forEach()

Awesome! We’re migrating our code base to Java 8. We’ll replace everything by functions. Throw out design patterns. Remove object orientation. Right! Let’s go!

Wait a minute

Java 8 has been out for over a year now, and the initial thrill has given way to day-to-day business.

A non-representative study executed by baeldung.com in May 2015 found that 38% of their readers had adopted Java 8. Prior to that, a late 2014 study by Typesafe had claimed 27% Java 8 adoption among their users.

What does it mean for your code-base?

Some Java 7 -> Java 8 migration refactorings are no-brainers. For instance, when passing a Callable to an ExecutorService:

ExecutorService s = ...

// Java 7 - meh...
Future<String> f = s.submit(
    new Callable<String>() {
        @Override
        public String call() {
            return "Hello World";
        }
    }
);

// Java 8 - of course!
Future<String> f = s.submit(() -> "Hello World");

The anonymous class style really doesn’t add any value here.

Apart from these no-brainers, there are other, less obvious topics. E.g. whether to use an external vs. an internal iterator. See also this interesting read from 2007 by Neil Gafter on the timeless topic:
http://gafter.blogspot.ch/2007/07/internal-versus-external-iterators.html

The result of the following two pieces of logic is the same:

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    System.out.println(i);

// "Modern"
list.forEach(System.out::println);

I claim that the “modern” approach should be used with extreme care, i.e. only if you truly benefit from the internal, functional iteration (e.g. when chaining a set of operations via Stream’s map(), flatMap() and other operations).
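To illustrate what such a genuine benefit can look like, here's a minimal sketch (the department strings and the transformation are made up for illustration) where chaining flatMap(), map() and collect() expresses the intent more directly than the equivalent nested loops would:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamChaining {
    public static void main(String[] args) {
        List<String> departments = Arrays.asList("sales,hr", "it,legal");

        // Chaining flatMap() / map() / sorted() / collect() is where
        // internal iteration starts to pay off compared to nested loops
        List<String> names = departments.stream()
            .flatMap(d -> Arrays.stream(d.split(",")))
            .map(String::toUpperCase)
            .sorted()
            .collect(Collectors.toList());

        System.out.println(names); // [HR, IT, LEGAL, SALES]
    }
}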

Here’s a short list of cons of the “modern” approach compared to the classic one:

1. Performance – you will lose on it

Angelika Langer has wrapped up this topic well enough in her article and the related talk that she’s giving at conferences:

Java performance tutorial – How fast are the Java 8 streams?

In many cases, performance is not critical, and you shouldn’t do any premature optimisation – so you may claim that this argument is not really an argument per se. But I will counter that attitude in this case: the overhead of Stream.forEach() compared to an ordinary for loop is significant enough in general that using it by default just piles up useless CPU cycles across your entire application. If we’re talking about 10%–20% more CPU consumption merely based on the choice of loop style, then we did something fundamentally wrong. Yes – individual loops don’t matter, but the load on the overall system could have been avoided.

Here’s Angelika’s benchmark result on an ordinary loop, finding the max value in a list of boxed ints:

ArrayList, for-loop : 6.55 ms
ArrayList, seq. stream: 8.33 ms

In other cases, when we’re performing relatively easy calculations on primitive data types, we absolutely SHOULD fall back to the classic for loop (and preferably to arrays, rather than collections).

Here’s Angelika’s benchmark result on an ordinary loop, finding the max value in an array of primitive ints:

int-array, for-loop : 0.36 ms
int-array, seq. stream: 5.35 ms
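Angelika’s actual harness isn’t reproduced here; the following is just a minimal JMH-style sketch (class name, data size and random seed are made up, and the JMH dependency is assumed to be on the classpath) of how such a comparison could be set up:

import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
public class MaxBenchmark {

    int[] values;

    @Setup
    public void setup() {
        // 500,000 pseudo-random primitive ints
        values = new Random(42).ints(500_000).toArray();
    }

    @Benchmark
    public int forLoop() {
        int max = Integer.MIN_VALUE;
        for (int value : values)
            max = Math.max(max, value);
        return max;
    }

    @Benchmark
    public int sequentialStream() {
        return IntStream.of(values).max().getAsInt();
    }
}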

Premature optimisation is not good, but cargo-culting the avoidance of premature optimisation is even worse. It’s important to reflect on what context we’re in, and to make the right decisions in such a context. We’ve blogged about performance before – see our article Top 10 Easy Performance Optimisations in Java.

2. Readability – for most people, at least

We’re software engineers. We’ll always discuss the style of our code as if it really mattered. For instance, whitespace, or curly braces.

The reason why we do so is because maintenance of software is hard. Especially of code written by someone else. A long time ago. Who probably wrote only C code before switching to Java.

Sure, in the example we’ve had so far, we don’t really have a readability issue; the two versions are probably equivalent:

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    System.out.println(i);

// "Modern"
list.forEach(System.out::println);

But what happens here:

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    for (int j = 0; j < i; j++)
        System.out.println(i * j);

// "Modern"
list.forEach(i -> {
    IntStream.range(0, i).forEach(j -> {
        System.out.println(i * j);
    });
});

Things start getting a bit more interesting and unusual. I’m not saying “worse”. It’s a matter of practice and of habit. And there isn’t a black/white answer to the problem. But if the rest of the code base is imperative (and it probably is), then nesting range declarations, forEach() calls, and lambdas is certainly unusual, generating cognitive friction in the team.

You can construct examples where an imperative approach really feels more awkward than the equivalent functional one.

But in many situations, that’s not true, and writing the functional equivalent of something that is relatively easy to express imperatively is rather hard (and, again, inefficient). An example can be seen in a previous post on this blog:
https://blog.jooq.org/2015/09/09/how-to-use-java-8-functional-programming-to-generate-an-alphabetic-sequence/

In that post, we generated a sequence of characters:

A, B, ..., Z, AA, AB, ..., ZZ, AAA

… similar to the columns in MS Excel:

MS Excel column names

The imperative approach (originally by an unnamed user on Stack Overflow):

import static java.lang.Math.*;
 
private static String getString(int n) {
    char[] buf = new char[(int) floor(log(25 * (n + 1)) / log(26))];
    for (int i = buf.length - 1; i >= 0; i--) {
        n--;
        buf[i] = (char) ('A' + n % 26);
        n /= 26;
    }
    return new String(buf);
}

… probably outshines the functional one in terms of conciseness:

import java.util.List;
 
import org.jooq.lambda.Seq;
 
public class Test {
    public static void main(String[] args) {
        int max = 3;
 
        List<String> alphabet = Seq
            .rangeClosed('A', 'Z')
            .map(Object::toString)
            .toList();
 
        Seq.rangeClosed(1, max)
           .flatMap(length ->
               Seq.rangeClosed(1, length - 1)
                  .foldLeft(Seq.seq(alphabet), (s, i) -> 
                      s.crossJoin(Seq.seq(alphabet))
                       .map(t -> t.v1 + t.v2)))
           .forEach(System.out::println);
    }
}

And this is already using jOOλ, to simplify writing functional Java.

3. Maintainability

Let’s revisit our previous example. Instead of multiplying values, we now divide them.

List<Integer> list = Arrays.asList(1, 2, 3);

// Old school
for (Integer i : list)
    for (int j = 0; j < i; j++)
        System.out.println(i / j);

// "Modern"
list.forEach(i -> {
    IntStream.range(0, i).forEach(j -> {
        System.out.println(i / j);
    });
});

Obviously, this is asking for trouble, and we can immediately see the trouble in an exception stack trace.

Old school

Exception in thread "main" java.lang.ArithmeticException: / by zero
	at Test.main(Test.java:13)

Modern

Exception in thread "main" java.lang.ArithmeticException: / by zero
	at Test.lambda$1(Test.java:18)
	at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
	at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:557)
	at Test.lambda$0(Test.java:17)
	at java.util.Arrays$ArrayList.forEach(Arrays.java:3880)
	at Test.main(Test.java:16)

Wow. Were we just…? Yes. These are the same reasons why we’ve had performance issues in item #1 in the first place. Internal iteration is just a lot more work for the JVM and the libraries. And this is an extremely simple use case; we could have shown the same thing with the generation of the AA, AB, ..., ZZ series.

From a maintenance perspective, a functional programming style can be much harder than imperative programming – especially when you blindly mix the two styles in legacy code.

Conclusion

This is usually a pro-functional programming, pro-declarative programming blog. We love lambdas. We love SQL. And combined, they can produce miracles.

But when you migrate to Java 8 and contemplate using a more functional style in your code, beware that FP is not always better – for various reasons. In fact, it is never “better”, it is just different and allows us to reason about problems differently.

We Java developers will need to practice, and come up with an intuitive understanding of when to use FP, and when to stick with OO/imperative. With the right amount of practice, combining both will help us improve our software.

Or, to put it in Uncle Bob’s terms:

The bottom, bottom line here is simply this. OO programming is good, when you know what it is. Functional programming is good when you know what it is. And functional OO programming is also good once you know what it is.

http://blog.cleancoder.com/uncle-bob/2014/11/24/FPvsOO.html

Reactive Database Access – Part 1 – Why “Async”

We’re very happy to announce a guest post series on the jOOQ blog by Manuel Bernhardt. In this blog series, Manuel will explain the motivation behind so-called reactive technologies and, after introducing the concepts of Futures and Actors, use them to access a relational database in combination with jOOQ.

Manuel Bernhardt is an independent software consultant with a passion for building web-based systems, both back-end and front-end. He is the author of “Reactive Web Applications” (Manning) and he started working with Scala, Akka and the Play Framework in 2010 after spending a long time with Java. He lives in Vienna, where he is co-organiser of the local Scala User Group. He is enthusiastic about Scala-based technologies and the vibrant community, and is looking for ways to spread their usage in the industry. He has also been scuba-diving since the age of 6, and can’t quite get used to the lack of sea in Austria.

This series is split into three parts, which we’ll publish over the next month.

Reactive?

The concept of Reactive Applications is getting increasingly popular these days and chances are that you have already heard of it someplace on the Internet. If not, you could read the Reactive Manifesto, or we could perhaps agree on the following simple summary: in a nutshell, Reactive Applications are applications that:

  • make optimal use of computational resources (in terms of CPU and memory usage) by leveraging asynchronous programming techniques
  • know how to deal with failure, degrading gracefully instead of, well, just crashing and becoming unavailable to their users
  • can adapt to intensive workloads, scaling out on several machines / nodes as load increases (and scaling back in)

Reactive Applications do not exist merely in a wild, pristine green field. At some point they will need to store and access data in order to do something meaningful and chances are that the data happens to live in a relational database.

Latency and database access

When an application talks to a database, more often than not the database server is not running on the same machine as the application. If you’re unlucky, the server (or set of servers) hosting the application may even live in a different data centre than the database server. Here is what this means in terms of latency:

Latency numbers every programmer should know

Say that you have an application that runs a simple SELECT query on its front page (let’s not debate whether or not this is a good idea here). If your application and database servers live in the same data centre, you are looking at a latency of the order of 500 µs (depending on how much data comes back). Now compare this to all that your CPU could do during that time (all those green and black squares on the figure above) and keep this in mind – we’ll come back to it in just a minute.

The cost of threads

Let’s suppose that you run your welcome page query in a synchronous manner (which is what JDBC does) and wait for the result to come back from the database. During all of this time you will be monopolizing a thread that does nothing but wait. A Java thread that merely exists (without doing anything at all) can take up to 1 MB of stack memory, so if you use a threaded server that allocates one thread per user (I’m looking at you, Tomcat), then it is in your best interest to have quite a bit of memory available for your application in order for it to still work when featured on Hacker News (1 MB per concurrent user).
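If you want to see this effect for yourself, here’s a minimal sketch (the thread count is arbitrary) that parks a large number of idle threads – watch the process’s memory usage grow as the thread stacks are reserved:

import java.util.concurrent.CountDownLatch;

public class ThreadCost {
    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);

        // 10,000 threads that do nothing but wait - with a default stack
        // size of around 1 MB, this reserves gigabytes of (mostly virtual)
        // memory just for stacks
        for (int i = 0; i < 10_000; i++) {
            new Thread(() -> {
                try {
                    latch.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }).start();
        }

        System.out.println("Threads started - check memory usage");
        latch.countDown();
    }
}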

Reactive applications such as the ones built with the Play Framework make use of a server that follows the evented server model: instead of following the “one user, one thread” mantra, it will treat requests as a set of events (accessing the database would be one of these events) and run them through an event loop:

Evented server and its event loop

Such a server will not use many threads. For example, the default configuration of the Play Framework is to create one thread per CPU core, with a maximum of 24 threads in the pool. And yet, this type of server model can deal with many more concurrent requests than the threaded model given the same hardware. The trick, as it turns out, is to hand over the thread to other events when a task needs to do some waiting – or in other words: to program in an asynchronous fashion.

Painful asynchronous programming

Asynchronous programming is not really new – programming paradigms for dealing with it have been around since the 70s and have quietly evolved ever since. And yet, asynchronous programming is not necessarily something that brings back happy memories to most developers. Let’s look at a few of the typical tools and their drawbacks.

Callbacks

Some languages (I’m looking at you, JavaScript) were stuck in the 70s with callbacks as their only tool for asynchronous programming up until recently (ECMAScript 6 introduced Promises). This is also known as Christmas tree programming:

Christmas tree of hell

Ho ho ho.

Threads

As a Java developer, you may find that the word asynchronous does not necessarily have a very positive connotation – it is often associated with the infamous synchronized keyword.
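Here’s a minimal, illustrative sketch of the kind of code that keyword tends to guard – shared mutable state behind an intrinsic lock:

public class Counter {

    private int count;

    // Shared mutable state guarded by the intrinsic lock - forget to
    // synchronize one access path and the counter is subtly broken
    public synchronized void increment() {
        count++;
    }

    public synchronized int get() {
        return count;
    }
}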

Working with threads in Java is hard, especially when using mutable state – it is so much more convenient to let an underlying application server abstract all of the asynchronous stuff away and not worry about it, right? Unfortunately, as we have just seen, this comes at quite a hefty cost in terms of performance.

And I mean, just look at this stack trace:

Abstraction is the mother of all trees

In a way, threaded servers are to asynchronous programming what Hibernate is to SQL – a leaky abstraction that will cost you dearly in the long run. And once you realize it, it is often too late: you are trapped in your abstraction, fighting it by all means in order to increase performance. Whilst for database access it is relatively easy to let go of the abstraction (just use plain SQL, or even better, jOOQ), for asynchronous programming the better tooling is only starting to gain popularity.

Let’s turn to a programming model that finds its roots in functional programming: Futures.

Futures: the SQL of asynchronous programming

Futures as they can be found in Scala leverage functional programming techniques that have been around for decades in order to make asynchronous programming enjoyable again.

Future fundamentals

A scala.concurrent.Future[T] can be seen as a box that will eventually contain a value of type T if it succeeds. If it fails, the Throwable at the origin of the failure will be kept. A Future is said to have succeeded once the computation it is waiting for has yielded a result, or failed if there was an error during the computation. In either case, once the Future is done computing, it is said to be completed.

Welcome to the Futures

As soon as a Future is declared, it will start running, which means that the computation it tries to achieve will be executed asynchronously. For example, we can use the WS library of the Play Framework in order to execute a GET request against the Play Framework website:

val response: Future[WSResponse] = 
  WS.url("http://www.playframework.com").get()

This call returns immediately and lets us continue to do other things. At some point in the future, the call may have been executed, in which case we can access the result and do something with it. Unlike Java’s java.util.concurrent.Future<V>, which lets one check whether a Future is done or block while retrieving the result with the get() method, Scala’s Future makes it possible to specify what to do with the result of an execution.
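For contrast, this is roughly what the blocking style looks like with Java’s own Future – a minimal sketch in which fetchBody() is a hypothetical stand-in for an actual HTTP call:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BlockingFuture {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // The computation starts asynchronously...
        Future<String> response = executor.submit(() -> fetchBody());

        // ...but to use the result we either poll isDone() or block on get()
        String body = response.get();
        System.out.println(body.length());

        executor.shutdown();
    }

    // Hypothetical stand-in for an actual HTTP call
    static String fetchBody() {
        return "<html>...</html>";
    }
}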

Transforming Futures

Manipulating what’s inside of the box is easy as well and we do not need to wait for the result to be available in order to do so:

val response: Future[WSResponse] = 
  WS.url("http://www.playframework.com").get()

val siteOnline: Future[Boolean] = 
  response.map { r =>
    r.status == 200
  }

siteOnline.foreach { isOnline =>
  if(isOnline) {
    println("The Play site is up")
  } else {
    println("The Play site is down")
  }
}

In this example, we turn our Future[WSResponse] into a Future[Boolean] by checking for the status of the response. It is important to understand that this code will not block at any point: only when the response is available will a thread be made available to process it and execute the code inside the map function.

Recovering failed Futures

Failure recovery is quite convenient as well:

val response: Future[WSResponse] =
  WS.url("http://www.playframework.com").get()

val siteAvailable: Future[Option[Boolean]] = 
  response.map { r =>
    Some(r.status == 200)
  } recover {
    case ce: java.net.ConnectException => None
  }

At the very end of the Future we call the recover method, which will deal with a certain type of exception and limit the damage. In this example we are only handling the unfortunate case of a java.net.ConnectException by returning a None value.

Composing Futures

The killer feature of Futures is their composability. A very typical use-case when building asynchronous programming workflows is to combine the results of several concurrent operations. Futures (and Scala) make this rather easy:

def siteAvailable(url: String): Future[Boolean] =
  WS.url(url).get().map { r =>
    r.status == 200
  }

val playSiteAvailable =
  siteAvailable("http://www.playframework.com")

val playGithubAvailable =
  siteAvailable("https://github.com/playframework")

val allSitesAvailable: Future[Boolean] = for {
  siteAvailable <- playSiteAvailable
  githubAvailable <- playGithubAvailable
} yield (siteAvailable && githubAvailable)

The allSitesAvailable Future is built using a for comprehension which will wait until both Futures have completed. The two Futures playSiteAvailable and playGithubAvailable will start running as soon as they are declared, and the for comprehension will compose them together. And if one of those Futures were to fail, the resulting Future[Boolean] would fail directly as a result (without waiting for the other Future to complete).

This is it for the first part of this series. In the next post we will look at another tool for reactive programming and then finally at how to use those tools in combination in order to access a relational database in a reactive fashion.

Read on

Stay tuned as we’ll publish Parts 2 and 3 shortly as part of this series.

A Subtle AutoCloseable Contract Change Between Java 7 and Java 8

A nice feature of the Java 7 try-with-resources statement and the AutoCloseable type that was introduced to work with this statement is the fact that static code analysis tools can detect resource leaks. For instance, Eclipse:

Eclipse resource leak detection settings

When you have the above configuration and you try running the following program, you’ll get three warnings:

public static void main(String[] args) 
throws Exception {
    Connection c = DriverManager.getConnection(
         "jdbc:h2:~/test", "sa", "");
    Statement s = c.createStatement();
    ResultSet r = s.executeQuery("SELECT 1 + 1");
    r.next();
    System.out.println(r.getInt(1));
}

The output is, trivially:

2

The warnings are issued on all of c, s, r. A quick fix (don’t do this!) is to suppress the warning using an Eclipse-specific SuppressWarnings parameter:

@SuppressWarnings("resource")
public static void main(String[] args) 
throws Exception {
    ...
}

After all, WeKnowWhatWeReDoing™ and this is just a simple example, right?

Wrong!

The right way to fix this, even for simple examples (at least after Java 7) is to use the effortless try-with-resources statement.

public static void main(String[] args) 
throws Exception {
    try (Connection c = DriverManager.getConnection(
             "jdbc:h2:~/test", "sa", "");
         Statement s = c.createStatement();
         ResultSet r = s.executeQuery("SELECT 1 + 1")) {

        r.next();
        System.out.println(r.getInt(1));
    }
}

In fact, it would be great if Eclipse could auto-fix this warning and wrap all the individual statements in a try-with-resources statement. Upvote this feature request, please!

Great, we know this. What’s the deal with Java 8?

In Java 8, the contract on AutoCloseable has changed very subtly (or bluntly, depending on your point of view).

Java 7 version

A resource that must be closed when it is no longer needed.

Note the word "must".

Java 8 version

An object that may hold resources (such as file or socket handles) until it is closed. The close() method of an AutoCloseable object is called automatically when exiting a try-with-resources block for which the object has been declared in the resource specification header. This construction ensures prompt release, avoiding resource exhaustion exceptions and errors that may otherwise occur.

API Note:

It is possible, and in fact common, for a base class to implement AutoCloseable even though not all of its subclasses or instances will hold releasable resources. For code that must operate in complete generality, or when it is known that the AutoCloseable instance requires resource release, it is recommended to use try-with-resources constructions. However, when using facilities such as Stream that support both I/O-based and non-I/O-based forms, try-with-resources blocks are in general unnecessary when using non-I/O-based forms.

In short, from Java 8 onwards, AutoCloseable is more of a hint saying that you might be using a resource that needs to be closed, but this isn’t necessarily the case.

This is similar to the Iterable contract, which doesn’t say whether you can iterate over the Iterable only once or several times, but which imposes just enough of a contract to run a foreach loop.
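To make that analogy concrete, here’s a small sketch: an Iterable backed by a Stream’s iterator is perfectly valid for a single foreach loop, but can only be traversed once:

import java.util.stream.Stream;

public class OneShotIterable {
    public static void main(String[] args) {
        Stream<Integer> stream = Stream.of(1, 2, 3);

        // A perfectly valid Iterable as far as the foreach loop is concerned...
        Iterable<Integer> oneShot = stream::iterator;

        for (Integer i : oneShot)
            System.out.println(i);

        // ...but a second loop over oneShot would throw IllegalStateException,
        // because the underlying stream has already been operated upon
    }
}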

When do we have “optionally closeable” resources?

Take jOOQ for instance. Unlike in JDBC, a jOOQ Query (which was made AutoCloseable in jOOQ 3.7) may or may not represent a resource, depending on how you execute it. By default, it is not a resource:

try (Connection c = DriverManager.getConnection(
        "jdbc:h2:~/test", "sa", "")) {

    // No new resources created here:
    ResultQuery<Record> query =
        DSL.using(c).resultQuery("SELECT 1 + 1");

    // Resources created and closed immediately
    System.out.println(query.fetch());
}

The output is again:

+----+
|   2|
+----+
|   2|
+----+

But now we again get an Eclipse warning on the query variable, saying that there is a resource that needs to be closed, even though, when using jOOQ this way, we know that this isn’t true. The only resource in the above code is the JDBC Connection, and it is properly handled. The jOOQ-internal PreparedStatement and ResultSet are completely handled and eagerly closed by jOOQ.

Then, why implement AutoCloseable in the first place?

jOOQ inverts JDBC’s default behaviour.

  • In JDBC, everything is done lazily by default, and resources have to be closed explicitly.
  • In jOOQ, everything is done eagerly by default, and optionally, resources can be kept alive explicitly.

For instance, the following code will keep an open PreparedStatement and ResultSet:

try (Connection c = DriverManager.getConnection(
        "jdbc:h2:~/test", "sa", "");

     // We "keep" the statement open in the ResultQuery
     ResultQuery<Record> query =
         DSL.using(c)
            .resultQuery("SELECT 1 + 1")
            .keepStatement(true)) {

    // We keep the ResultSet open in the Cursor
    try (Cursor<Record> cursor = query.fetchLazy()) {
        System.out.println(cursor.fetchOne());
    }
}

With this version, we no longer have any warnings in Eclipse, but the above version is really the exception when using the jOOQ API.

The same thing is true for Java 8’s Stream API. Interestingly, Eclipse doesn’t issue any warnings here:

Stream<Integer> stream = Arrays.asList(1, 2, 3).stream();
stream.forEach(System.out::println);
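The picture changes for the I/O-based forms of Stream, though. In the following sketch (the file name data.txt is made up), the stream returned by java.nio.file.Files.lines() does hold a file handle, and its documentation recommends closing it, e.g. with try-with-resources:

// Assuming imports of java.nio.file.Files, java.nio.file.Paths
// and java.util.stream.Stream
try (Stream<String> lines = Files.lines(Paths.get("data.txt"))) {
    lines.forEach(System.out::println);
}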

Conclusion

Resource leak detection seems to be a nice IDE / compiler feature at first. But avoiding false positives is hard. Specifically, because Java 8 changed the contract on AutoCloseable, implementors are allowed to implement AutoCloseable for mere convenience, not as a clear indicator of a resource being present that MUST be closed.

This makes it very hard, if not impossible, for an IDE to detect resource leaks of third party APIs (non-JDK APIs), where these contracts aren’t generally well-known. The solution is, as so often with static code analysis tools, to simply turn off potential resource leak detection:

Turning off potential resource leak detection in Eclipse

For more insight, see also this Stack Overflow answer by Stuart Marks, linking to the EG’s discussions on lambda-dev.