Are Java 8 Streams Truly Lazy? Not Completely!

Notice: this issue has been fixed in Java 8 (8u222). Thanks to Zheka Kozlov for the comment.

In a recent article, I’ve shown that programmers should always apply a filter first, map later strategy with streams. The example I made there was this one:

hugeCollection
    .stream()
    .limit(2)
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList());

In this case, the limit() operation implements the filtering, which should take place before the mapping. Several readers correctly mentioned that in this case, it doesn’t matter in what order we put the limit() and map() operations, because most operations are evaluated lazily in the Java 8 Stream API. Or rather: the terminal operation pulls values from the stream lazily, and once limit() has passed on its maximum number of elements, the stream produces no new values, regardless of whether map() came before or after. This can be shown easily as follows, using limit(5):

import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .limit(5)
              .forEach(i -> {});

        System.out.println();
        System.out.println();

        Stream.iterate(0, i -> i + 1)
              .limit(5)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});
    }
}

The output of the above is:
Map: 1
Map: 2
Map: 3
Map: 4
Map: 5


Map: 1
Map: 2
Map: 3
Map: 4
Map: 5

But this isn’t always the case!

This optimisation is an implementation detail, and in general, it is wise to apply the filter first, map later rule thoroughly rather than relying on such an optimisation. In particular, the Java 8 implementation of flatMap() is not lazy. Consider the following logic, where we put a flatMap() operation in the middle of the stream:

import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .limit(5)
              .forEach(i -> {});

        System.out.println();
        System.out.println();

        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .limit(5)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});
    }
}

The result is now:
Map: 1
Map: 1
Map: 1
Map: 1
Map: 2
Map: 2
Map: 2
Map: 2


Map: 1
Map: 1
Map: 1
Map: 1
Map: 2

So, the first Stream pipeline maps all 8 flatmapped values before applying the limit, whereas the second Stream pipeline really limits the stream to 5 elements first, and then maps only those. The reason for this is in the flatMap() implementation:

// In ReferencePipeline.flatMap()
try (Stream<? extends R> result = mapper.apply(u)) {
    if (result != null)
        result.sequential().forEach(downstream);
}

As you can see, the result of the mapper function is consumed eagerly with a terminal forEach() operation, which will always produce all four values in our case and send them to the next operation. So, flatMap() isn’t lazy, and thus the next operation after it will get all of its results. This is true for Java 8. Future Java versions might improve this, of course. We’d better filter first. And map later.
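To measure this rather than read it off the console, here is a minimal sketch that counts how many elements actually reach the mapping step in both orderings. The class name FlatMapLazinessCheck and the method mappedCount() are made up for this illustration. With limit() first, the count should be 5 on any JDK. With map() first, it should be 8 on an affected Java 8 build, as in the output above, and 5 on a build that contains the fix.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the JDK or of the original example.
public class FlatMapLazinessCheck {

    // Counts how many elements reach map() when limit(5) is placed
    // either before or after it, downstream of a flatMap().
    static int mappedCount(boolean limitFirst) {
        AtomicInteger mapped = new AtomicInteger();

        Stream<Integer> flattened = Stream.iterate(0, i -> i + 1)
                                          .flatMap(i -> Stream.of(i, i, i, i));

        Stream<Integer> pipeline;
        if (limitFirst) {
            // filter first, map later
            pipeline = flattened
                .limit(5)
                .map(i -> { mapped.incrementAndGet(); return i + 1; });
        } else {
            // map first, relying on laziness to skip superfluous work
            pipeline = flattened
                .map(i -> { mapped.incrementAndGet(); return i + 1; })
                .limit(5);
        }

        pipeline.forEach(i -> {});
        return mapped.get();
    }

    public static void main(String[] args) {
        System.out.println("limit first: " + mappedCount(true));
        System.out.println("map first:   " + mappedCount(false));
    }
}
```

If the second number is larger than the first on your JDK, the mapping function has been called for elements that limit() threw away afterwards.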

Update: flatMap() gets fixed in JDK 10

Thanks, Tagir Valeev, for pointing out that there’s a fix coming up. Relevant links:

https://bugs.openjdk.java.net/browse/JDK-8075939
http://hg.openjdk.java.net/jdk/jdk10/rev/fca88bbbafb9

However, there’s still a bug when using Stream.iterator(): https://bugs.openjdk.java.net/browse/JDK-8267359
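One way to check how a given JDK behaves with Stream.iterator() is to pull a single element and count how many inner elements were produced for it. This is only a sketch under my own assumptions: the class FlatMapIteratorCheck and its method are invented for this illustration, and I’m assuming the remaining bug shows up as the inner stream being consumed ahead of demand.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the JDK or of the linked bug report.
public class FlatMapIteratorCheck {

    // Pulls a single element through iterator() and reports how many
    // inner-stream elements were produced in order to deliver it.
    static int producedForFirstElement() {
        AtomicInteger produced = new AtomicInteger();

        Iterator<Integer> it = Stream.of(1, 2)
            .flatMap(i -> Stream.of(i * 10, i * 10 + 1, i * 10 + 2, i * 10 + 3)
                                .peek(x -> produced.incrementAndGet()))
            .iterator();

        it.next(); // request only the first flatmapped element
        return produced.get();
    }

    public static void main(String[] args) {
        System.out.println("Produced for one next(): " + producedForFirstElement());
    }
}
```

A fully lazy pipeline would report 1 here. Any larger number means that elements nobody asked for have already been computed.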

19 thoughts on “Are Java 8 Streams Truly Lazy? Not Completely!”

      1. Well, peek and forEach both take a Consumer, so you could have written forEach(i -> println(i)) and removed the peek. But that’s a detail.

        As for the laziness of flatMap, that’s complex. Let’s imagine the following scenario:

                @Test
                public void lazinessOfFlatMapTest(){
                    Stream.of("Hello")
                        .flatMap(dontcare -> Stream.generate(() -> "World" + Instant.now().toString()))
                        .limit(5);
                }
        

        If I don’t terminate my stream, as flatMap is lazy (it evaluates its function only when needed), the test finishes immediately.

        If, however, I add a println:

                @Test
                public void lazinessOfFlatMapTest(){
                    Stream.of("Hello")
                        .flatMap(dontcare -> Stream.generate(() -> "World" + Instant.now().toString()))
                        .limit(5)
                        .forEach(System.out::println);
                }
        

        That’s a different game. The first 5 elements are printed directly, but the function will hang. So in a sense, not all elements need to be computed for the outer stream to print them lazily. However, in the background, I have an infinite loop running.

        So, for your example, even if you see only 5 elements (1,1,1,1,2), the 3 remaining 2s were indeed computed too.

        1. You’re obviously well aware that peek() and forEach() are not the same thing. I can move peek() around the pipeline to debug it, unlike forEach()…

          Yes, the infinite flatmapped stream is an extreme edge case of this problem, showing that the status quo is really undesirable.

    1. That’s very interesting, thanks for sharing. Curious: Does the JIT optimise the two-fold array access to data[i] or would assigning that to a local variable further improve performance?

      1. Interesting question. The post aims to challenge the mantra: laziness for laziness’ sake is a good thing. Especially when, as you’ve explained, streams are more eager than you might think.

        1. Hmm, indeed. At some level the mantra breaks. However, these tools (much like SQL) are mostly designed to allow for really complex algorithms to be expressed in really simple terms, deferring optimisation until later. In a lot of business logic, this is absolutely fine, and laziness ensures that it will stay fine.

          When very high performance is essential, however, then the abstraction will inevitably break, of course. But that usually happens locally only, in a small percentage of an overall application. For instance, a high frequency trading app will put the trading logic on the highly optimised ring buffer. But user profile management can still be written with higher level abstractions, as that is not performance critical.

    1. Thanks for the link. Indeed, that’s quite similar to the issue with limit here, which is kind of a “short circuiting non-terminal operation”… I agree it’s a bug from a user perspective. I’m sure there’s an excuse lurking somewhere on the lambda-dev mailing list, though :)

  1. Interesting discovery. If I read the flatMap implementation correctly, the problem arises only for streams where flatMap’s mapper returns multiple elements, doesn’t it?

    For instance, the flatMap trick for replacing

        .filter(predicate).map(function)
    

    with

        .flatMap(element -> predicate.test(element) ? Stream.of(function.apply(element)) : Stream.empty())
    

    is sometimes useful, and it seems to me it doesn’t suffer from this bug. Or am I missing something?

    1. Yes, you’re right, as far as I can tell, although I’d still prefer the filter().map() version. It seems to be more transparent to Stream’s internals and its optimisations…

      1. Thanks for the response, Lukas. My intention is not to use the flatMap trick directly in lambda form as stated. The real benefit comes when the lambda is provided by a well-named utility method.
        Also, thanks for the code formatting. BTW, what are the formatting rules for comments on your blog? Markdown? I tried simple 4-space indentation but it didn’t work, and I was unable to edit.

  2. Hello
    Great article !

    I’m a developer in Korea.
    I’ve read through your posts and felt that they are so useful and great.
    I’m wondering whether I can translate a few of your posts
    and share them with Korean developers who are not good at English!
    Thank you :)

    1. Hi Sophia,

      Thanks a lot for your offer. Sure, you may translate some articles to Korean. Please always:

      1. Make it very clear at the beginning (and possibly the end) of the article that it is a translation of our original content, including a backlink to our original content.
      2. Maintain all links from within the articles without modifying them (in particular: links to other content on jooq.org)
      3. Translate the content as directly as possible, without adding / removing content.

      Thanks a lot!
      Lukas
