Are Java 8 Streams Truly Lazy? Not Completely!

Posted on July 3, 2017 (updated May 19, 2021) by lukaseder

Note: this issue has been fixed in Java 8 (8u222). Thanks to Zheka Kozlov for the comment.

In a recent article, I’ve shown that programmers should always apply a “filter first, map later” strategy with streams. The example I made there was this one:


hugeCollection
    .stream()
    .limit(2)
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList());

In this case, the limit() operation implements the filtering, which should take place before the mapping. Several readers correctly mentioned that in this case, it doesn’t matter in which order we put the limit() and map() operations, because most operations are evaluated lazily in the Java 8 Stream API. Or rather: The collect() terminal operation pulls values from the stream lazily, and as soon as the limit() operation reaches its end, it will no longer produce new values, regardless of whether map() came before or after. This can be proven easily as follows:


import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .limit(5)
              .forEach(i -> {});

        System.out.println();
        System.out.println();

        Stream.iterate(0, i -> i + 1)
              .limit(5)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});
    }
}

The output of the above is:

Map: 1
Map: 2
Map: 3
Map: 4
Map: 5


Map: 1
Map: 2
Map: 3
Map: 4
Map: 5
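If you prefer a number over reading console output, the same experiment can be phrased as a counter. This is only a minimal sketch: the class name LazyLimitCount and the helper countMappings() are made up for illustration, and the mapping is the same trivial i + 1 as above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Stream;

class LazyLimitCount {

    // Counts how often the mapping function runs when only the first
    // 5 elements of an infinite stream are consumed.
    static int countMappings(boolean limitFirst) {
        AtomicInteger calls = new AtomicInteger();

        // Same mapping as in the article, plus a call counter
        Function<Integer, Integer> mapping = i -> {
            calls.incrementAndGet();
            return i + 1;
        };

        Stream<Integer> source = Stream.iterate(0, i -> i + 1);
        Stream<Integer> pipeline = limitFirst
            ? source.limit(5).map(mapping)
            : source.map(mapping).limit(5);

        // No-op terminal operation, just to pull values through the pipeline
        pipeline.forEach(i -> {});
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("map, then limit: " + countMappings(false));
        System.out.println("limit, then map: " + countMappings(true));
    }
}
```

For this sequential pipeline, both calls should report 5 mappings, matching the output above.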

But this isn’t always the case!

This optimisation is an implementation detail, and in general, it is wise to apply the filter first, map later rule thoroughly rather than relying on such an optimisation. In particular, the Java 8 implementation of flatMap() (prior to 8u222) is not lazy. Consider the following logic, where we put a flatMap() operation in the middle of the stream:


import java.util.stream.Stream;

public class LazyStream {
    public static void main(String[] args) {
        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .limit(5)
              .forEach(i -> {});

        System.out.println();
        System.out.println();

        Stream.iterate(0, i -> i + 1)
              .flatMap(i -> Stream.of(i, i, i, i))
              .limit(5)
              .map(i -> i + 1)
              .peek(i -> System.out.println("Map: " + i))
              .forEach(i -> {});
    }
}

The result is now:

Map: 1
Map: 1
Map: 1
Map: 1
Map: 2
Map: 2
Map: 2
Map: 2


Map: 1
Map: 1
Map: 1
Map: 1
Map: 2

So, the first Stream pipeline maps all 8 flatmapped values before the limit is applied, whereas the second Stream pipeline really limits the stream to 5 elements first, and then maps only those. The reason for this is in the flatMap() implementation:


// In ReferencePipeline.flatMap()
try (Stream<? extends R> result = mapper.apply(u)) {
    if (result != null)
        result.sequential().forEach(downstream);
}

As you can see, the result of the flatMap() operation is consumed eagerly with a terminal forEach() operation, which will always produce all four values in our case and send them to the next operation. So, flatMap() isn’t lazy, and thus the next operation after it will get all of its results. This is true for Java 8. Future Java versions might improve this, of course. Until then, we’d better filter first. And map later.
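To find out how your own JDK behaves, the flatMap() experiment can be turned into a counter as well. Again, this is just a sketch under assumptions: FlatMapLimitCount and countMappings() are hypothetical names, not part of any library. On an affected Java 8 build I would expect the map-first pipeline to report 8 mappings, and on builds containing the fix it should report 5. The limit-first pipeline should report 5 either way.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Stream;

class FlatMapLimitCount {

    // Counts how often the mapping function runs when a flatMap() stage
    // sits in front of it and only 5 elements are kept.
    static int countMappings(boolean limitFirst) {
        AtomicInteger calls = new AtomicInteger();

        Function<Integer, Integer> mapping = i -> {
            calls.incrementAndGet();
            return i + 1;
        };

        // Each outer element is expanded into four inner elements
        Stream<Integer> flat = Stream.iterate(0, i -> i + 1)
                                     .flatMap(i -> Stream.of(i, i, i, i));

        Stream<Integer> pipeline = limitFirst
            ? flat.limit(5).map(mapping)
            : flat.map(mapping).limit(5);

        pipeline.forEach(i -> {});
        return calls.get();
    }

    public static void main(String[] args) {
        // The first number depends on the JDK build in use
        System.out.println("flatMap, map, limit: " + countMappings(false));
        System.out.println("flatMap, limit, map: " + countMappings(true));
    }
}
```

Whatever the first number turns out to be, putting limit() before map() keeps the number of expensive mappings independent of the JDK build.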

Update: flatMap() gets fixed in JDK 10

Thanks, Tagir Valeev, for pointing out that there’s a fix coming up:

Btw flatMap gets fixed in Java 10.
— Tagir Valeev (@tagir_valeev) January 25, 2018

Relevant links:

https://bugs.openjdk.java.net/browse/JDK-8075939
http://hg.openjdk.java.net/jdk/jdk10/rev/fca88bbbafb9

However, there’s still a bug when using Stream.iterator(): https://bugs.openjdk.java.net/browse/JDK-8267359
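To observe the Stream.iterator() case yourself, a small probe along the following lines might help. It is a sketch, not a reproduction taken from the bug report: FlatMapIteratorCount, INNER_SIZE and innerElementsProducedForFirst() are names I made up, and the inner stream is deliberately finite so that the probe terminates on every JDK. It asks the iterator for a single element and reports how many inner elements flatMap() produced to deliver it.

```java
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class FlatMapIteratorCount {

    // Size of the (finite) inner stream; chosen arbitrarily for the probe
    static final int INNER_SIZE = 1000;

    // Pulls one element through Stream.iterator() and returns how many
    // inner elements were produced by the flatMap() stage to get there.
    static int innerElementsProducedForFirst() {
        AtomicInteger produced = new AtomicInteger();

        Iterator<Integer> it = Stream.of("outer")
            .flatMap(s -> IntStream.range(0, INNER_SIZE)
                                   .boxed()
                                   .peek(i -> produced.incrementAndGet()))
            .iterator();

        // Only the first element is requested
        it.next();
        return produced.get();
    }

    public static void main(String[] args) {
        System.out.println("Inner elements produced for one next(): "
            + innerElementsProducedForFirst());
    }
}
```

A fully lazy implementation would report 1 here. Any larger number means that the inner stream was consumed further than the single next() call required.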

Published by lukaseder

I made jOOQ

19 thoughts on “Are Java 8 Streams Truly Lazy? Not Completely!”

Felipe Belluco says:

July 3, 2017 at 16:14

Hi,

Great article! Let me ask you only one thing: Never seen “i -> {}” before, what does that do?

Thanks for your attention.

1. lukaseder says:
  
  July 3, 2017 at 16:24
  
  It’s a noop. I just needed to pass something to Stream.forEach()
  
  1. Jonathan Schoreels says:
    
    July 3, 2017 at 18:18
    Well, peek and forEach both take a Consumer, so you could have done your forEach(i -> println(i)) and removed the peek. But that’s a detail.
    
    As for the laziness of flatMap, that’s complex. Let’s imagine the following scenario:
    
    @Test
    public void lazinessOfFlatMapTest() {
        Stream.of("Hello")
              .flatMap(dontcare -> Stream.generate(() -> "World" + Instant.now().toString()))
              .limit(5);
    }
    
    If I don’t terminate my stream, as flatMap is lazy (it evaluates its function only when needed), the test finishes right away.
    
    If however I put a println :
    
    @Test
    public void lazinessOfFlatMapTest() {
        Stream.of("Hello")
              .flatMap(dontcare -> Stream.generate(() -> "World" + Instant.now().toString()))
              .limit(5)
              .forEach(System.out::println);
    }
    
    That’s a different game. The first 5 elements are printed directly, but the function will hang. So in a sense, not all elements need to be computed for the outer stream to print them lazily. However, in the background, I have an infinite loop running.
    
    So, for your example, even if you see only 5 elements (1,1,1,1,2), the 3 remaining 2s were indeed computed too.
    
    1. lukaseder says:
      
      July 3, 2017 at 19:15
      
      You’re obviously well aware that peek() and forEach() are not the same thing. I can move peek() around the pipeline to debug it, unlike forEach()…
      
      Yes, the infinite flatmapped stream is an extreme edge case of this problem here, showing that the status quo is really undesirable.
      
Richard Startin says:

July 3, 2017 at 22:13

Streams go faster when they wrap an array, assuming your stream isn’t so large it can’t fit in memory: https://richardstartin.com/2017/06/10/the-cost-of-modern-code/

1. lukaseder says:
  
  July 4, 2017 at 09:36
  
  That’s very interesting, thanks for sharing. Curious: Does the JIT optimise the two-fold array access to data[i] or would assigning that to a local variable further improve performance?
  
  1. Richard Startin says:
    
    July 4, 2017 at 11:07
    
    Interesting question. The post aims to challenge the mantra: laziness for laziness’ sake is a good thing. Especially when, as you’ve explained, streams are more eager than you might think.
    
    1. lukaseder says:
      
      July 4, 2017 at 11:43
      
      Hmm, indeed. At some level the mantra breaks. However, these tools (much like SQL) are mostly designed to allow for really complex algorithms to be expressed in really simple terms, deferring optimisation until later. In a lot of business logic, this is absolutely fine, and laziness ensures that it will stay fine.
      
      When very high performance is essential, however, then the abstraction will inevitably break, of course. But that usually happens locally only, in a small percentage of an overall application. For instance, a high frequency trading app will put the trading logic on the highly optimised ring buffer. But user profile management can still be written with higher level abstractions, as that is not performance critical.
      
Sebastian Millies says:

July 5, 2017 at 00:43

I consider the non-lazy behavior of flatMap a bug. I am not alone in this, cf. my short discussion of this issue at http://sebastian-millies.blogspot.de/2015/05/streamflatmap-may-cause-short.html which also has further pointers.

1. lukaseder says:
  
  July 5, 2017 at 09:07
  
  Thanks for the link. Indeed, that’s quite similar to the issue with limit here, which is kind of a “short circuiting non-terminal operation”… I agree it’s a bug from a user perspective. I’m sure there’s an excuse lurking somewhere on the lambda-dev mailing list, though :)
  
Tomáš Záluský (@tomaszalusky) says:

July 26, 2017 at 16:47
Interesting discovery. If I read flatMap implementation correctly, the problem arises only for streams with multiple elements returned from flatMap’s mapper, doesn’t it?

For instance – the flatMap trick for replacing
```
    .filter(predicate).map(function)
```
with
```
    .flatMap(element -> predicate.test(element) ? Stream.of(function.apply(element)) : Stream.empty())
```
is sometimes useful, and it seems to me that it doesn’t suffer from this bug. Or am I missing something?

1. lukaseder says:
  
  July 26, 2017 at 16:49
  
  Yes, you’re right, as far as I can tell, although I’d still prefer the filter().map() version. It seems to be more transparent to Stream’s internals and its optimisations…
  
  1. Tomáš Záluský (@tomaszalusky) says:
    
    July 26, 2017 at 22:36
    
    Thanks for the response, Lukas. My intention is not to use the flatMap trick directly in lambda form as stated; the real benefit is where the lambda is provided by a well-named utility method.
    Also thanks for the code formatting. BTW, what are the formatting rules for comments on your blog? Markdown? I tried simple 4-space indentation but it didn’t work and I was unable to edit.
    
    1. lukaseder says:
      
      July 27, 2017 at 10:15
      
      Tomas. It may come as a surprise to you, but 99.9% of the web runs on HTML, so just use <pre> ;)
      
      1. Tomáš Záluský (@tomaszalusky) says:
        
        July 28, 2017 at 08:57
        
        LOL
        
sophia says:

September 8, 2017 at 08:34

Hello
Great article !

I’m a developer in Korea.
I’ve read through your posts and felt that they are so useful and great.
I’m wondering whether I can translate a few of your posts
and share with Korean developers who are not good at English !
Thank you :)

1. lukaseder says:
  
  September 8, 2017 at 12:16
  
  Hi Sophia,
  
  Thanks a lot for your offer. Sure, you may translate some articles to Korean. Please always:
  
  1. Make it very clear at the beginning (and possibly the end) of the article that it is a translation of our original content, including a backlink to our original content.
  2. Maintain all links from within the articles without modifying them (in particular: links to other content on jooq.org)
  3. Translate the content as directly as possible, without adding / removing content.
  
  Thanks a lot!
  Lukas
  
Zheka Kozlov ⬆️ (@ZhekaKozlov) says:

July 22, 2019 at 09:49

Fixed in Java 8 too (8u222): https://bugs.openjdk.java.net/browse/JDK-8225328

1. lukaseder says:
  
  July 22, 2019 at 09:51
  
  Cool, thanks for the pointer!
  