A Basic Programming Pattern: Filter First, Map Later

In recent days, I’ve seen a bit too much of this:


someCollection
    .stream()
    .map(e -&gt; someFunction(e))
    .collect(Collectors.toList())
    .subList(0, 2);

Something is very wrong with the above example. Can you see it? No? Let me rename those variables for you.


hugeCollection
    .stream()
    .map(e -&gt; superExpensiveMapping(e))
    .collect(Collectors.toList())
    .subList(0, 2);

Better now? Exactly. The above algorithm is O(N) when it could be O(1):


hugeCollection
    .stream()
    .limit(2)
    .map(e -&gt; superExpensiveMapping(e))
    .collect(Collectors.toList());

(Let’s assume the lack of explicit ordering is irrelevant) I’m working mostly with SQL and helping companies tune their SQL (check out our training, btw) and as such, I’m always very eager to reduce the algorithmic complexity of queries. To some people, this seems like wizardry, but honestly, most of the time, it just boils down to adding a well placed index. Why? Because an index reduces the algorithmic complexity of a query algorithm from O(N) (scanning the entire table for matches) to O(log N) (scanning a B-tree index for the same matches).

Sidenote: Reducing algorithmic complexity is almost never premature optimisation. As your data set grows, bad complexity will always bite you!

The same is true for the above examples. Why would you ever traverse and transform an entire list of a huge amount of elements (N) with an expensive operation (superExpensiveMapping), when you really only need to do this only for the first two values?

Conclusion

SQL is a declarative language where the query optimiser gets this right automatically for you: It will (almost) always filter the data first (WHERE clause), and only then transform it (

JOIN, GROUP BY,
 SELECT

, etc). Just as in SQL, when we hand-write our queries using Streams (and also in imperative programming), do always:

Filter First, Map Later

Your production system will thank you. Note, another interesting case with flatMap() is documented here.

4 thoughts on “A Basic Programming Pattern: Filter First, Map Later”

You can filter after the map and have the same results since streams evaluate lazily. It just has to be filtered before the collect.

hugeCollection
    .stream()
    .map(e -> superExpensiveMapping(e))
    .limit(2)
    .collect(Collectors.toList());

Filtering after the collect is like using HAVING instead of WHERE in SQL it works but you should use WHERE if possible.

lukaseder says:

June 29, 2017 at 19:31

Yes sure, that optimisation works in this case, but it might not work in other cases, (e.g. the current flatMap() implementation may cause issues).

I simply didn’t want to rely on any such optimisation in this post, which would just be a distraction from the main argument.

Loading...

Reply
1. noway says:
  
  July 2, 2017 at 21:20
  
  Your example only shows that bad programming (i.e. using subList instead of stream method to filter), leads to bad performance. It does not show that you should “Filter First, Map Later”. If flatMap has this issue, then use it as an example.
  
  Loading...
  
  Reply
  1. lukaseder says:
    
    July 3, 2017 at 10:57
    
    The flatMap example is now available in this follow up blog post: https://blog.jooq.org/2017/07/03/are-java-8-streams-truly-lazy-not-completely
    
    Loading...
    
    Reply

billoneil says:

June 29, 2017 at 16:01
You can filter after the map and have the same results since streams evaluate lazily. It just has to be filtered before the collect.
```
hugeCollection
    .stream()
    .map(e -> superExpensiveMapping(e))
    .limit(2)
    .collect(Collectors.toList());
```
Filtering after the collect is like using HAVING instead of WHERE in SQL it works but you should use WHERE if possible.

Loading...
1. lukaseder says:
  
  June 29, 2017 at 19:31
  
  Yes sure, that optimisation works in this case, but it might not work in other cases, (e.g. the current flatMap() implementation may cause issues).
  
  I simply didn’t want to rely on any such optimisation in this post, which would just be a distraction from the main argument.
  
  Loading...
  
  1. noway says:
    
    July 2, 2017 at 21:20
    
    Your example only shows that bad programming (i.e. using subList instead of stream method to filter), leads to bad performance. It does not show that you should “Filter First, Map Later”. If flatMap has this issue, then use it as an example.
    
    Loading...
    
    1. lukaseder says:
      
      July 3, 2017 at 10:57
      
      The flatMap example is now available in this follow up blog post: https://blog.jooq.org/2017/07/03/are-java-8-streams-truly-lazy-not-completely
      
      Loading...

A Basic Programming Pattern: Filter First, Map Later

Conclusion

Like this:

Published by lukaseder

4 thoughts on “A Basic Programming Pattern: Filter First, Map Later”

Leave a ReplyCancel reply

Conclusion

Like this:

Published by lukaseder

4 thoughts on “A Basic Programming Pattern: Filter First, Map Later”

Leave a ReplyCancel reply

Discover more from Java, SQL and jOOQ.