A Basic Programming Pattern: Filter First, Map Later

In recent days, I’ve seen a bit too much of this:

someCollection
    .stream()
    .map(e -> someFunction(e))
    .collect(Collectors.toList())
    .subList(0, 2);

Something is very wrong with the above example. Can you see it? No? Let me rename those variables for you.

hugeCollection
    .stream()
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList())
    .subList(0, 2);

Better now? Exactly. The above algorithm is O(N) when it could be O(1):

hugeCollection
    .stream()
    .limit(2)
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList());

(Let’s assume the lack of explicit ordering is irrelevant)

I’m working mostly with SQL and helping companies tune their SQL (check out our training, btw) and as such, I’m always very eager to reduce the algorithmic complexity of queries. To some people, this seems like wizardry, but honestly, most of the time, it just boils down to adding a well placed index.

Why?

Because an index reduces the algorithmic complexity of a query algorithm from O(N) (scanning the entire table for matches) to O(log N) (scanning a B-tree index for the same matches).

Sidenote: Reducing algorithmic complexity is almost never premature optimisation. As your data set grows, bad complexity will always bite you!

The same is true for the above examples. Why would you ever traverse and transform an entire list of a huge amount of elements (N) with an expensive operation (superExpensiveMapping), when you really only need to do this only for the first two values?

Conclusion

SQL is a declarative language where the query optimiser gets this right automatically for you: It will (almost) always filter the data first (WHERE clause), and only then transform it (JOIN, GROUP BY,
SELECT
, etc).

Just as in SQL, when we hand-write our queries using Streams (and also in imperative programming), do always:

Filter First, Map Later

Your production system will thank you.

Note, another interesting case with flatMap() is documented here.

4 thoughts on “A Basic Programming Pattern: Filter First, Map Later

  1. You can filter after the map and have the same results since streams evaluate lazily. It just has to be filtered before the collect.

    hugeCollection
        .stream()
        .map(e -> superExpensiveMapping(e))
        .limit(2)
        .collect(Collectors.toList());
    

    Filtering after the collect is like using HAVING instead of WHERE in SQL it works but you should use WHERE if possible.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s