In recent days, I’ve seen a bit too much of this:
```java
someCollection
    .stream()
    .map(e -> someFunction(e))
    .collect(Collectors.toList())
    .subList(0, 2);
```
Something is very wrong with the above example. Can you see it? No? Let me rename those variables for you.
```java
hugeCollection
    .stream()
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList())
    .subList(0, 2);
```
Better now? Exactly. The above algorithm is O(N) when it could be O(1):
```java
hugeCollection
    .stream()
    .limit(2)
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList());
```
(Let’s assume the lack of explicit ordering is irrelevant)
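Because streams are lazy and `limit()` is a short-circuiting operation, placing it before `map()` really does cap the number of mapping calls at two, no matter how large the source is. Here's a minimal sketch demonstrating that (the class name, collection contents, and counter are ours, for illustration only):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class LimitFirst {

    // Counts how often the "expensive" mapping actually runs
    static final AtomicInteger CALLS = new AtomicInteger();

    static List<Integer> firstTwoDoubled(List<Integer> source) {
        return source
            .stream()
            .limit(2)                    // short-circuit first...
            .map(e -> {
                CALLS.incrementAndGet(); // ...so the mapping below
                return e * 2;            // runs only twice
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A stand-in for hugeCollection
        List<Integer> huge = IntStream.range(0, 1_000_000)
            .boxed()
            .collect(Collectors.toList());

        System.out.println(firstTwoDoubled(huge)); // [0, 2]
        System.out.println(CALLS.get());           // 2
    }
}
```

The mapping is invoked exactly twice, even though the source holds a million elements.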
I’m working mostly with SQL and helping companies tune their SQL (check out our training, btw), and as such, I’m always very eager to reduce the algorithmic complexity of queries. To some people, this seems like wizardry, but honestly, most of the time, it just boils down to adding a well-placed index.
Because an index reduces the algorithmic complexity of a query algorithm from O(N) (scanning the entire table for matches) to O(log N) (scanning a B-tree index for the same matches).
Sidenote: Reducing algorithmic complexity is almost never premature optimisation. As your data set grows, bad complexity will always bite you!
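To make that O(N) vs. O(log N) difference concrete in Java terms (this analogy is ours, not from SQL itself): a full table scan is like a linear search through a `List`, while an index lookup is like a lookup in a balanced tree such as a `TreeSet`:

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class IndexAnalogy {
    public static void main(String[] args) {
        List<Integer> table = IntStream.range(0, 1_000_000)
            .boxed()
            .collect(Collectors.toList());

        // "Full table scan": O(N) comparisons in the worst case
        boolean foundByScan = table.contains(999_999);

        // "Index": a balanced tree over the same values, O(log N) per lookup
        TreeSet<Integer> index = new TreeSet<>(table);
        boolean foundByIndex = index.contains(999_999);

        System.out.println(foundByScan + " " + foundByIndex); // true true
    }
}
```

Both lookups find the value, but the tree answers in a few dozen comparisons where the scan may need a million.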
The same is true for the above examples. Why would you ever traverse and transform an entire list of a huge number of elements (N) with an expensive operation (superExpensiveMapping), when you really only need to do this for the first two values?
SQL is a declarative language where the query optimiser gets this right automatically for you: it will (almost) always filter the data first (the WHERE clause), and only then transform it (JOIN, GROUP BY, etc.).
Just as in SQL, when we hand-write our queries using Streams (and in imperative programming in general), always:
Filter First, Map Later
Your production system will thank you.
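The same rule applies to `filter()` itself, not just `limit()`. A short sketch contrasting the two orderings (the mapping, predicate, and class name here are hypothetical stand-ins, not from the article):

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterFirst {

    // A stand-in for the expensive mapping
    static String superExpensiveMapping(int e) {
        return "value-" + e;
    }

    static List<String> mapThenFilter(List<Integer> source) {
        // Anti-pattern: maps all N elements, then discards most of them
        return source.stream()
            .map(FilterFirst::superExpensiveMapping)
            .filter(s -> s.endsWith("2"))
            .collect(Collectors.toList());
    }

    static List<String> filterThenMap(List<Integer> source) {
        // Filter first: the expensive mapping runs only on the survivors
        return source.stream()
            .filter(e -> e % 10 == 2)
            .map(FilterFirst::superExpensiveMapping)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 12, 15, 22);
        System.out.println(filterThenMap(source)); // [value-2, value-12, value-22]
    }
}
```

Both versions produce the same result for this data, but the second one calls the expensive mapping three times instead of five. At N elements with a selective predicate, that difference is what your production system will notice.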