Don’t Extract Everything Into a Method

Every now and then, I tweet something like this, just to piss off some clean coders:

Apart from the obvious trolling factor (why can’t I ever resist?), I do think there’s something thought provoking in such a tweet.

First off, given how rare break and continue statements are in Java code, many people probably don’t know that you can break out of an if-else block, or any labelled block, for that matter.

Secondly, we’ve been following cargo cults about clean code so much that we completely forget about some more obscure language features of our favourite languages that would be so handy in edge cases. Like these ones!

Breaking out of loops

Most people know how this works:

for (int i = 0;; i++)

    // Stop at 42
    if (i == 42)
        break;

    // Ignore even numbers
    else if (i % 2 == 0)
        continue;

    // Main action
    else
        System.out.println("i = " + i);

Of course, this code is bogus just to illustrate the syntax. A much simpler, refactored, equivalent version is this:

// Stop criteria in loop, duh
for (int i = 0; i < 42; i++)

    // Inverse if and else for even numbers
    if (i % 2 != 0)
        System.out.println("i = " + i);

This refactoring was obvious and trivial because the loop is so simple. But sometimes it isn’t, or sometimes the break or continue statement makes a loop just simpler to read (“simplicity” like “beauty” is in the eye of the beholder, of course). Parsers are the most obvious example, but also, for example, consider jOOQ’s AbstractRecord.compareTo() method, which contains this loop:

for (int i = 0; i < size(); i++) {
    final Object thisValue = get(i);
    final Object thatValue = that.get(i);

    if (thisValue == null && thatValue == null)
        continue;
    else if (thisValue == null)
        return 1;
    else if (thatValue == null)
        return -1;
    else {
        // ... Actually interesting comparisons
    }
}

// If we got through the above loop, the two records are equal
return 0;

jOOQ Records are compared with each other like SQL Records (if they are comparable, i.e. of equal degree): From the the first attribute to the last.

The interesting bit here is the continue statement, that skips to the next attribute, if both attributes are null.

Of course, if you are a feverish hater of such explicit, goto-style control flow, you would probably refactor this to something “much better”, namely:

for (int i = 0; i < size(); i++) {
    final Object thisValue = get(i);
    final Object thatValue = that.get(i);

    if (!(thisValue == null && thatValue == null)) {
        if (thisValue == null)
            return 1;
        else if (thatValue == null)
            return -1;
        else {
            // ... Actually interesting comparisons
        }
    }
}

// If we got through the above loop, the two records are equal
return 0;

Drawbacks:

  • There’s one more level of indentation
  • The if-else branch series is no longer “regular”, i.e. some conditions lead to quite different control flow than others
  • This worked fine for a single continue statement, but we might complicate this loop and add a second continue statement, or better: A nested loop that also contains break or continue statements…

With loops, the choice of using break or continue is sometimes “obvious”. Refactoring the imperative programming style to something more “elegant” might take quite a while of thinking, introduce regressions, and (ghasp) make the code more complex and less readable. Yes, I said it. Sometimes, imperative style is simpler (not better, just simpler).

Of course, this isn’t necessarily true in this particular example, so don’t haunt me 😉

Breaking out of an if

While breaking out of loops (or continuing them from in the middle) is somewhat common in complicated loops, breaking out of an if is certainly less common. In fact, you can break out of any labeled statement in Java (label-less breaks are possible with loops only):

// No-op
label1:
break label1;

// More interesting, Russian Roulette:
label2: {
    System.out.println("Pulling trigger");

    if (Math.random() < 0.16) {
        System.out.println("Click");
        break label2;
    }

    System.out.println("Whew");
}

So, why not break out of an if-else?

public class FunWithBreak {
    public static void main(String[] args) {
        label:
        if (args.length > 0) {
            System.out.println("We got args!");

            for (String arg : args)
                if ("ignore-args".equals(arg))
                    break label;

            for (String arg : args)
                System.out.println("Arg : " + arg);
        }
        else {
            System.out.println("No args");
        }

        System.out.println("End of program");
    }
}

Now run this program from the commandline

> java FunWithBreak
No args
End of program

> java FunWithBreak a b c
We got args!
Arg : a
Arg : b
Arg : c
End of program

> java FunWithBreak a b c ignore-args
We got args!
End of program

It’s kind of neat, no? The break in the middle of the if statement. It makes sense because there’s an early abort condition for the if-block, and only for the if-block.

Alternatives

Someone responded to my original tweet, that I should refactor the code and factor out a method for the if block, which would allow for replacing the break by return. Here’s how that would work:

public class FunWithBreak {
    public static void main(String[] args) {
        if (args.length > 0)
            nameThisLater(args);
        else
            System.out.println("No args");

        System.out.println("End of program");
    }

    // TODO find a meaningful name for this method later
    private static void nameThisLater(String[] args) {
        System.out.println("We got args!");

        for (String arg : args)
            if ("ignore-args".equals(arg))
                return;

        for (String arg : args)
            System.out.println("Arg : " + arg);
    }
}

Now, that certainly looks more common, which doesn’t mean it’s better. We’ve now factored out a useless method that is called only exactly once, and that is inlined (hopefully) by the JIT to become the original code again.

The advantage is that our original main() method got a bit simpler to read (once we’ve found a meaningful name for the method), but again, simplicity is in the eye of the beholder. But the price we pay here is that we have to pass all local state as method arguments. Simple in this case (single local variable args), but that can become quite complex, especially when we have local mutable collections that need to be passed around.

Besides, others might argue that one shouldn’t return from the middle of a method. They are proponents of “single-return-statement methods”.

Sure, why not. Here’s how that would look:

public class FunWithBreak {
    public static void main(String[] args) {
        if (args.length > 0) {
            ifBlock(args);
        }
        else {
            System.out.println("No args");
        }

        System.out.println("End of program");
    }

    private static void ifBlock(String[] args) {
        System.out.println("We got args!");

        boolean ignoreArgs = false;
        for (String arg : args) {
            if ("ignore-args".equals(arg)) {
                ignoreArgs = true;
                break; // We need break again!
            }
        }

        if (!ignoreArgs)
            for (String arg : args)
                System.out.println("Arg : " + arg);
    }
}

Is it really better?

I don’t know. I’ll just stick with the original break-out-of-if.

Bonus

Surprise your coworkers with an even more unusual placement of your label. Between the if statement and the ensuing block. After all, the curly braces are just an ordinary block, not part of the if statement

public class FunWithBreak {
    public static void main(String[] args) {
        if (args.length > 0) label: {
            System.out.println("We got args!");

            for (String arg : args)
                if ("ignore-args".equals(arg))
                    break label;

            for (String arg : args)
                System.out.println("Arg : " + arg);
        }
        else {
            System.out.println("No args");
        }

        System.out.println("End of program");
    }
}

Second bonus

Now that we’ve gotten all hooked on imperative control flow… What does this print?

for (int i = 0; i < 3; i++) {

    hmmm:
    try {
        System.out.println("Try      : " + i);
        if (i % 2 == 0)
            break hmmm;
        else
            throw new RuntimeException("Hmmm");
    }
    finally {
        System.out.println("Finally  : " + i);
        if (i < 2)
            continue;
    }

    System.out.println("Loop end : " + i);
}

Solution (select text below)

Try      : 0
Finally  : 0
Try      : 1
Finally  : 1
Try      : 2
Finally  : 2
Loop end : 2

SQL, Streams, For Comprehension… It’s All the Same

Recently, at Devoxx, I’ve seen this beautiful slide in a talk by Kevlin Henney

In his talk, he was displaying a variety of approaches to solve the FizzBuzz “problem”, including a couple of very elegant solutions in completely declarative approaches and languages.

In this particular slide, Kevlin used a notation that is derived from maths. The set builder notation. Here’s an example from Wikipedia:

even-numbers

The example reads: For all n in (the set of all integer numbers), take those for which there exists () another integer k, for which the following equation is satisfied: n = 2k.

Or in plain English: All even integers. (because for even integers, there exists another integer that is half the even integer)

Beautiful, eh? In imperative programming, we’d probably do something like this instead:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i++)
    even.add(i * 2);

Or this:

List<Integer> even = new ArrayList<>();
for (int i = /* hmm...? */; i < /* what to put here */; i = i + 2)
    even.add(i);

But there are several problems with the imperative approach:

  • We have to realistically start somewhere
  • We have to realistically end somewhere
  • We have to store all values in an intermediate collection

Sure, those aren’t severe limitations in every day use-cases, because we’re probably solving a real world problem where we don’t actually need an infinite number of even integers, and storing them in an intermediate collection doesn’t consume all of our memory, but still, the declarative, mathematical approach is much leaner, because we can still answer those questions about where to start and where to end later, and we never need to materialise any intermediate collection before we make those final decisions.

For instance, we can declare X to be that set, and then declare Y to be a set that is derived from X, and finally materialise Z, which is a very tiny set derived from Y. For this, we may have never needed to materialise all the (even) integers.

How this compares to SQL

Kevlin made a cunning comparison. Of course, all functional programming aficionados will immediately recognise that languages like Scala have something called a “for comprehension”, which models precisely the mathematical set-builder notation.

Java 8 now has the Streams API, which allows us, to some extent, model something similar (although not as powerful). But Kevlin didn’t use those “modern” languages. He used SQL as a comparison. That “arcane” declarative programming language that has been around forever, and that we love so much. Yes, here’s how we can declare all the even numbers in SQL:

SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE n = 2 * k
)

If optimisers were perfect, this semi-self-join between the two references of the integers “table” could be optimised perfectly. In most databases, we’d probably manually transform the above notation to this equivalent one:

SELECT n
FROM integers
WHERE MOD(n, 2) = 0

Yes, indeed. The set-builder notation and the SQL language are very similar beasts. The former prefers using mathematical symbols for brevity and conciseness, the latter prefers using English words to connect the different operators, but it’s the same thing. And if you squint hard enough, you’ll see that Java 8 Streams, for instance, are also pretty much the same thing:

everything-is-a-table

I’ve blogged about this recently where all the Java 8 Streams operations are compared to their SQL clause counterparts:
https://blog.jooq.org/2015/08/13/common-sql-clauses-and-their-equivalents-in-java-8-streams

How is this better?

It’s simple. Both the set-builder notation, and the SQL language (and in principle, other languages’ for comprehensions) are declarative. They are expressions, which can be composed to other, more complex expressions, without necessarily executing them.

Remember the imperative approach? We tell the machine exactly what to do:

  • Start counting from this particular minimal integer value
  • Stop counting at this particular maximal integer value
  • Store all even integers in between in this particular intermediate collection

What if we don’t actually need negative integers? What if we just wanted to have a utility that calculates even integers and then reuse that to list all positive integers? Or, all positive integers less than 100? Etc.

In the imperative approach, we have to refactor constantly, to avoid the overhead of

  • Producing too many integers
  • Storing too many integers (or storing them at all)

In truly declarative languages like SQL, we’re just describing “even integers” with an expression, possibly assigning the expression a name:

CREATE VIEW even_integers AS
SELECT n
FROM integers
WHERE EXISTS (
  SELECT k
  FROM integers
  WHERE k = 2 * n
)

So, when we actually use and materialise the even integers, e.g. positive integers less than 100, the optimiser can optimise away the double access to the integer table and produce only the exact number of values that we’re requesting (without materialising them in intermediate collections):

SELECT n
FROM even_integers
WHERE n BETWEEN 0 AND 100

Conclusion

Thinking in terms of sets, in terms of declaring sets, has always been our dream as software engineers. The approach is extremely compelling and elegant. We can delegate a lot of boring algorithmic work to the implementation engine of the declarative programming language. In the case of SQL, it would be a SQL database optimiser, which figures out a great lot of optimisations that we might not have thought of.

The above example is trivial. We can perfectly live in a world where we manually iterate over a local integer variable that goes from 0 to 100:

for (int i = 0; i <= 100; i++)
  doSomething(i);

But stuff gets hairy quite quickly. Compare Mario Fusco‘s famous tweet’s two versions of the same algorithm:

This also applies to SQL, and what’s even better in SQL than with Streams: The SQL statement is a declarative expression tree, not a formally ordered set of stream pipeline operations. The optimiser can freely reorder / transform the expression tree into something that it thinks is more optimal. This isn’t just a promise. This works in modern SQL databases every day, for very complex queries, which you can write in a matter of seconds, rather than hours.

Stay tuned for a short series of blog posts on the jOOQ blog illustrating what modern cost-based optimisation can do for you, when you’re using the SQL language.

Warning: Don’t oversimplify

This article just illustrates the roots of the SQL mindset in mathematics and functional programming. Do note that modern SQL is vastly more sophisticated than its roots, and has moved away from this original paradigm to embrace other paradigms for practical reasons.

Don’t limit your SQL usage to what for comprehensions offer. There’s much more to SQL!

Comparing Imperative and Functional Algorithms in Java 8

Mario Fusco’s popular tweet impressively shows what the main difference between imperative and functional approaches to similar algorithms really is:

Both algorithms do the same thing, they’re probably equally fast and reasonable. Yet, one of the algorithms is much easier to write and read than the other. The difference lies in the fact that in imperative programming, different algorithmic requirements are spread throughout the code block, when in functional programming, each requirement has its own little line of code. Compare:

  • Green: Error handling
  • Blue: Stop criteria
  • Red: IO operations
  • Yellow: “Business logic”

Functional programming doesn’t always beat imperative programming as displayed in other examples on the jOOQ blog:

But here’s an example from Stack Overflow by user Aurora_Titanium, where the difference is as clear as in Mario Fusco’s example:

Calculating the Duplicate Values in an Array

The idea is to calculate the sum of all those values that are duplicate in a set of values. For instance, the following array:

int[] list = new int[]{1,2,3,4,5,6,7,8,8,8,9,10};

… should yield as a result something like:

Duplicate: 8. Sum of all duplicate values: 24

The imperative approach

One of the answers by user Volkan Ozkan takes an imperative approach and calculates the sum as such:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

int sum = 0;
for (int j = 0; j < array.length; j++)
{
    for (int k = j + 1; k < array.length; k++) 
    {
        if (k != j && array[k] == array[j])
        {
            sum = sum + array[k];
            System.out.println(
                "Duplicate found: " 
              + array[k]
              + " " 
              + "Sum of the duplicate value is " + sum);
        }
    }
}

The approach works only for sorted arrays where duplicates appear right after one another. In that case, however, it is probably an optimal solution in terms of performance, if performance really matters to this algorithm.

The functional approach

If a slight decrease of performance is acceptable to you (boxing ints, collecting them into maps), and it probably is, you can replace the above difficult-to-read code with the following bit of functional Java-8-style logic, which communicates much more clearly what it does:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

IntStream.of(array)
         .boxed()
         .collect(groupingBy(i -> i))
         .entrySet()
         .stream()
         .filter(e -> e.getValue().size() > 1)
         .forEach(e -> {
             System.out.println(
                 "Duplicates found for : " 
               + e.getKey()
               + " their sum being : " 
               + e.getValue()
                  .stream()
                  .collect(summingInt(i -> i)));
         });

or, with explanations:

int[] array = new int[] { 
    1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 10 
};

// Create a Stream<Integer> from your data
IntStream.of(array)
         .boxed()

// Group values into a Map<Integer, List<Integer>>
         .collect(groupingBy(i -> i))

// Filter out those map values that have only 
// 1 element in their group
         .entrySet()
         .stream()
         .filter(e -> e.getValue().size() > 1)

// Print the sum for the remaining groups
         .forEach(e -> {
             System.out.println(
                 "Duplicates found for : " 
               + e.getKey()
               + " their sum being : " 
               + e.getValue()
                  .stream()
                  .collect(summingInt(i -> i)));
         });

(note that the functional approach calculates sums for each duplicate value, not an overall sum, like the imperative approach. From the original question, this requirement wasn’t very clear)

As we’ve stated in a previous article on our blog, the power of functional programming via an API like the Java 8 Stream API is the fact that we’re approaching the expressive power of SQL-style declarative programming. We’re no longer concerned with remembering individual array indexes and how to calculate them and store intermediate results into some buffers. We can now focus on the really interesting logic, such as: “what’s a duplicate?” or “what sum am I interested in?”

Read on about how SQL compares to Java 8 Streams:

Common SQL clauses and their equivalents in Java 8 Streams

Rare Uses of a “ControlFlowException”

Control flows are a “relict” from imperative programming, which has leaked into various other programming paradigms, including Java’s object oriented paradigm. Apart from the useful and ubiquitous branch and loop structures, there are also primitives (e.g. GOTO) and non-locals (e.g. exceptions). Let’s have a closer look at these controversial control flow techniques.

GOTO

goto is a reserved word in the Java language. goto is also a valid instruction in JVM bytecode. Yet, in Java, it isn’t easily possible to peform goto operations. One example taken from this Stack Overflow question can be seen here:

Jumping forward

label: {
    // do stuff
    if (check) break label;
    // do more stuff
}

In bytecode:

    2  iload_1 [check]
    3  ifeq 6          // Jumping forward
    6  ..

Jumping backward

label: do {
    // do stuff
    if (check) continue label;
    // do more stuff
    break label;
} while(true);

In bytecode:

     2  iload_1 [check]
     3  ifeq 9
     6  goto 2          // Jumping backward
     9  ..

Of course, these tricks are useful only in very very rare occasions, and even then, you might want to re-consider. Because we all know what happens when we use goto in our code:

One little goto...

Drawing taken from xkcd: http://xkcd.com/292/

Breaking out of control flows with exceptions

Exceptions are a good tool to break out of a control flow structure in the event of an error or failure. But regular jumping downwards (without error or failure) can also be done using exceptions:

try {
    // Do stuff
    if (check) throw new Exception();
    // Do more stuff
}
catch (Exception notReallyAnException) {}

This feels just as kludgy as the tricks involving labels, mentioned before.

Legitimate uses of exceptions for control flow:

However, there are some other very rare occasions, where exceptions are a good tool to break out of a complex, nested control flow (without error or failure). This may be the case when you’re parsing an XML document using a SAXParser. Maybe, your logic is going to test the occurrence of at least three <check/> elements, in case of which you may want to skip parsing the rest of the document. Here is how to implement the above:

Create a ControlFlowException:

package com.example;

public class ControlFlowException 
extends SAXException {}

Note that usually, you might prefer a RuntimeException for this, but the SAX contracts require handler implementations to throw SAXException instead.

Use that ControlFlowException in a SAX handler:

package com.example;

import java.io.File;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class Parse {
  public static void main(String[] args) 
  throws Exception {
    SAXParser parser = SAXParserFactory
        .newInstance()
        .newSAXParser();

    try {
      parser.parse(new File("test.xml"),
          new Handler());
      System.out.println(
          "Less than 3 <check/> elements found.");
    } catch (ControlFlowException e) {
      System.out.println(
          "3 or more <check/> elements found.");
    }
  }

  private static class Handler 
  extends DefaultHandler {

    int count;

    @Override
    public void startElement(
        String uri, 
        String localName, 
        String qName,
        Attributes attributes) {
      
      if ("check".equals(qName) && ++count >= 3)
        throw new ControlFlowException();
    }
  }
}

When to use exceptions for control flow:

The above practice seems reasonable with SAX, as SAX contracts expect such exceptions to happen, even if in this case, they’re not exceptions but regular control flow. Here are some indications about when to use the above practice in real world examples:

  • You want to break out of a complex algorithm (as opposed to a simple block).
  • You can implement “handlers” to introduce behaviour into complex algorithms.
  • Those “handlers” explicitly allow throwing exceptions in their contracts.
  • Your use case does not pull the weight of actually refactoring the complex algorithm.

A real-world example: Batch querying with jOOQ

In jOOQ, it is possible to “batch store” a collection of records. Instead of running a single SQL statement for every record, jOOQ collects all SQL statements and executes a JDBC batch operation to store them all at once.

As each record encapsulates its generated SQL rendering and execution for a given store() call in an object-oriented way, it would be quite tricky to extract the SQL rendering algorithm in a reusable way, without breaking (or exposing) too many things. Instead, jOOQ’s batch operation implements this simple pseudo-algorithm:

// Pseudo-code attaching a "handler" that will
// prevent query execution and throw exceptions
// instead:
context.attachQueryCollector();

// Collect the SQL for every store operation
for (int i = 0; i < records.length; i++) {
  try {
    records[i].store();
  }

  // The attached handler will result in this
  // exception being thrown rather than actually
  // storing records to the database
  catch (QueryCollectorException e) {

    // The exception is thrown after the rendered
    // SQL statement is available
    queries.add(e.query());                
  }
}

A real-world example: Exceptionally changing behaviour

Another example from jOOQ shows how this technique can be useful to introduce exceptional behaviour that is applicable only in rare cases. As explained in issue #1520, some databases have a limitation regarding the number of possible bind values per statement. These are:

  • SQLite: 999
  • Ingres 10.1.0: 1024
  • Sybase ASE 15.5: 2000
  • SQL Server 2008: 2100

In order to circumvent this limitation, it will be necessary for jOOQ to inline all bind values, once the maximum has been reached. As jOOQ’s query model heavily encapsulates SQL rendering and variable binding behaviour by applying the composite pattern, it is not possible to know the number of bind values before traversing a query model tree. For more details about jOOQ’s query model architecture, consider this previous blog post:

https://blog.jooq.org/2012/04/10/the-visitor-pattern-re-visited

So the solution is to render the SQL statement and count bind values that are effectively going to be rendered. A canonical implementation would be this pseudo code:

String sql;

query.renderWith(countRenderer);
if (countRenderer.bindValueCount() > maxBindValues) {
  sql = query.renderWithInlinedBindValues();
}
else {
  sql = query.render();
}

As can be seen, a canonical implementation will need to render the SQL statement twice. The first rendering is used only to count the number of bind values, whereas the second rendering will generate the true SQL statement. The problem here is that the exceptional behaviour should only be put in place, once the exceptional event (too many bind values) occurs. A much better solution is to introduce a “handler” that counts bind values in a regular “rendering attempt”, throwing a ControlFlowException for those few exceptional “attempts” where the number of bind values exceeds the maximum:

// Pseudo-code attaching a "handler" that will
// abort query rendering once the maximum number
// of bind values was exceeded:
context.attachBindValueCounter();
String sql;
try {

  // In most cases, this will succeed:
  sql = query.render();
}
catch (ReRenderWithInlinedVariables e) {
  sql = query.renderWithInlinedBindValues();
}

The second solution is better, because:

  • We only re-render the query in the exceptional case.
  • We don’t finish rendering the query to calculate the actual count, but abort early for re-rendering. I.e. we don’t care if we have 2000, 5000, or 100000 bind values.

Conclusion

As with all exceptional techniques, remember to use them in the right moment. If in doubt, think again.