Keeping things DRY: Method overloading

A good clean application design requires discipline in keeping things DRY:

Everything has to be done once.
Having to do it twice is a coincidence.
Having to do it three times is a pattern.

— An unknown wise man

Now, if you’re following the Extreme Programming rules, you know what needs to be done when you encounter a pattern:

refactor mercilessly

Because we all know what happens when you don’t:

Image Copyright (C) by hahastop.com

Not DRY: Method overloading

One of the least DRY things you can do that is still acceptable is method overloading – in those languages that allow it (unlike, say, Ceylon or JavaScript). Being an internal domain-specific language, the jOOQ API makes heavy use of overloading. Consider the type Field (modelling a database column):

public interface Field<T> {

    // [...]

    Condition eq(T value);
    Condition eq(Field<T> field);
    Condition eq(Select<? extends Record1<T>> query);
    Condition eq(QuantifiedSelect<? extends Record1<T>> query);

    Condition in(Collection<?> values);
    Condition in(T... values);
    Condition in(Field<?>... values);
    Condition in(Select<? extends Record1<T>> query);

    // [...]

}

So, in certain cases, a bit of non-DRY-ness is unavoidable, and to a certain extent that also applies to the implementation of the above API. The key rule of thumb here, however, is to have as few actual implementations as possible, even for overloaded methods, by calling one method from another. For instance, these two methods are very similar:

Condition eq(T value);
Condition eq(Field<T> field);

The first method is a special case of the second, for when jOOQ users do not want to explicitly declare a bind variable. It is literally implemented like this:

@Override
public final Condition eq(T value) {
    return equal(value);
}

@Override
public final Condition equal(T value) {
    return equal(Utils.field(value, this));
}

@Override
public final Condition equal(Field<T> field) {
    return compare(EQUALS, nullSafe(field));
}

@Override
public final Condition compare(Comparator comparator, Field<T> field) {
    switch (comparator) {
        case IS_DISTINCT_FROM:
        case IS_NOT_DISTINCT_FROM:
            return new IsDistinctFrom<T>(this, nullSafe(field), comparator);

        default:
            return new CompareCondition(this, nullSafe(field), comparator);
    }
}

As you can see:

  • eq() is just a synonym for the legacy equal() method
  • equal(T) is a more specialised, convenience form of equal(Field<T>)
  • equal(Field<T>) is a more specialised, convenience form of compare(Comparator, Field<T>)
  • compare() finally provides access to the implementation of this API

All of these methods are also part of the public API and can be called directly by the API consumer, which is why the nullSafe() check is repeated in each of them.
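
Following the same delegation chain, the remaining comparison operators can collapse onto compare() as well. Here is a sketch of how ne() might do so – the method names follow the pattern above, but the actual jOOQ internals may differ in detail:

@Override
public final Condition ne(T value) {
    return notEqual(value);
}

@Override
public final Condition notEqual(T value) {
    return notEqual(Utils.field(value, this));
}

@Override
public final Condition notEqual(Field<T> field) {
    return compare(NOT_EQUALS, nullSafe(field));
}

The only thing that changes is the Comparator constant; the actual condition-building logic stays in a single place.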

Why all the trouble?

The answer is simple.

  • There is very little room for copy-paste errors throughout the API…
  • … because the same API has to be offered for ne, gt, ge, lt, and le.
  • No matter what part of the API happens to be integration-tested, the implementation itself is certainly covered by some test.
  • This way, it is extremely easy to provide users with a very rich API with lots of convenience methods, as users do not want to remember how these more general-purpose methods (like compare()) really work.

The last point is particularly important, although, because of backwards-compatibility concerns, the JDK doesn’t always follow it. For instance, in order to create a Java 8 Stream from an Iterable, you have to go through all this hassle:

// Aagh, my fingers hurt...
   StreamSupport.stream(iterable.spliterator(), false);
// ^^^^^^^^^^^^^                 ^^^^^^^^^^^    ^^^^^
//       |                            |           |
// Not Stream!                        |           |
//                                    |           |
// Hmm, Spliterator. Sounds like      |           |
// Iterator. But what is it? ---------+           |
//                                                |
// What's this true and false?                    |
// And do I need to care? ------------------------+

When, intuitively, you’d like to have:

// Not Enterprise enough
iterable.stream();

In other words, subtle Java 8 Streams implementation details will soon leak into a lot of client code, and many new utility functions will wrap these things again and again.
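
For instance, here is the kind of utility that client code bases end up growing (a hypothetical helper class, not part of the JDK):

import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical utility hiding the Spliterator / StreamSupport detail
public final class Streams {

    private Streams() {}

    // Sequential Stream over any Iterable
    public static <T> Stream<T> stream(Iterable<T> iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }
}

Every project that writes such a helper duplicates the same few lines – exactly the kind of repetition that a convenience method on the API itself would have prevented.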

See Brian Goetz’s explanation on Stack Overflow for details.

On the flip side of delegating overload implementations, it is of course harder (i.e. more work) to implement such an API. This is particularly cumbersome if an API vendor also allows users to implement the API themselves (e.g. JDBC). Another issue is the length of stack traces generated by such implementations. But we’ve shown before on this blog that deep stack traces can be a sign of good quality.

Now you know why.

Takeaway

The takeaway is simple. Whenever you encounter a pattern, refactor. Find the common denominator, factor it out into a single implementation, and make sure that this implementation is hardly ever called directly, by delegating single-responsibility steps from method to method.

By following these rules, you will:

  • Have fewer bugs
  • Have a more convenient API

Happy refactoring!

Deep Stack Traces Can Be a Sign of Good Code Quality

The term “leaky abstractions” has been around for a while. The coining of the term is most often attributed to Joel Spolsky, who wrote this often-cited article about it. I’ve now stumbled upon another interpretation of a leaky abstraction, measured by the depth of a stack trace:

Leaky Abstractions as understood by Geek and Poke (Licensed CC-BY)

So, long stack traces are bad according to Geek & Poke. I’ve seen this argument before on Igor Polevoy’s blog (he’s the creator of ActiveJDBC, a Java implementation of the popular Ruby ActiveRecord query interface). Much like Joel Spolsky’s argumentation was often used to criticise ORMs, Igor’s argument was also used to compare ActiveJDBC with Hibernate. I’m citing:

One might say: so what, why do I care about the size of dependencies, depth of stack trace, etc. I think a good developer should care about these things. The thicker the framework, the more complex it is, the more memory it allocates, the more things can go wrong.

I completely agree that a framework with a certain amount of complexity tends to have longer stack traces. So if we run these axioms through our mental Prolog processors:

  • if Hibernate is a leaky abstraction, and
  • if Hibernate is complex, and
  • if complexity leads to long stack traces, then
  • leaky abstractions and long stack traces correlate

I wouldn’t go as far as claiming there’s a formal, causal connection. But a correlation seems logical.

But these things aren’t necessarily bad. In fact, long stack traces can be a good sign in terms of software quality. They can mean that the internals of a piece of software show a high amount of cohesion and a high degree of DRY-ness, which in turn means that there is little risk of subtle bugs deep down in your framework. High cohesion and DRY-ness lead to a large portion of the code being extremely relevant within the whole framework, which also means that any low-level bug will pretty much blow up the whole framework, as it will lead to everything going wrong. If you do test-driven development, you’ll be rewarded by noticing immediately that your silly mistake fails 90% of your test cases.

A real-world example

Let’s use jOOQ as an example to illustrate this, as we’re already comparing Hibernate and ActiveJDBC. Some of the longest stack traces in a database access abstraction can be achieved by putting a breakpoint at the interface of that abstraction with JDBC – for instance, when fetching data from a JDBC ResultSet:

Utils.getFromResultSet(ExecuteContext, Class<T>, int) line: 1945
Utils.getFromResultSet(ExecuteContext, Field<U>, int) line: 1938
CursorImpl$CursorIterator$CursorRecordInitialiser.setValue(AbstractRecord, Field<T>, int) line: 1464
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(AbstractRecord) line: 1447
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(Record) line: 1
RecordDelegate<R>.operate(RecordOperation<R,E>) line: 119
CursorImpl$CursorIterator.fetchOne() line: 1413
CursorImpl$CursorIterator.next() line: 1389
CursorImpl$CursorIterator.next() line: 1
CursorImpl<R>.fetch(int) line: 202
CursorImpl<R>.fetch() line: 176
SelectQueryImpl<R>(AbstractResultQuery<R>).execute(ExecuteContext, ExecuteListener) line: 274
SelectQueryImpl<R>(AbstractQuery).execute() line: 322
T_2698Record(UpdatableRecordImpl<R>).refresh(Field<?>...) line: 438
T_2698Record(UpdatableRecordImpl<R>).refresh() line: 428
H2Test.testH2T2698InsertRecordWithDefault() line: 931

Compared to ActiveJDBC’s stack traces, that’s quite a bit more, but still less than Hibernate’s (which uses lots of reflection and instrumentation). And it involves rather cryptic inner classes with quite a bit of method overloading. How should we interpret that? Let’s go through this bottom-up (or top-down, in the stack trace).

CursorRecordInitialiser

The CursorRecordInitialiser is an inner class that encapsulates the initialisation of a Record by a Cursor, and it ensures that the relevant parts of the ExecuteListener SPI are covered in a single place. It is the gateway to JDBC’s various ResultSet methods. It is a generic internal RecordOperation implementation that is called by…

RecordDelegate

… a RecordDelegate. While the class name is pretty meaningless, its purpose is to shield and wrap all direct record operations in a way that allows for a central implementation of the RecordListener SPI. This SPI can be implemented by client code to listen to active record lifecycle events. The price for keeping the implementation of this SPI DRY is a couple of extra elements on the stack trace, as such callbacks are the standard way to implement closures in the Java language. But keeping this logic DRY guarantees that no matter how a Record is initialised, the SPI will always be invoked. There are (almost) no forgotten corner-cases.
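
The pattern at work here is the classic pre-Java-8 callback style. Here is a minimal, hypothetical sketch of the idea – these are not jOOQ’s actual classes, just an illustration of why the central listener invocation costs a few stack frames:

// A generic operation on a record (hypothetical, for illustration)
interface RecordOperation<R> {
    R operate(R record) throws Exception;
}

// A hypothetical listener SPI with lifecycle callbacks
interface RecordLifecycleListener {
    void start();
    void end();
}

// The delegate wraps every operation, so the listener is invoked
// from exactly one place, no matter who triggers the operation
final class RecordOperationDelegate<R> {

    private final R record;
    private final RecordLifecycleListener listener;

    RecordOperationDelegate(R record, RecordLifecycleListener listener) {
        this.record = record;
        this.listener = listener;
    }

    R operate(RecordOperation<R> operation) throws Exception {
        listener.start();    // SPI callback before the operation

        try {
            return operation.operate(record);
        }
        finally {
            listener.end();  // SPI callback after the operation
        }
    }
}

The two extra frames (the delegate’s operate() and the operation’s operate()) are the price on the stack trace; the guarantee that the SPI is always invoked is what they buy.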

But we were initialising a Record in…

CursorImpl

… a CursorImpl, an implementation of a Cursor. This might appear odd, as jOOQ Cursors are used for “lazy fetching”, i.e. for fetching Records one-by-one from JDBC.

On the other hand, the SELECT query from this stack trace simply refreshes a single UpdatableRecord, jOOQ’s equivalent of an active record. Yet all the lazy fetching logic is executed just as if we were fetching a large, complex data set. This is, again, to keep things DRY when fetching data. Of course, around six levels of stack trace could have been saved by simply reading the single record, as we know there can be only one. But again, any subtle bug in the cursor will likely show up in some test case, even in a remote one like the test case for refreshing records.

Some may claim that all of this is wasting memory and CPU cycles. The opposite is more likely to be true. Modern JVM implementations are so good at managing and garbage-collecting short-lived objects and method calls that the slight additional complexity imposes almost no additional work on your runtime environment.

TL;DR: Long stack traces may indicate high cohesion and DRY-ness

The claim that a long stack trace is a bad thing is not necessarily correct. A long stack trace is what happens when complex frameworks are well implemented. Complexity will inevitably lead to “leaky abstractions”, but only well-designed complexity will lead to long stack traces.

Conversely, short stack traces can mean two things:

  • Lack of complexity: The framework is simple, with few features. This matches Igor’s claim for ActiveJDBC, as he is advertising ActiveJDBC as a “simple framework”.
  • Lack of cohesion and DRY-ness: The framework is poorly written, and probably has poor test coverage and lots of bugs.

Tree data structures

As a final note, it’s worth mentioning that another case where long stack traces are inevitable is when tree structures / composite pattern structures are traversed using visitors. Anyone who has ever debugged XPath or XSLT will know how deep these traces are.
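
A quick sketch of why that is – a hypothetical composite traversed recursively by a visitor, where every level of nesting adds a couple of frames:

import java.util.List;

// Hypothetical composite / visitor pair, for illustration only
interface Node {
    void accept(Visitor visitor);
}

interface Visitor {
    void visit(Element element);
}

final class Element implements Node {

    final String name;
    final List<Node> children;

    Element(String name, List<Node> children) {
        this.name = name;
        this.children = children;
    }

    @Override
    public void accept(Visitor visitor) {
        visitor.visit(this);
    }
}

final class PrintingVisitor implements Visitor {

    @Override
    public void visit(Element element) {
        System.out.println(element.name);

        // The visitor drives the traversal: every nesting level adds
        // two frames to the stack - Element.accept() and this visit()
        for (Node child : element.children)
            child.accept(this);
    }
}

A document nested 50 elements deep thus produces on the order of 100 traversal frames, none of which indicates a design problem.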

The Golden Rules of Code Documentation

Here’s another topic that is highly subjective, that leads to heated discussions and religious wars, and yet there’s no objective right or wrong.

A previous post on my blog was reblogged by my blogging partner JavaCodeGeeks. The amount of polarised ranting that post provoked on JCG is hilarious. Specifically, I like the fact that people tend to claim dogmatic things like:

If you need comments to clarify code, better think how to write code differently, so it is more understandable. You do not need yet another language (comments) to mess with the primary language (code).

Quite obviously, this person has written one or two “Hello World” applications, where this rule evidently holds true. My answer to that was:

How would you write this business logic down into code, such that you can live without comments?

A stock exchange order of clearing type code 27 needs to be grouped with all other subsequent orders of type code 27 (if and only if they have a rounding lot below 0.01), before actually unloading them within a time-frame of at most 35 seconds (fictional example in a real-life application).

Sure. Code can communicate “what” it does. But only comments can communicate “why” it does it! The “why” is a broader truth that simply cannot be expressed in code. It involves requirements, feelings, experience, and so on.

So it’s time for me to write up another polarising blog post leading to (hopefully!) more heated discussions! It is about:

The Golden Rules of Code Documentation

Good documentation adds readability, transparency, stability, and trustworthiness to your application and/or API. But what is good documentation? What are constituents of good documentation?

Code is documentation

First off, indeed, code is your most significant documentation. Code holds the ultimate truth about your software. All other ways of describing what code does are only approximations for those who

  • Don’t know the code (someone else wrote it)
  • Don’t have time to read the code (it’s too complex)
  • Don’t want to read the code (who wants to read Hibernate or Xerces code to understand what’s going on??)
  • Don’t have access to the code (although they could still decompile it)

For all others, code is documentation. Obviously, then, code should be written in a way that documents its purpose, so don’t write clever code; write elegant code. Here’s a good example of how not to document “purpose” (unless you happen to be one of the few native speakers of Perl):

`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Taken from:
http://fwebde.com/programming/write-unreadable-code/

Apparently, this prints “Just another Perl hacker.” I certainly won’t execute it on my machine, though. Don’t blame me for any loss of data ;-)

API is documentation

While an API is still code, it is the part of the code that is exposed to most others. It should thus be:

  • Very simple
  • Very concise

Simplicity is king, of course. Conciseness, however, is not exactly the same thing. It can still be simple to use an API which isn’t concise. I’d consider using Spring’s J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource simple. You configure it, you inject it, done. But the name hardly indicates conciseness.

This isn’t just about documentation, but about API design in general. It should be very easy to use your API, because then, your API clearly communicates its intent. And communicating one’s intent is documentation.

Good design (and thus documentation) rules to reach conciseness are these:

  • Don’t let methods with more than 3 arguments leak into your public API.
  • Don’t let methods / types with more than 3 words in their names leak into your public API.

Best avoid the above. If you cannot avoid such methods, keep them private: they are not reusable and thus not worth documenting in an API.
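
Here is a hypothetical sketch of how a “wide” method can be kept out of the public API while the functionality is still offered – all names are made up for illustration:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Public API: two concise entry points (hypothetical names)
interface ReportExporter {
    void export(Path target);
    void export(Path target, ExportOptions options);
}

// Everything beyond the common case is grouped into one options object
final class ExportOptions {

    final Charset encoding;
    final boolean includeHeader;

    ExportOptions(Charset encoding, boolean includeHeader) {
        this.encoding = encoding;
        this.includeHeader = includeHeader;
    }

    static ExportOptions defaults() {
        return new ExportOptions(StandardCharsets.UTF_8, true);
    }
}

final class ReportExporterImpl implements ReportExporter {

    @Override
    public void export(Path target) {
        export(target, ExportOptions.defaults());
    }

    @Override
    public void export(Path target, ExportOptions options) {
        export0(target, options.encoding, options.includeHeader);
    }

    // The "wide" signature stays private: not reusable, not documented
    private void export0(Path target, Charset encoding, boolean includeHeader) {
        // ... actual implementation
    }
}

The public surface stays at two short, overloaded methods that are easy to document, while the argument-heavy variant remains an implementation detail.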

API should be documented in words

As soon as code “leaks” into the public API, it should be documented in human-readable words. True, java.util.List.add() is already quite concise. It clearly communicates its intent. But how does it behave and why? An extract from the Javadoc:

Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.

So, there are some well-known lists that “refuse to add null elements”, and there may be “restrictions on what elements may be added”. This can’t be understood from the API’s method signature alone – unless you refuse to create a concise signature.
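
As a hypothetical illustration (the interface, its names and its behaviour are made up), documentation on a concise method can spell out exactly this kind of behavioural contract:

import java.util.Optional;

// Hypothetical repository interface, for illustration only
interface OrderRepository {

    /**
     * Loads the order with the given ID.
     * <p>
     * Implementations may cache orders per session. If the order has been
     * modified concurrently, a cached version may still be returned; callers
     * that need up-to-date data must refresh explicitly.
     *
     * @param id the order ID, never negative
     * @return the order, or an empty Optional if no such order exists
     * @throws IllegalArgumentException if the ID is negative
     */
    Optional<Order> findOrder(long id);
}

// Placeholder type so that the sketch is self-contained
final class Order {}

None of this behaviour is visible in the signature alone, which is precisely why it belongs in words.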

Tracking tools are documentation

Tracking tools are your human interface to your stakeholders. They help you discuss things and provide a historicised record of why code is ultimately written the way it is. Keep things DRY here, too: recognise duplicates and try to keep only one simple and concise ticket per issue.

When modifying your code in a not-so-obvious way (because your stakeholders have not-so-obvious requirements), add a short comment to the relevant code section, referencing the tracking ID:

// [#1296] FOR UPDATE is simulated in some dialects
// using ResultSet.CONCUR_UPDATABLE
if (forUpdate &&
    !asList(CUBRID, SQLSERVER).contains(context.getDialect())) {
    // [...]
}

Yes, the code itself already explains that the subsequent section is executed only for forUpdate queries, and not for the CUBRID and SQLSERVER dialects. But why? A future developer will gladly read up all they can find about issue #1296. If it is relevant, you should reference this ticket ID in:

  • Mailing lists
  • Source code
  • API documentation
  • Version control checkin comments
  • Stack Overflow questions
  • All sorts of other searchable documents
  • etc.

Version control is documentation

This part of the documentation is awesome! It documents change. In large projects, you may still be able to reconstruct why a co-worker who left the company long ago made some weird change that you don’t understand right now. It is thus important to also include the aforementioned ticket ID in the change.

So, follow this rule: Is the change non-trivial (i.e. anything beyond fixed spelling, fixed indentation, a renamed local variable, etc.)? Then create a ticket and document the change with its ticket ID in your commit message. Creating and referencing that ticket costs you only a minute, but it’ll save a future coworker hours of investigation!
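
For instance, a commit message along these lines (the wording is made up; the ticket ID is the one from the example above):

[#1296] Emulate FOR UPDATE via ResultSet.CONCUR_UPDATABLE

Running an annotate / blame on the file later leads straight back to the ticket and its discussion.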

Version numbering is documentation

A simple and concise version numbering system will help your users understand which version they should upgrade to. A good example of how to do this correctly is semantic versioning. The golden rules here are to use an [X].[Y].[Z] versioning scheme that can be summarised as follows:

  • If a patch release includes bugfixes, performance improvements and API-irrelevant new features, [Z] is incremented by one.
  • If a minor release includes backwards-compatible, API-relevant new features, [Y] is incremented by one and [Z] is reset to zero.
  • If a major release includes backwards-incompatible, API-relevant new features, [X] is incremented by one and [Y], [Z] are reset to zero.

Follow these rules strictly, to communicate the change scope between your released versions.
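
A hypothetical release history illustrating the scheme:

3.4.0 -> 3.4.1   bugfixes, performance improvements     ([Z] incremented)
3.4.1 -> 3.5.0   backwards-compatible API additions     ([Y] incremented, [Z] reset)
3.5.0 -> 4.0.0   backwards-incompatible API changes     ([X] incremented, [Y] and [Z] reset)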

Where things go wrong

Now here’s where it starts getting emotional…

Forget UML for documentation!

Don’t manually create big UML diagrams. Well, do create them – they might help you understand or explain things to others. Create ad-hoc UML diagrams for a meeting, or informal UML diagrams for a high-level tutorial. Generate UML diagrams from relevant parts of your code (or entity diagrams from your database), but don’t consider them a central part of your code documentation. No one will ever manually update UML diagrams with hundreds of classes and thousands of relations in them.

An exception to this rule may be UML-based model-driven architectures, where the UML is really part of the code, not the documentation.

Forget MS Word or HTML for documentation (if you can)!

Keep your documentation close to the code. Without an extreme amount of discipline, it is almost impossible to keep external documentation in sync with the actual code and/or API. If you can, auto-generate external documentation from the documentation in your code, to keep things DRY. But if you can avoid it, don’t write external documentation at all. It’s hardly ever accurate.

Of course, you can’t always avoid external documentation. Sometimes, you need to write manuals, tutorials, how-tos, best practices, etc. Just beware that those documents are almost impossible to keep in-sync with the “real truth”: Your code.

Forget writing documentation early!

Your API will evolve. Hardly anyone writes APIs that last forever, like the Java APIs. So don’t spend all that time thinking about how to eternally link class A with type B and algorithm C. Write code, document those parts of the code that leak into the API, and reference ticket IDs from your code and commits.

Forget documenting boilerplate code!

Getters and setters, for instance. They usually don’t do more than getting and setting. If they don’t, don’t document them, because boring documentation gets stale and thus wrong. How many times have you refactored a property (and thus the getter/setter name), but not the Javadoc? Exactly. No one updates boilerplate API documentation.

/**
 * Returns the id
 *
 * @return The id
 */
public int getId() {
    return id;
}

Aaah, the ID! Surprise surprise.

Forget documenting trivial code!

Don’t do this:

// Check if we still have work
if (!jobs.isEmpty()) {

    // Get the next job for execution
    Job job = jobs.pollFirst();

    // ... and execute it
    job.execute();
}

Duh. That code is already simple and concise, as we’ve seen before. It needs no comments at all:

if (!jobs.isEmpty()) {
    Job job = jobs.pollFirst();
    job.execute();
}

TL;DR: Keep things simple and concise

Create good documentation:

  • by keeping documentation simple and concise.
  • by keeping documentation close to the code and close to the API, which are the ultimate truths of your application.
  • by keeping your documentation DRY.
  • by making documentation available to others, through a ticketing system, version control, semantic versioning.
  • by referencing ticket IDs throughout your available media.
  • by forgetting about “external” documentation, as long as you can.

Applications, APIs, libraries that provide you with good documentation will help you create better software, because well-documented applications, APIs, libraries are better software, themselves. Critically check your stack and try to avoid those parts that are not well-documented.