The Golden Rules of Code Documentation

Here’s another topic that is highly subjective, that leads to heated discussions and religious wars, and yet there’s no objective right or wrong.

A previous post on my blog was reblogged by my blogging partner JavaCodeGeeks. The amount of polarised ranting that post provoked on JCG is hilarious. Specifically, I like the fact that people tend to claim dogmatic things like:

If you need comments to clarify code, better think how to write code differently, so it is more understandable. You do not need yet another language (comments) to mess with the primary language (code).

Quite obviously, this person has written 1-2 “Hello world” applications, where this obviously holds true. My answer to that was:

How would you write this business logic down into code, such that you can live without comments?

A stock exchange order of clearing type code 27 needs to be grouped with all other subsequent orders of type code 27 (if and only if they have a rounding lot below 0.01), before actually unloading them within a time-frame of at most 35 seconds (fictional example in a real-life application).

Sure. Code can communicate “what” it does. But only comments can communicate “why” it does it! “why” is a broader truth that simply cannot be expressed in code. It involves requirements, feelings, experience, etc. etc.
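
To illustrate the point (with hypothetical types and names, not the actual application code), the “what” below is trivial to read, while the “why” can only live in a comment or a ticket reference:

import java.util.ArrayList;
import java.util.List;

class OrderBatcher {

    // WHY: clearing type code 27 orders with a rounding lot below 0.01 must be
    // grouped and unloaded together within at most 35 seconds (see the requirement
    // above, ideally tracked in a ticket). None of this is expressible in the code.
    static List<Order> groupType27SmallLots(List<Order> subsequentOrders) {
        List<Order> batch = new ArrayList<Order>();

        for (Order order : subsequentOrders)
            if (order.clearingTypeCode == 27 && order.roundingLot < 0.01)
                batch.add(order);

        return batch;
    }

    // Hypothetical domain type, for illustration only
    static class Order {
        int clearingTypeCode;
        double roundingLot;
    }
}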

So it’s time for me to write up another polarising blog post leading to (hopefully!) more heated discussions! It is about:

The Golden Rules of Code Documentation

Good documentation adds readability, transparency, stability, and trustworthiness to your application and/or API. But what is good documentation? What are the constituents of good documentation?

Code is documentation

First off, indeed, code is your most significant documentation. Code holds the ultimate truth about your software. All other ways of describing what code does are only approximations for those who

  • Don’t know the code (someone else wrote it)
  • Don’t have time to read the code (it’s too complex)
  • Don’t want to read the code (who wants to read Hibernate or Xerces code to understand what’s going on??)
  • Don’t have access to the code (although they could still decompile it)

For all others, code is documentation. So, obviously, code should be written in a way that documents its purpose. So don’t write clever code, write elegant code. Here’s a good example of how to not document “purpose” (except for the few Perl native speakers):

`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Taken from:
http://fwebde.com/programming/write-unreadable-code/

Apparently, this prints “Just another Perl hacker.”. I certainly won’t execute this on my machine, though. Don’t blame me for any loss of data ;-)

API is documentation

While API is still code, it is that part of the code that is exposed to most others. It should thus be:

  • Very simple
  • Very concise

Simplicity is king, of course. Conciseness, however, is not exactly the same thing. It can still be simple to use an API which isn’t concise. I’d consider using Spring’s J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource simple. You configure it, you inject it, done. But the name hardly indicates conciseness.

This isn’t just about documentation, but about API design in general. It should be very easy to use your API, because then, your API clearly communicates its intent. And communicating one’s intent is documentation.

Good design (and thus documentation) rules to reach conciseness are these:

  • Don’t let methods with more than 3 arguments leak into your public API.
  • Don’t let methods / types with more than 3 words in their names leak into your public API.

It’s best to avoid the above. If you cannot avoid such methods, keep them private; they are not reusable and thus not worth documenting in an API.
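
As an illustration (a made-up example, not from any real API), compare a signature that violates both rules with a more concise alternative using a parameter object:

// A made-up example, not from any real API.
// Violates both rules: 4 arguments, 6 words in the method name
interface VerboseApi {
    void applyPreAuthenticatedRequestFilterConfiguration(
        String name, int order, boolean enabled, String pattern);
}

// More concise: one word in the method name, one parameter object
interface ConciseApi {
    void apply(FilterConfiguration configuration);
}

// Hypothetical parameter object; its construction details stay out of the API
class FilterConfiguration {
    String name;
    int order;
    boolean enabled;
    String pattern;
}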

API should be documented in words

As soon as code “leaks” into the public API, it should be documented in human-readable words. True, java.util.List.add() is already quite concise. It clearly communicates its intent. But how does it behave and why? An extract from the Javadoc:

Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.

So, there are some well-known lists that “refuse to add null elements”, and there may be “restrictions on what elements may be added”. This can’t be understood from the API’s method signature alone – unless you refuse to create a concise signature.
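
For example (a hypothetical List, merely to illustrate the point), nothing in the add(E) signature reveals such a restriction; only the documentation can:

import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/**
 * A hypothetical list that refuses to add null elements,
 * as permitted by the List contract.
 */
class NonNullList<E> extends AbstractList<E> {
    private final List<E> delegate = new ArrayList<E>();

    @Override
    public void add(int index, E element) {
        if (element == null)
            throw new NullPointerException("This list does not permit null elements");

        delegate.add(index, element);
    }

    @Override
    public E get(int index) {
        return delegate.get(index);
    }

    @Override
    public int size() {
        return delegate.size();
    }
}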

Tracking tools are documentation

Tracking tools are your human interface to your stakeholders. These help you discuss things and provide some historicised argumentation about why code is ultimately written the way it is. Keep things DRY, here. Recognise duplicates and try to keep only one simple and concise ticket per issue.

When modifying your code in a not-so-obvious way (because your stakeholders have not-so-obvious requirements), add a short comment to the relevant code section, referencing the tracking ID:

// [#1296] FOR UPDATE is simulated in some dialects
// using ResultSet.CONCUR_UPDATABLE
if (forUpdate && 
    !asList(CUBRID, SQLSERVER).contains(context.getDialect())) {

Yes, the code itself already explains that the subsequent section is executed only for forUpdate queries and only for dialects other than CUBRID and SQLSERVER. But why? A future developer will gladly read up all they can find about issue #1296. If it is relevant, you should reference this ticket ID in:

  • Mailing lists
  • Source code
  • API documentation
  • Version control checkin comments
  • Stack Overflow questions
  • All sorts of other searchable documents
  • etc.

Version control is documentation

This part of the documentation is awesome! It documents change. In large projects, you may still be able to reconstruct why a co-worker who left the company long ago made some weird change that you don’t understand right now. It is thus important to also include the aforementioned ticket ID in the change.

So, follow this rule: Is the change non-trivial (i.e. more than fixed spelling, fixed indentation, or a renamed local variable)? Then create a ticket and document the change with a ticket ID in your commit. Creating and referencing that ticket costs you only 1 minute, but it’ll save a future coworker hours of investigation!
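
For instance, reusing the ticket ID from the previous section, a hypothetical commit could be recorded like this (the ticket ID and message are illustrative):

git commit -m "[#1296] Simulate FOR UPDATE using ResultSet.CONCUR_UPDATABLE for CUBRID and SQL Server"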

Version numbering is documentation

A simple and concise version numbering system will help your users understand which version they should upgrade to. A good example of how to do this correctly is semantic versioning. The golden rules here are to use an [X].[Y].[Z] versioning scheme that can be summarised as follows:

  • If a patch release includes bugfixes, performance improvements and API-irrelevant new features, [Z] is incremented by one.
  • If a minor release includes backwards-compatible, API-relevant new features, [Y] is incremented by one and [Z] is reset to zero.
  • If a major release includes backwards-incompatible, API-relevant new features, [X] is incremented by one and [Y], [Z] are reset to zero.

Follow these rules strictly to communicate the scope of change between your released versions (e.g. 1.4.2 → 1.4.3 for a bugfix, 1.4.2 → 1.5.0 for a compatible new feature, 1.4.2 → 2.0.0 for a breaking change).

Where things go wrong

Now here’s where it starts getting emotional…

Forget UML for documentation!

Don’t manually do big UML diagrams. Well, do them. They might help you understand / explain things to others. Create ad-hoc UML diagrams for a meeting, or informal UML diagrams for a high-level tutorial. Generate UML diagrams from relevant parts of your code (or entity diagrams from your database), but don’t consider them as a central part of your code documentation. No one will ever manually update UML diagrams with 100s of classes and 1000s of relations in them.

An exception to this rule may be UML-based model-driven architectures, where the UML is really part of the code, not the documentation.

Forget MS Word or HTML for documentation (if you can)!

Keep your documentation close to the code. It is almost impossible, without an extreme amount of discipline, to keep external documentation in-sync with the actual code and/or API. If you can, auto-generate external documentation from the one in your code, to keep things DRY. But if you can avoid it, don’t write up external documentation. It’s hardly ever accurate.

Of course, you can’t always avoid external documentation. Sometimes, you need to write manuals, tutorials, how-tos, best practices, etc. Just beware that those documents are almost impossible to keep in-sync with the “real truth”: Your code.

Forget writing documentation early!

Your API will evolve. Hardly anyone writes APIs that last forever, like the Java APIs. So don’t spend all that time thinking about how to eternally link class A with type B and algorithm C. Write code, document those parts of the code that leak into the API, and reference ticket IDs from your code / commits.

Forget documenting boilerplate code!

Getters and setters, for instance. They usually don’t do more than getting and setting. If they don’t do anything else, don’t document them, because boring documentation gets stale and thus wrong. How many times have you refactored a property (and thus the getter/setter name), but not the Javadoc? Exactly. No one updates boilerplate API documentation.

/**
 * Returns the id
 *
 * @return The id
 */
public int getId() {
    return id;
}

Aaah, the ID! Surprise surprise.

Forget documenting trivial code!

Don’t do this:

// Check if we still have work
if (!jobs.isEmpty()) {

    // Get the next job for execution
    Job job = jobs.pollFirst();

    // ... and execute it
    job.execute();
}

Duh. That code is already simple and concise, as we’ve seen before. It needs no comments at all:

if (!jobs.isEmpty()) {
    Job job = jobs.pollFirst();
    job.execute();
}

TL;DR: Keep things simple and concise

Create good documentation:

  • by keeping documentation simple and concise.
  • by keeping documentation close to the code and close to the API, which are the ultimate truths of your application.
  • by keeping your documentation DRY.
  • by making documentation available to others, through a ticketing system, version control, semantic versioning.
  • by referencing ticket IDs throughout your available media.
  • by forgetting about “external” documentation, as long as you can.

Applications, APIs, libraries that provide you with good documentation will help you create better software, because well-documented applications, APIs, libraries are better software, themselves. Critically check your stack and try to avoid those parts that are not well-documented.

Defensive API evolution with Java interfaces

API evolution is something absolutely non-trivial. Something that only a few of us have to deal with. Most of us work on internal, proprietary APIs every day. Modern IDEs ship with awesome tooling to factor out, rename, pull up, push down, indirect, delegate, infer, generalise our code artefacts. These tools make refactoring our internal APIs a piece of cake.

But some of us work on public APIs, where the rules change drastically. Public APIs, if done properly, are versioned. Every change – compatible or incompatible – should be published in a new API version. Most people will agree that API evolution should be done in major and minor releases, similar to what is specified in semantic versioning. In short: Incompatible API changes are published in major releases (1.0, 2.0, 3.0), whereas compatible API changes / enhancements are published in minor releases (1.0, 1.1, 1.2).

If you’re planning ahead, you’re going to foresee most of your incompatible changes a long time before actually publishing the next major release. A good tool in Java to announce such a change early is deprecation.
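
A minimal sketch (made-up names and ticket ID) of what such an early announcement could look like:

// A made-up API; names and ticket ID are purely illustrative
public interface Calculator {

    /**
     * @deprecated [#123] - This method will be removed in version 3.0.
     *             Use {@link #add(int, int)} instead.
     */
    @Deprecated
    int plus(int augend, int addend);

    int add(int augend, int addend);
}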

Interface API evolution

Now, deprecation is a good tool to indicate that you’re about to remove a type or member from your API. But what if you’re going to add a method, or add a type to an interface’s type hierarchy? This means that all client code implementing your interface will break – at least until Java 8’s defender methods (default methods) arrive. There are several techniques to circumvent / work around this problem:

1. Don’t care about it

Yes, that’s an option too. Your API is public, but maybe not so much used. Let’s face it: Not all of us work on the JDK / Eclipse / Apache / etc codebases.

If you’re friendly, you’re at least going to wait for a major release to introduce new methods. But you can break the rules of semantic versioning if you really have to – if you can deal with the consequences of getting a mob of angry users.

Note, though, that other platforms aren’t as backwards-compatible as the Java universe (often by language design, or by language complexity). E.g. with Scala’s various ways of declaring things as implicit, your API can’t always be perfect.

2. Do it the Java way

The “Java” way is not to evolve interfaces at all. Most API types in the JDK have been the way they are today forever. Of course, this makes APIs feel quite “dinosaury” and adds a lot of redundancy between various similar types, such as StringBuffer and StringBuilder, or Hashtable and HashMap.

Note that some parts of Java don’t adhere to the “Java” way. Most specifically, this is the case for the JDBC API, which evolves according to the rules of section #1: “Don’t care about it”.

3. Do it the Eclipse way

Eclipse’s internals contain huge APIs. There are a lot of guidelines on how to evolve your own APIs (i.e. the public parts of your plugin) when developing for / within Eclipse. One example of how the Eclipse guys extend interfaces is the IAnnotationHover type. By Javadoc contract, it allows implementations to also implement IAnnotationHoverExtension and IAnnotationHoverExtension2. Obviously, in the long run, such an evolved API is quite hard to maintain, test, and document, and ultimately, hard to use! (consider ICompletionProposal and its 6 (!) extension types)
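
In a nutshell, the pattern looks like this (a simplified sketch with invented names, not the actual Eclipse types):

// Invented names; the real Eclipse types are IAnnotationHover and friends.
// The original, published interface stays untouched:
interface Hover {
    String getHoverInfo(int line);
}

// New functionality is added in a separate "extension" interface
interface HoverExtension {
    String getHoverInfo(int line, boolean includeMarkup);
}

class HoverCaller {

    // Callers have to probe for the extension at runtime
    static String hoverInfo(Hover hover, int line) {
        if (hover instanceof HoverExtension)
            return ((HoverExtension) hover).getHoverInfo(line, true);

        return hover.getHoverInfo(line);
    }
}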

4. Wait for Java 8

In Java 8, you will be able to make use of defender methods. This means that you can provide a sensible default implementation for your new interface methods as can be seen in Java 1.8’s java.util.Iterator (an extract):

public interface Iterator<E> {

    // These methods are kept the same:
    boolean hasNext();
    E next();

    // This method is now made "optional" (finally!)
    public default void remove() {
        throw new UnsupportedOperationException("remove");
    }

    // This method has been added compatibly in Java 1.8
    default void forEachRemaining(Consumer<? super E> action) {
        Objects.requireNonNull(action);
        while (hasNext())
            action.accept(next());
    }
}
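
A client implementation then only needs to provide the two abstract methods; the defaulted remove() and forEachRemaining() are inherited (a purely hypothetical example):

import java.util.Iterator;
import java.util.NoSuchElementException;

// A hypothetical client implementation
class CountdownIterator implements Iterator<Integer> {
    private int current;

    CountdownIterator(int start) {
        this.current = start;
    }

    @Override
    public boolean hasNext() {
        return current > 0;
    }

    @Override
    public Integer next() {
        if (current <= 0)
            throw new NoSuchElementException();

        return current--;
    }
}

Calling new CountdownIterator(3).forEachRemaining(System.out::println) prints 3, 2 and 1, and remove() throws UnsupportedOperationException – without a single line of client code for either method.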

Of course, you don’t always want to provide a default implementation. Often, your interface is a contract that has to be implemented entirely by client code.

5. Provide public default implementations

In many cases, it is wise to tell the client code that they may implement an interface at their own risk (due to API evolution), and that they had better extend a supplied abstract or default implementation instead. A good example of this is java.util.List, which can be a pain to implement correctly. For simple, non-performance-critical custom lists, most users will probably choose to extend java.util.AbstractList instead. The only methods left to implement are then get(int) and size(). The behaviour of all other methods can be derived from these two:

class EmptyList<E> extends AbstractList<E> {
    @Override
    public E get(int index) {
        throw new IndexOutOfBoundsException("No elements here");
    }

    @Override
    public int size() {
        return 0;
    }
}
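
All of the remaining List behaviour is then derived from those two methods, for example:

// A quick illustration: everything below is derived from get(int) and size()
// by AbstractList / AbstractCollection
List<String> empty = new EmptyList<String>();

assert empty.isEmpty();
assert !empty.contains("anything");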

A good convention to follow is to name your default implementation AbstractXXX if it is abstract, or DefaultXXX if it is concrete.

6. Make your API very hard to implement

Now, this isn’t really a good technique, but just a probable fact. If your API is very hard to implement (you have 100s of methods in an interface), then users are probably not going to do it. Note: probably. Never underestimate the crazy user. An example of this is jOOQ’s org.jooq.Field type, which represents a database field / column. In fact, this type is part of jOOQ’s internal domain specific language, offering all sorts of operations and functions that can be performed upon a database column.

Of course, having so many methods is an exception and – if you’re not designing a DSL – is probably a sign of a bad overall design.

7. Add compiler and IDE tricks

Last but not least, there are some nifty tricks that you can apply to your API to help people understand what they ought to do in order to correctly implement your interface-based API. Here’s a tough example that slaps the API designer’s intention straight into your face. Consider this extract of the org.hamcrest.Matcher API:

public interface Matcher<T> extends SelfDescribing {

    // This is what a Matcher really does.
    boolean matches(Object item);
    void describeMismatch(Object item, Description mismatchDescription);

    // Now check out this method here:

    /**
     * This method simply acts a friendly reminder not to implement 
     * Matcher directly and instead extend BaseMatcher. It's easy to 
     * ignore JavaDoc, but a bit harder to ignore compile errors .
     *
     * @see Matcher for reasons why.
     * @see BaseMatcher
     * @deprecated to make
     */
    @Deprecated
    void _dont_implement_Matcher___instead_extend_BaseMatcher_();
}

“Friendly reminder”, come on. ;-)

Other ways

I’m sure there are dozens of other ways to evolve an interface-based API. I’m curious to hear your thoughts!

JavaBeans™ should be extended to reduce bloat

JavaBeans™ has been around for a long time in the Java world. At some point in time, people realised that the concept of getters and setters was a good way to provide some abstraction over “object properties”, which should not be accessed directly. A typical “bean” would look like this:

public class MyBean {
    private int myProperty;

    public int getMyProperty() {
        return myProperty;
    }

    public void setMyProperty(int myProperty) {
        this.myProperty = myProperty;
    }
}

In various expression languages and other notations, you could then access “myProperty” using a simple property notation, which is good:

// The below would resolve to myBean.getMyProperty()
myBean.myProperty

// This could resolve to myBean.setMyProperty(5)
myBean.myProperty = 5

Critique on Java properties

Other languages, such as C#, even allow such property expressions to be inlined in regular code in order to call getters and setters. Why not Java?

Getter and Setter naming

Why do I have to use those bloated “get” / “is” and “set” prefixes every time I want to manipulate object properties? Besides, the case of the first letter of the property changes, too. If you want to perform a case-sensitive search on all usages of a property, you will have to write quite a regular expression to do so (something like (get|set|is)?[Mm]yProperty, just to catch every usage of myProperty).

Setter returning void

Returning void is one of the biggest reasons Java generates so much bloat at API call sites. Since the early days of Java, method chaining has been a widespread practice. No one would like to miss StringBuilder’s (or StringBuffer’s) chainable append() methods. They’re very useful. Why doesn’t the Java compiler allow you to re-access the property container after calling a setter?
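
Until the language changes, a common workaround is to let setters return this so that call sites can chain (a sketch only; note that strict JavaBeans introspection expects void setters, so some tools may not recognise these as properties):

// A made-up bean for illustration; setters return "this" to allow chaining
public class FluentBean {
    private int x;
    private String name;

    public FluentBean setX(int x) {
        this.x = x;
        return this;
    }

    public FluentBean setName(String name) {
        this.name = name;
        return this;
    }
}

// Call site:
// FluentBean bean = new FluentBean().setX(5).setName("hello");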

A better Java

In other words, this API:

public interface API {
    void oneMethod();
    void anotherMethod();
    void setX(int x);
    int  getX();
}

Should be usable as such:

API api = ...
int x = api.oneMethod()     // Returning void should in fact "return" api
           .anotherMethod() // Returning void should in fact "return" api
           .x;              // Getter access, if x is not accessible

Let’s make this a JSR!

The depths of Java: API leak exposed through covariance

Java can be very tricky sometimes, especially in API design. Let’s have a look at a very interesting showcase. jOOQ strongly separates API from implementation. All API is in the org.jooq package, and public. Most implementation is in the org.jooq.impl package and package-private. Only factories and some dedicated base implementations are public. This allows for very powerful package-level encapsulation, exposing mostly only interfaces to jOOQ users.

A simplified example of package-level encapsulation

Here’s roughly how jOOQ models SQL tables. The (overly simplified) API:

package org.jooq;

/**
 * A table in a database
 */
public interface Table {

  /**
   * Join two tables
   */
  Table join(Table table);
}

And two (overly simplified) implementation classes:

package org.jooq.impl;

import org.jooq.Table;

/**
 * Base implementation
 */
abstract class AbstractTable implements Table {

  @Override
  public Table join(Table table) {
    return null;
  }
}

/**
 * Custom implementation, publicly exposed to client code
 */
public class CustomTable extends AbstractTable {
}

How the internal API is exposed

Let’s assume that the internal API does some tricks with covariance:

abstract class AbstractTable implements Table, InternalStuff {

  // Note, this method returns AbstractTable, as it might
  // prove to be convenient to expose some internal API
  // facts within the internal API itself
  @Override
  public AbstractTable join(Table table) {
    return null;
  }

  /**
   * Some internal API method, also package private
   */
  void doThings() {}
  void doMoreThings() {

    // Use the internal API
    join(this).doThings();
  }
}

This all looks safe at first sight, but is it? AbstractTable is package-private, but CustomTable extends it and inherits all of its API, including the covariant method override of “AbstractTable join(Table)”. What does that result in? Check out the following piece of client code:

package org.jooq.test;

import org.jooq.Table;
import org.jooq.impl.CustomTable;

public class Test {
  public static void main(String[] args) {
    Table joined = new CustomTable();

    // This works, no knowledge of AbstractTable exposed to the compiler
    Table table1 = new CustomTable();
    Table join1 = table1.join(joined);

    // This works, even if join exposes AbstractTable
    CustomTable table2 = new CustomTable();
    Table join2 = table2.join(joined);

    // This doesn't work. The type AbstractTable is not visible
    Table join3 = table2.join(joined).join(joined);
    //            ^^^^^^^^^^^^^^^^^^^ This cannot be dereferenced

    // ... so hide these implementation details again
    // The API flaw can be circumvented with casting
    Table join4 = ((Table) table2.join(joined)).join(joined);
  }
}

Conclusion

Tampering with visibilities in class hierarchies can be dangerous. Beware of the fact that API methods declared in interfaces are always public, regardless of any covariant implementations that involve non-public artefacts. This can be quite annoying for API users when not properly dealt with by API designers.

Fixed in the next version of jOOQ :-)
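
One way to avoid the leak in general (a sketch of the idea, not necessarily the actual jOOQ fix) is to keep the public override’s return type at the interface level and route internal covariance through a separate package-private method:

// Sketch only: not necessarily how jOOQ actually fixed it
abstract class AbstractTable implements Table {

    // The published API method keeps the published return type
    @Override
    public Table join(Table table) {
        return join0(table);
    }

    // Internal code uses this package-private variant instead
    AbstractTable join0(Table table) {
        return null;
    }

    void doThings() {}

    void doMoreThings() {
        join0(this).doThings();
    }
}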

The good API design

I’ve stumbled upon a nice checklist wrapping up API design guidelines. An extract:

  1. Favor placing API and implementation into separate packages
  2. Favor placing APIs into high-level packages and implementation into lower-level packages
  3. Consider breaking up large APIs into several packages
  4. Consider putting API and implementation packages into separate Java archives
  5. Avoid (minimize) internal dependencies on implementation classes in APIs
  6. Avoid unnecessary API fragmentation
  7. Do not place public implementation classes in the API package
  8. Do not create dependencies between callers and implementation classes
  9. Do not place unrelated APIs into the same package
  10. Do not place API and SPI into the same package
  11. Do not move or rename the package of an already released public API

See the full checklist here:

http://theamiableapi.com/2012/01/16/java-api-design-checklist/
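
A package layout following rules 1, 2, 7 and 10 might look like this (package names invented for illustration):

org.example.library          // public API: interfaces and factories only
org.example.library.spi      // SPI: extension points for providers
org.example.library.impl     // implementation: ideally package-private classes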

A neater way to use reflection in Java

Reflection in Java really feels awkward. The java.lang.reflect API is very powerful and complete, and in that sense also very verbose. Unlike in most scripting languages, there is no convenient way to access methods and fields dynamically using reflection. By convenient, I mean things like this

// PHP
$method = 'my_method';
$field  = 'my_field';

// Dynamically call a method
$object->$method();

// Dynamically access a field
$object->$field;

Or even better

// JavaScript
var method   = 'my_method';
var field    = 'my_field';

// Dynamically call a function
object[method]();

// Dynamically access a field
object[field];

For us Java guys, this is something we can only dream of. We would write this

String method = "my_method";
String field  = "my_field";

// Dynamically call a method
object.getClass().getMethod(method).invoke(object);

// Dynamically access a field
object.getClass().getField(field).get(object);

Obviously, this doesn’t take care of NullPointerExceptions, InvocationTargetExceptions, IllegalAccessExceptions, IllegalArgumentExceptions, SecurityExceptions, primitive types, etc. In an enterprise world, things need to be safe and secure, and the Java architects have thought of all possible problems that could arise when using reflection. But in many cases, we know what we’re doing and we don’t care about most of those features. We’d prefer the less verbose way.

That’s why I have created another sibling in the jOO* family: jOOR (Java Object Oriented Reflection). While this is not a killer library, it might be useful for 1-2 developers out there looking for a simple and fluent solution. Here’s an example I recently encountered on Stack Overflow, where jOOR might fit in just perfectly:

// Classic example of reflection usage
try {
  Method m1 = department.getClass().getMethod("getEmployees");
  Employee[] employees = (Employee[]) m1.invoke(department);

  for (Employee employee : employees) {
    Method m2 = employee.getClass().getMethod("getAddress");
    Address address = (Address) m2.invoke(employee);

    Method m3 = address.getClass().getMethod("getStreet");
    Street street = (Street) m3.invoke(address);

    System.out.println(street);
  }
}

// There are many checked exceptions that you are likely to ignore anyway 
catch (Exception e) {

  // ... or maybe just wrap in your preferred runtime exception:
  throw new RuntimeException(e);
}

And the same example using jOOR:

// (assuming a static import of org.joor.Reflect.on)
Employee[] employees = on(department).call("getEmployees").get();

for (Employee employee : employees) {
  Street street = on(employee).call("getAddress").call("getStreet").get();
  System.out.println(street);
}

See the full Stack Overflow question here:

http://stackoverflow.com/questions/4385003/java-reflection-open-source/8672186

Another example:

String world = 
on("java.lang.String")  // Like Class.forName()
 .create("Hello World") // Call the most specific matching constructor
 .call("substring", 6)  // Call the most specific matching method
 .call("toString")      // Call toString()
 .get();                // Get the wrapped object, in this case a String

Get jOOR for free here:

http://code.google.com/p/joor/

SQL in Scala, where jOOQ could go

I have recently blogged about how simple it is to integrate jOOQ into Scala. See the full blog post here:

https://lukaseder.wordpress.com/2011/12/11/the-ultimate-sql-dsl-jooq-in-scala/

I’m more and more thrilled by that option, as Scala is one of the fastest emerging JVM languages nowadays. The plain integration of a Java library in Scala leaves some open questions. jOOQ knows the “val()” operator or method, to create bind values. See the manual here:

http://www.jooq.org/manual/JOOQ/BindValues/

This operator cannot be used in Scala, as Scala declares “val” as a reserved word. I’ve had similar issues in Java before, when trying to use “case” or “else” in the API, which is not possible either. The path of least resistance is to overload or re-name those methods. “val” was overloaded as “value” in jOOQ 2.0.1. “case” and “else” were re-named a long time ago as “decode” (from Oracle’s DECODE function), and “otherwise” (as in XSL).

So for a full-fledged integration in Scala, jOOQ should be wrapped in a new API called scOOQ ;-). This new API should take Scala’s language features into account in order to make working with jOOQ a lot easier. This could be the chance to re-engineer some of the API and make all API methods uppercase, as is usual with SQL. With Scala’s ability to omit syntax elements, such as “.” and “()”, the API could then declare one-word methods, such as in this Java example:

MERGE().INTO(MY_TABLE)
       .USING(SOURCE_TABLE)
       .ON(MY_TABLE.ID.equal(SOURCE_TABLE.ID))
       .WHEN().MATCHED().THEN().UPDATE()
         .SET(ID, 1)
         .SET(DATA, "Data")
       .WHEN().NOT().MATCHED().THEN().INSERT(MY_TABLE.ID, MY_TABLE.DATA)
       .VALUES(1, "Data")

While in Java this looks quite nasty and verbose, in Scala it could be very nice! The below statement should compile in Scala if the API were declared as such:

MERGE INTO MY_TABLE
      USING (SOURCE_TABLE)
      ON (MY_TABLE.ID equal SOURCE_TABLE.ID)
      WHEN MATCHED THEN UPDATE
        SET (ID, 1)
        SET (DATA, "Data")
      WHEN NOT MATCHED THEN INSERT (MY_TABLE.ID, MY_TABLE.DATA)
      VALUES (1, "Data")

Convinced? Contributions very welcome! :-)