The Golden Rules of Code Documentation

Here’s another topic that is highly subjective, that leads to heated discussions, to religious wars and yet, there’s no objective right or wrong.

A previous post on my blog was reblogged to my blogging partner JavaCodeGeeks. The amount of polarised ranting this blog provoked on JCG is hilarious. Specifically, I like the fact that people tend to claim dogmatic things like:

If you need comments to clarify code, better think how to write code differently, so it is more understandable. You do not need yet another language (comments) to mess with the primary language (code).

Quite obviously, this person has written 1-2 “Hello world” applications, where this obviously holds true. My answer to that was:

How would you write this business logic down into code, such that you can live without comments?

A stock exchange order of clearing type code 27 needs to be grouped with all other subsequent orders of type code 27 (if and only if they have a rounding lot below 0.01), before actually unloading them within a time-frame of at most 35 seconds (fictional example in a real-life application).

Sure. Code can communicate “what” it does. But only comments can communicate “why” it does it! “why” is a broader truth that simply cannot be expressed in code. It involves requirements, feelings, experience, etc. etc.

So it’s time for me to write up another polarising blog post leading to (hopefully!) more heated discussions! It is about:

The Golden Rules of Code Documentation

Good documentation adds readability, transparency, stability, and trustworthiness to your application and/or API. But what is good documentation? What are constituents of good documentation?

Code is documentation

First off, indeed, code is your most significant documentation. Code holds the ultimate truth about your software. All other ways of describing what code does are only approximations for those who

  • Don’t know the code (someone else wrote it)
  • Don’t have time to read the code (it’s too complex)
  • Don’t want to read the code (who wants to read Hibernate or Xerces code to understand what’s going on??)
  • Don’t have access to the code (although they could still decompile it)

For all others, code is documentation. So, obviously, code should be written in a way that documents its purpose. So don’t write clever code, write elegant code. Here’s a good example of how to not document “purpose” (except for the few Perl native speakers):

`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Taken from:
http://fwebde.com/programming/write-unreadable-code/

Apparently, this prints “Just another Perl hacker.”. I certainly won’t execute this on my machine, though. Don’t blame me for any loss of data ;-)

API is documentation

While API is still code, it is that part of the code that is exposed to most others. It should thus be:

  • Very simple
  • Very concise

Simplicity is king, of course. Conciseness, however, is not exactly the same thing. It can still be simple to use an API which isn’t concise. I’d consider using Spring’s J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource simple. You configure it, you inject it, done. But the name hardly indicates conciseness.

This isn’t just about documentation, but about API design in general. It should be very easy to use your API, because then, your API clearly communicates its intent. And communicating one’s intent is documentation.

Good design (and thus documentation) rules to reach conciseness are these:

  • Don’t let methods with more than 3 arguments leak into your public API.
  • Don’t let methods / types with more than 3 words in their names leak into your public API.

Best avoid the above. If you cannot avoid such methods, keep things private. These methods are not reusable and thus not worth documenting in an API.

API should be documented in words

As soon as code “leaks” into the public API, it should be documented in human-readable words. True, java.util.List.add() is already quite concise. It clearly communicates its intent. But how does it behave and why? An extract from the Javadoc:

Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.

So, there are some well-known lists, that “refuse to add null elements” there may be “restrictions on what elements may be added”. This can’t be understood from the API’s method signature only – unless you refuse to create a concise signature.

Tracking tools are documentation

Tracking tools are your human interface to your stakeholders. These help you discuss things and provide some historicised argumentation about why code is ultimately written the way it is. Keep things DRY, here. Recognise duplicates and try to keep only one simple and concise ticket per issue.

When modifying your code in a not-so-obvious way (because your stakeholders have not-so-obvious requirements), add a short comment to the relevant code section, referencing the tracking ID:

// [#1296] FOR UPDATE is simulated in some dialects
// using ResultSet.CONCUR_UPDATABLE
if (forUpdate && 
    !asList(CUBRID, SQLSERVER).contains(context.getDialect())) {

Yes, the code itself already explains that the subsequent section is executed only in forUpdate queries and only for the CUBRID and SQLSERVER dialects. But why? A future developer will gladly read up all they can find about issue #1296. If it is relevant, you should reference this ticket ID in:

  • Mailing lists
  • Source code
  • API documentation
  • Version control checkin comments
  • Stack Overflow questions
  • All sorts of other searchable documents
  • etc.

Version control is documentation

This part of the documentation is awesome! It documents change. In large projects, you may still be able to reconstruct why a co-worker who has long ago left the company did some weird change that you don’t understand right now. It is thus important to also include the aforementioned ticket ID in the change.

So, follow this rule: Is the change non-trivial (fixed spelling, fixed indentation, renamed local variable, etc.)? Then create a ticket and document this change with a ticket ID in your commit. Creating and referencing that ticket costs you only 1 minute, but it’ll save a future coworker hours of investigation!

Version numbering is documentation

A simple and concise version numbering system will help your users understand, which version they should upgrade to. A good example of how to do this correctly is semantic versioning. The golden rules here are to use an [X].[Y].[Z] versioning scheme that can be summarised as follows:

  • If a patch release includes bugfixes, performance improvements and API-irrelevant new features, [Z] is incremented by one.
  • If a minor release includes backwards-compatible, API-relevant new features, [Y] is incremented by one and [Z] is reset to zero.
  • If a major release includes backwards-incompatible, API-relevant new features, [X] is incremented by one and [Y], [Z] are reset to zero.

Follow these rules strictly, to communicate the change scope between your released versions.

Where things go wrong

Now here’s where it starts getting emotional…

Forget UML for documentation!

Don’t manually do big UML diagrams. Well, do them. They might help you understand / explain things to others. Create ad-hoc UML diagrams for a meeting, or informal UML diagrams for a high-level tutorial. Generate UML diagrams from relevant parts of your code (or entity diagrams from your database), but don’t consider them as a central part of your code documentation. No one will ever manually update UML diagrams with 100s of classes and 1000s of relations in them.

An exception to this rule may be UML-based model-driven architectures, where the UML is really part of the code, not the documentation.

Forget MS Word or HTML for documentation (if you can)!

Keep your documentation close to the code. It is almost impossible without an extreme amount of discipline, to keep external documentation in-sync with the actual code and/or API. If you can, auto-generate external documentation from the one in your code, to keep things DRY. But if you can avoid it, don’t write up external documentation. It’s hardly ever accurate.

Of course, you can’t always avoid external documentation. Sometimes, you need to write manuals, tutorials, how-tos, best practices, etc. Just beware that those documents are almost impossible to keep in-sync with the “real truth”: Your code.

Forget writing documentation early!

Your API will evolve. Hardly anyone writes APIs that last forever, like the Java APIs. So don’t spend all that time thinking about how to eternally link class A with type B and algorithm C. Write code, document those parts of the code that leak into the API, reference ticket IDs from your code / commits

Forget documenting boilerplate code!

Getters and setters, for instance. They usually don’t do more than getting and setting. If they don’t, don’t document it, because boring documentation gets stale and thus wrong. How many times have you refactored a property (and thus the getter/setter name), but not the Javadoc? Exactly. No one updates boilerplate API documentation.

/**
 * Returns the id
 *
 * @return The id
 */
public int getId() {
    return id;
}

Aaah, the ID! Surprise surprise.

Forget documenting trivial code!

Don’t do this:

// Check if we still have work
if (!jobs.isEmpty()) {

    // Get the next job for execution
    Job job = jobs.pollFirst();

    // ... and execute it
    job.execute();
}

Duh. That code is already simple and concise, as we’ve seen before. It needs no comments at all:

if (!jobs.isEmpty()) {
    Job job = jobs.pollFirst();
    job.execute();
}

TL;DR: Keep things simple and concise

Create good documentation:

  • by keeping documentation simple and concise.
  • by keeping documentation close to the code and close to the API, which are the ultimate truths of your application.
  • by keeping your documentation DRY.
  • by making documentation available to others, through a ticketing system, version control, semantic versioning.
  • by referencing ticket IDs throughout your available media.
  • by forgetting about “external” documentation, as long as you can.

Applications, APIs, libraries that provide you with good documentation will help you create better software, because well-documented applications, APIs, libraries are better software, themselves. Critically check your stack and try to avoid those parts that are not well-documented.

Bloated JavaBeans™, Part II – or Don’t Add “Getters” to Your API

I have recently blogged about an idea how JavaBeans™ could be extended to reduce the bloat created by this widely-accepted convention in the Java world.

That article was reblogged on DZone and got quite controversial feedback here (like most ideas that try to get some fresh ideas into the Java world):
http://java.dzone.com/articles/javabeans™-should-be-extended.

I want to revisit one of the thoughts I had in that article, which was given a bit less attention, namely:

Getter and Setter naming

Why do I have to use those bloated “get”/”is” and “set” prefixes every time I want to manipulate object properties? Besides, the case of the first letter of the property changes, too. If you want to perform a case-sensitive search on all usage of a property, you will have to write quite a regular expression to do so

I’m specifically having trouble understanding why we should use getters all over the place. Getters / setters are a convention to provide abstraction over property access. I.e., you’d typically write silly things like this, all the time:

public class MyBean {
    private int myProperty;

    public int getMyProperty() {
        return myProperty;
    }

    public void setMyProperty(int myProperty) {
        this.myProperty = myProperty;
    }
}

OK. Let’s accept that this appears to be our everyday life as a Java developer, writing all this bloat, instead of using standard keywords or annotations. I’m talking about standards, not proprietary things like Project Lombok. With facts of life accepted, let’s have a look at java.io.File for more details. To me, this is a good example where JavaBean-o-mania™ went quite wrong. Why? Check out this source code extract:

public class File {

    // This is the only relevant internal property. It would be "final"
    // if it wasn't set by serialisation magic in readObject()
    private String path;

    // Here are some arbitrary actions that you can perform on this file.
    // Usually, verbs are used as method names for actions. Good:
    public boolean delete();
    public void deleteOnExit();
    public boolean mkdir();
    public boolean renameTo(File dest);

    // Now the fun starts!
    // Here is the obvious "getter" as understood by JavaBeans™
    public String getPath();

    // Here are some additional "getters" that perform some transformation
    // on the underlying property, before returning it
    public String getName();
    public String getParent();
    public File getParentFile();
    public String getPath();

    // But some of these "transformation-getters" use "to", rather than
    // "get". Why "toPath()" but not "toParentFile()"? How to distinguish
    // "toPath()" and "getPath()"?
    public Path toPath();
    public URI toURI();

    // Here are some "getters" that aren't really getters, but retrieve
    // their information from the underlying file
    public long getFreeSpace();
    public long getTotalSpace();
    public long getUsableSpace();

    // But some of the methods qualifying as "not-really-getters" do not
    // feature the "get" action keyword, duh...
    public long lastModified();
    public long length();

    // Now, here's something. "Setters" that don't set properties, but
    // modify the underlying file. A.k.a. "not-really-setters"
    public boolean setLastModified(long time);
    public boolean setReadable(boolean readable);
    public boolean setWritable(boolean writable);

    // Note, of course, that it gets more confusing when you look at what
    // seem to be the "not-really-getters" for the above
    public long lastModified();
    public boolean canRead();
    public boolean canWrite();
}

Confused? Yes. But we have all ended up doing things this way, one time or another. jOOQ is no different, although, future versions will fix this.

How to improve things

Not all libraries and APIs are flawed in this way. Java has come a long way and has been written by many people with different views on the subject. Besides, Java is extremely backwards-compatible, so I don’t think that the JDK, if written from scratch, would still suffer from “JavaBean-o-mania™” as much. So here are a couple of rules that could be followed in new APIs, to get things a bit cleaned up:

  1. First off, decide whether your API will mainly be used in a Spring-heavy or JSP/JSF-heavy environment or any other environment that uses JavaBeans™-based expression languages, where you actually WANT to follow the standard convention. In that case, however, STRICTLY follow the convention and don’t name any information-retrieval method like this: “File.length()”. If you’re follwoing this paradigm, ALL of your methods should start with a verb, never with a noun / adjective
  2. The above applies to few libraries, hence you should probably NEVER use “get” if you want to access an object which is not a property. Just use the property name (noun, adjective). This will look much leaner at the call-site, specifically if your library is used in languages like Scala. In that way, “File.length()” was a good choice, just as “Enum.values()”, rather than “File.getLength()” or “Enum.getValues()”.
  3. You should probably ALSO NOT use “get” / “set” if you want to access properties. Java can easily separate the namespace for property / method names. Just use the property name itself in the getter / setter, like this:
    public class MyBean {
        private int myProperty;
    
        public int myProperty() {
            return myProperty;
        }
    
        public void myProperty(int myProperty) {
            this.myProperty = myProperty;
        }
    }
    

    Think about the first rule again, though. If you want to configure your bean with Spring, you may have no choice. But if you don’t need Spring, the above will have these advantages:

    • Your getters, setters, and properties have exactly the same name (and case of the initial letter). Text-searching across the codebase is much easier
    • The getter looks just like the property itself in languages like Scala, where these are equivalent expressions thanks to language syntax sugar: “myBean.myProperty()” and “myBean.myProperty”
    • Getter and setter are right next to each other in lexicographic ordering (e.g. in your IDE’s Outline view). This makes sense as the property itself is more interesting than the non-action of “getting” and “setting”
    • You never have to worry about whether to choose “get” or “is”. Besides, there are a couple of properties, where “get” / “is” are inappropriate anyway, e.g. whenever “has” is involved -> “getHasChildren()” or “isHasChildren()”? Meh, name it “hasChildren()” !! “setHasChildren(true)” ? No, “hasChildren(true)” !!
    • You can follow simple naming rules: Use verbs in imperative form to perform actions. Use nouns, adjectives or verbs in third person form to access objects / properties. This rule already proves that the standard convention is flawed. “get” is an imperative form, whereas “is” is a third person form.
  4. Consider returning “this” in the setter. Some people just like method-chaining:
        public MyBean myProperty(int myProperty) {
            this.myProperty = myProperty;
            return this;
        }
    
        // The above allows for things like
        myBean.myProperty(1).myOtherProperty(2).andThen(3);
    

    Alternatively, return the previous value, e.g:

        public int myProperty(int myProperty) {
            try {
                return this.myProperty;
            }
            finally {
                this.myProperty = myProperty;
            }
        }
    

    Make up your mind and choose one of the above, keeping things consistent across your API. In most cases, method chaining is less useful than an actual result value.

    Anyway, having “void” as return type is a waste of API scope. Specifically, consider Java 8’s lambda syntax for methods with / without return value (taken from Brian Goetz’s state of the lambda presentations):

    // Aaaah, Callables without curly braces nor semi-colons
    blocks.filter(b -> b.getColor() == BLUE);
    
    // Yuck! Blocks with curly braces and an extra semi-colon!
    blocks.forEach(b -> { b.setColor(RED); });
    
    // In other words, following the above rules, you probably
    // prefer to write:
    blocks.filter(b -> b.color() == BLUE)
          .forEach(b -> b.color(RED));
    

    Thinking about this now might be a decisive advantage of your API over your competition once Java 8 goes live (for those of us who maintain a public API).

  5. Finally, DO use “get” and “set” where you really want to emphasise the semantic of the ACTIONS called “getting” and “setting”. This includes getting and setting objects on types like:
    • Lists
    • Maps
    • References
    • ThreadLocals
    • Futures
    • etc…

    In all of these cases, “getting” and “setting” are actions, not property access. This is why you should use a verb like “get”, “set”, “put” and many others.

Summary

Be creative when designing an API. Don’t strictly follow the boring rules imposed by JavaBeans™ and Spring upon an entire industry. More recent JDK APIs and also well-known APIs by Google / Apache make little use of “get” and “set” when accessing objects / properties. Java is a static, type-safe language. Expression languages and configuration by injection are an exception in our every day work. Hence we should optimise our API for those use cases that we deal with the most. Better, if Spring adapt their ways of thinking to nice, lean, beautiful and fun APIs rather than forcing the Java world to bloat their APIs with boring stuff like getters and setters!

The good API design

I’ve stumbled upon a nice checklist wrapping up API design guidelines. An extract:

  1. Favor placing API and implementation into separate packages
  2. Favor placing APIs into high-level packages and implementation into lower-level packages
  3. Consider breaking up large APIs into several packages
  4. Consider putting API and implementation packages into separate Java archives
  5. Avoid (minimize) internal dependencies on implementation classes in APIs
  6. Avoid unnecessary API fragmentation
  7. Do not place public implementation classes in the API package
  8. Do not create dependencies between callers and implementation classes
  9. Do not place unrelated APIs into the same package
  10. Do not place API and SPI into the same package
  11. Do not move or rename the package of an already released public API

See the full checklist here:

http://theamiableapi.com/2012/01/16/java-api-design-checklist/