The Golden Rules of Code Documentation

Here’s another topic that is highly subjective, that leads to heated discussions, to religious wars and yet, there’s no objective right or wrong. A previous post on my blog was reblogged to my blogging partner JavaCodeGeeks. The amount of polarised ranting this blog provoked on JCG is hilarious. Specifically, I like the fact that people tend to claim dogmatic things like:
If you need comments to clarify code, better think how to write code differently, so it is more understandable. You do not need yet another language (comments) to mess with the primary language (code).
Quite obviously, this person has written 1-2 “Hello world” applications, where this obviously holds true. My answer to that was:
How would you write this business logic down into code, such that you can live without comments? A stock exchange order of clearing type code 27 needs to be grouped with all other subsequent orders of type code 27 (if and only if they have a rounding lot below 0.01), before actually unloading them within a time-frame of at most 35 seconds (fictional example in a real-life application).
Sure. Code can communicate “what” it does. But only comments can communicate “why” it does it! “Why” is a broader truth that simply cannot be expressed in code. It involves requirements, feelings, experience, etc. So it’s time for me to write up another polarising blog post leading to (hopefully!) more heated discussions! It is about:

The Golden Rules of Code Documentation

Good documentation adds readability, transparency, stability, and trustworthiness to your application and/or API. But what is good documentation? What are the constituents of good documentation?

Code is documentation

First off, indeed, code is your most significant documentation. Code holds the ultimate truth about your software. All other ways of describing what code does are only approximations for those who
  • Don’t know the code (someone else wrote it)
  • Don’t have time to read the code (it’s too complex)
  • Don’t want to read the code (who wants to read Hibernate or Xerces code to understand what’s going on??)
  • Don’t have access to the code (although they could still decompile it)
For all others, code is documentation. So, obviously, code should be written in a way that documents its purpose. So don’t write clever code, write elegant code. Here’s a good example of how to not document “purpose” (except for the few Perl native speakers):

`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Taken from: http://fwebde.com/programming/write-unreadable-code/

Apparently, this prints “Just another Perl hacker.” I certainly won’t execute this on my machine, though. Don’t blame me for any loss of data ;-)

API is documentation

While API is still code, it is that part of the code that is exposed to most others. It should thus be:
  • Very simple
  • Very concise
Simplicity is king, of course. Conciseness, however, is not exactly the same thing. It can still be simple to use an API which isn’t concise. I’d consider using Spring’s J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource simple. You configure it, you inject it, done. But the name hardly indicates conciseness. This isn’t just about documentation, but about API design in general. It should be very easy to use your API, because then, your API clearly communicates its intent. And communicating one’s intent is documentation. Good design (and thus documentation) rules to reach conciseness are these:
  • Don’t let methods with more than 3 arguments leak into your public API.
  • Don’t let methods / types with more than 3 words in their names leak into your public API.
Best avoid the above. If you cannot avoid such methods, keep things private. These methods are not reusable and thus not worth documenting in an API.

API should be documented in words

As soon as code “leaks” into the public API, it should be documented in human-readable words. True, java.util.List.add() is already quite concise. It clearly communicates its intent. But how does it behave and why? An extract from the Javadoc:
Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.
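
To make such a restriction concrete, here is a minimal sketch of a list that refuses null elements and says so in its Javadoc. The class NonNullList is hypothetical and exists only for illustration; it is not part of the JDK.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/**
 * Hypothetical list that refuses null elements (illustration only).
 */
final class NonNullList<E> extends AbstractList<E> {

    private final List<E> elements = new ArrayList<>();

    /**
     * Inserts the element at the given position.
     * <p>
     * This list does not permit null elements. The method signature
     * cannot express that restriction, so it is stated here in words.
     *
     * @throws NullPointerException if the element is null
     */
    @Override
    public void add(int index, E element) {
        elements.add(index,
            Objects.requireNonNull(element, "null elements are not permitted"));
    }

    @Override
    public E get(int index) {
        return elements.get(index);
    }

    @Override
    public int size() {
        return elements.size();
    }

    public static void main(String[] args) {
        List<String> names = new NonNullList<>();

        // AbstractList.add(E) delegates to add(int, E) above
        names.add("jOOQ");

        try {
            names.add(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // prints null elements are not permitted
        }

        System.out.println(names); // prints [jOOQ]
    }
}
```
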
So, there are some well-known lists that “refuse to add null elements”, and there may be “restrictions on what elements may be added”. This can’t be understood from the API’s method signature only, unless you refuse to create a concise signature.

Tracking tools are documentation

Tracking tools are your human interface to your stakeholders. These help you discuss things and provide some historicised argumentation about why code is ultimately written the way it is. Keep things DRY, here. Recognise duplicates and try to keep only one simple and concise ticket per issue. When modifying your code in a not-so-obvious way (because your stakeholders have not-so-obvious requirements), add a short comment to the relevant code section, referencing the tracking ID:

// [#1296] FOR UPDATE is simulated in some dialects
// using ResultSet.CONCUR_UPDATABLE
if (forUpdate && 
    !asList(CUBRID, SQLSERVER).contains(context.getDialect())) {

Yes, the code itself already explains that the subsequent section is executed only in forUpdate queries and only for dialects other than CUBRID and SQLSERVER. But why? A future developer will gladly read up all they can find about issue #1296. If it is relevant, you should reference this ticket ID in:
  • Mailing lists
  • Source code
  • API documentation
  • Version control checkin comments
  • Stack Overflow questions
  • All sorts of other searchable documents
  • etc.
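
Writing the ID the same way everywhere is what keeps it searchable. Here is a minimal sketch, assuming the "#1296" notation from the comment above. The class name TicketReferences and the sample commit message are hypothetical and not taken from any tracking tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects ticket IDs written as "#1296" in any text,
// e.g. a code comment, a commit message or a mailing list post.
final class TicketReferences {

    // Assumed convention: a hash sign followed by the numeric ticket ID
    private static final Pattern TICKET_ID = Pattern.compile("#(\\d+)");

    private TicketReferences() {}

    static List<Integer> findIn(String text) {
        List<Integer> ids = new ArrayList<>();
        Matcher matcher = TICKET_ID.matcher(text);

        while (matcher.find()) {
            ids.add(Integer.parseInt(matcher.group(1)));
        }

        return ids;
    }

    public static void main(String[] args) {
        String comment = "// [#1296] FOR UPDATE is simulated in some dialects";

        // Hypothetical commit message referencing two tickets
        String commit = "[#1296] Simulate FOR UPDATE, see also #1300";

        System.out.println(findIn(comment)); // prints [1296]
        System.out.println(findIn(commit));  // prints [1296, 1300]
    }
}
```
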

Version control is documentation

This part of the documentation is awesome! It documents change. In large projects, you may still be able to reconstruct why a co-worker who has long ago left the company did some weird change that you don’t understand right now. It is thus important to also include the aforementioned ticket ID in the change. So, follow this rule: Is the change non-trivial (that is, anything beyond fixed spelling, fixed indentation, a renamed local variable, etc.)? Then create a ticket and document this change with a ticket ID in your commit. Creating and referencing that ticket costs you only 1 minute, but it’ll save a future coworker hours of investigation!

Version numbering is documentation

A simple and concise version numbering system will help your users understand which version they should upgrade to. A good example of how to do this correctly is semantic versioning. The golden rules here are to use an [X].[Y].[Z] versioning scheme that can be summarised as follows:
  • If a patch release includes bugfixes, performance improvements and API-irrelevant new features, [Z] is incremented by one.
  • If a minor release includes backwards-compatible, API-relevant new features, [Y] is incremented by one and [Z] is reset to zero.
  • If a major release includes backwards-incompatible, API-relevant new features, [X] is incremented by one and [Y], [Z] are reset to zero.
Follow these rules strictly, to communicate the change scope between your released versions.
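
The three rules above can be sketched in a few lines of Java. This is only an illustration under my own naming: the class SemanticVersion and its methods are hypothetical, and the sketch ignores pre-release and build identifiers.

```java
// Hypothetical value type for an [X].[Y].[Z] version number
final class SemanticVersion {

    final int major; // [X]
    final int minor; // [Y]
    final int patch; // [Z]

    SemanticVersion(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    // Patch release: [Z] is incremented by one
    SemanticVersion nextPatch() {
        return new SemanticVersion(major, minor, patch + 1);
    }

    // Minor release: [Y] is incremented by one, [Z] is reset to zero
    SemanticVersion nextMinor() {
        return new SemanticVersion(major, minor + 1, 0);
    }

    // Major release: [X] is incremented by one, [Y] and [Z] are reset to zero
    SemanticVersion nextMajor() {
        return new SemanticVersion(major + 1, 0, 0);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch;
    }

    public static void main(String[] args) {
        SemanticVersion current = new SemanticVersion(3, 4, 1);

        System.out.println(current.nextPatch()); // prints 3.4.2
        System.out.println(current.nextMinor()); // prints 3.5.0
        System.out.println(current.nextMajor()); // prints 4.0.0
    }
}
```
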

Where things go wrong

Now here’s where it starts getting emotional…

Forget UML for documentation!

Don’t manually do big UML diagrams. Well, do them. They might help you understand / explain things to others. Create ad-hoc UML diagrams for a meeting, or informal UML diagrams for a high-level tutorial. Generate UML diagrams from relevant parts of your code (or entity diagrams from your database), but don’t consider them as a central part of your code documentation. No one will ever manually update UML diagrams with 100s of classes and 1000s of relations in them. An exception to this rule may be UML-based model-driven architectures, where the UML is really part of the code, not the documentation.

Forget MS Word or HTML for documentation (if you can)!

Keep your documentation close to the code. It is almost impossible, without an extreme amount of discipline, to keep external documentation in sync with the actual code and/or API. If you can, auto-generate external documentation from the one in your code, to keep things DRY. But if you can avoid it, don’t write up external documentation. It’s hardly ever accurate. Of course, you can’t always avoid external documentation. Sometimes, you need to write manuals, tutorials, how-tos, best practices, etc. Just beware that those documents are almost impossible to keep in sync with the “real truth”: Your code.

Forget writing documentation early!

Your API will evolve. Hardly anyone writes APIs that last forever, like the Java APIs. So don’t spend all that time thinking about how to eternally link class A with type B and algorithm C. Write code, document those parts of the code that leak into the API, and reference ticket IDs from your code / commits.

Forget documenting boilerplate code!

Getters and setters, for instance. They usually don’t do more than getting and setting. If they don’t do more, don’t document them, because boring documentation gets stale and thus wrong.
How many times have you refactored a property (and thus the getter/setter name), but not the Javadoc? Exactly. No one updates boilerplate API documentation.

/**
 * Returns the id
 *
 * @return The id
 */
public int getId() {
    return id;
}

Aaah, the ID! Surprise surprise.

Forget documenting trivial code!

Don’t do this:

// Check if we still have work
if (!jobs.isEmpty()) {

    // Get the next job for execution
    Job job = jobs.pollFirst();

    // ... and execute it
    job.execute();
}

Duh. That code is already simple and concise, as we’ve seen before. It needs no comments at all:

if (!jobs.isEmpty()) {
    Job job = jobs.pollFirst();
    job.execute();
}
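
For contrast, here is a sketch of comments that do earn their place, based on the fictional stock exchange requirement from the introduction. Only the numbers 27, 0.01 and 35 come from that requirement. The class, constant and method names are hypothetical, and the real business logic would certainly be more involved.

```java
import java.math.BigDecimal;
import java.time.Duration;

// Hypothetical sketch: all names are made up for illustration
final class OrderGrouping {

    // Why 27? The (fictional) requirement says that orders of this clearing
    // type code must be grouped with subsequent orders of the same type code.
    static final int GROUPED_CLEARING_TYPE = 27;

    // Why 0.01? Grouping applies if and only if the rounding lot is below
    // this value. The code can state the number, but not where it comes from.
    static final BigDecimal ROUNDING_LOT_LIMIT = new BigDecimal("0.01");

    // Why 35 seconds? Grouped orders must be unloaded within this time frame.
    static final Duration MAX_UNLOAD_DELAY = Duration.ofSeconds(35);

    private OrderGrouping() {}

    static boolean mustBeGrouped(int clearingTypeCode, BigDecimal roundingLot) {
        return clearingTypeCode == GROUPED_CLEARING_TYPE
            && roundingLot.compareTo(ROUNDING_LOT_LIMIT) < 0;
    }

    public static void main(String[] args) {
        System.out.println(mustBeGrouped(27, new BigDecimal("0.005"))); // prints true
        System.out.println(mustBeGrouped(27, new BigDecimal("0.01")));  // prints false
        System.out.println(mustBeGrouped(12, new BigDecimal("0.005"))); // prints false
        System.out.println(MAX_UNLOAD_DELAY.getSeconds());              // prints 35
    }
}
```
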

TL;DR: Keep things simple and concise

Create good documentation:
  • by keeping documentation simple and concise.
  • by keeping documentation close to the code and close to the API, which are the ultimate truths of your application.
  • by keeping your documentation DRY.
  • by making documentation available to others, through a ticketing system, version control, semantic versioning.
  • by referencing ticket IDs throughout your available media.
  • by forgetting about “external” documentation, as long as you can.
Applications, APIs, libraries that provide you with good documentation will help you create better software, because well-documented applications, APIs, libraries are better software, themselves. Critically check your stack and try to avoid those parts that are not well-documented.

18 thoughts on “The Golden Rules of Code Documentation”

  1. Golden Rules indeed! I will keep this for reference and use this accordingly.

    What has worked well for me is having a wiki which, through its usage, documents the process a team of developers went through in arriving at certain design decisions.

    For communication with stakeholders, an explicit glossary which defines the system’s understanding of certain basic concepts is often helpful.

    1. Yes, wikis are good for overviews and glossaries, as overviews and glossaries are not prone to frequent change. Similar to tutorials and high-level manuals, which we sometimes have to produce. As long as there is not too much content in those wikis…

      At a previous employer, 50% of the wiki pages were obsolete rubbish. Not a good incentive to add / maintain another wiki page.

  2. Nice article. For me, there are three reasons to write code documentation:

    1. API details, like Javadocs. This includes what can be null, is a resource closed etc. Things you would otherwise have to look at the source to verify.
    2. Explain why we do something. Code says what, comments say why.
    3. Add a note to dodgy looking code explaining that this is actually what we want to do. If the code looks like it may be broken, but in fact it is not, write a comment explaining this. Otherwise, when you read it later, its broken look will jump out at you and you will waste time checking whether or not it really is broken.
  3. I personally believe that you should avoid inline comments in code. In some cases they may be OK, but generally they are a smell of bad code. If you need to comment it to make it understandable, the code is probably not elegant.

    Also, inline comments do not follow code changes and may be out of date. I once wasted a whole day over my own code, which looked like this:

    // this is false not to terminate a thread in a wrong way
    if( true ){
    ….

    You read the comment and your eyes will slip over the actual line.

    On the other hand, I myself am very keen on documentation. Documentation is a must and has to be supported by tools to be as ready and up to date as possible. I like to create documentation using markdown, and for a long time I was put off using it because it does not support doxia macros to include code snippets into the documentation. With such macros, you can have sample code from the unit tests that also serves a documentary purpose. This has two advantages:

    1. Your samples in the documentation are not out of date
    2. You get the habit to create readable unit test code

    Recently I realized that I can use markdown together with velocity in maven to achieve the same result:

    https://github.com/verhas/mdsnippet

    Tracking tools are documentation, but I believe that putting references to tickets into the code in the form of comments is an antipattern. Use a good tracking software that does that for you the other way around. If you do not have one installed, using comments like //TODO and //REVIEWED BY and their companions may be a workaround that we can live with. In that case, we should know that this is not the ultimate solution, only that we do not have anything better at the moment.

    1. Thanks for all this feedback! We don’t have to agree on everything, of course ;-) But your mentioning unit tests is an important point, specifically because they never run out of date

  4. Very interesting! I would like to share with you my “rules” regarding putting comments in unit tests code (this is an excerpt from my book – Practical Unit Testing):
    “Personally, my test code is almost free of comments. There are only three situations where I (sometimes) put comments in code. First, when adding a test because of a reported bug. In such a case, I usually add a link to the issue on bug tracker, and (optionally) write a short explanation of what the problem was.
    Sometimes, I also add a comment on the reasons for selecting a particular set of test data. This often has something to do with business requirements, and cannot be read from the code. It also happens that some tests of legacy code which use non-standard techniques (like the ones described in Section 7.6) require a comment. Apart from these rare occasions, I do not comment on my test code at all.”

    Recently, after a few months of writing many integration and end-to-end tests, I realized I put many more comments there than I do for unit tests. These tests tend to be quite long, and even though I try to write them so their business purpose (scenario) is clear, this is not always simple to achieve. So I believe commenting on what is happening there is valuable.

    1. Thanks for sharing. Yes, I guess we agree more or less on when to put comments, and when the code should be sufficiently self-explanatory. I like your distinguishing unit and integration tests, that makes sense. There is a lot more explaining “why” things are done in integration tests, as opposed to the pre-dominant “what” is being done in unit tests.

  5. Nice post

    My personal rules about comments are:

    – Better no comment than a bad comment
    After 2 comments without added value, you stop reading them all.

    – A comment must have an added value. I.e., it must bring me something the code cannot. E.g., the why you talk about :)

    1. Thanks for sharing. Yep, that wraps it up quite nicely. As stated in the article, the biggest and easiest added value by any comment is usually a reference to an external issue tracker.

  6. Nice article

    I agree with the point: better no comment than a bad comment.

    Good comments and clean code make code look elegant.
