How to Design a Good, Regular API

People have strong opinions on how to design a good API. Consequently, there are lots of pages and books in the web, explaining how to do it. This article will focus on a particular aspect of good APIs: Regularity. Regularity is what happens when you follow the “Principle of Least Astonishment“. This principle holds true no matter what kinds of personal taste and style you would like to put into your API, otherwise. It is thus one of the most important features of a good API.

The following are a couple of things to keep in mind when designing a “regular” API:

Rule #1: Establish strong terms

If your API grows, there will be repetitive use of the same terms, over and over again. For instance, some actions will be come in several flavours resulting in various classes / types / methods, that differ only subtly in behaviour. The fact that they’re similar should be reflected by their names. Names should use strong terms. Take JDBC for instance. No matter how you execute a Statement, you will always use the term execute to do it. For instance, you will call any of these methods:

In a similar fashion, you will always use the term close to release resources, no matter which resource you’re releasing. For instance, you will call:

As a matter of fact, close is such a strong and established term in the JDK, that it has lead to the interfaces java.io.Closeable (since Java 1.5), and java.lang.AutoCloseable (since Java 1.7), which generally establish a contract of releasing resources.

Rule violation: Observable

This rule is violated a couple of times in the JDK. For instance, in the java.util.Observable class. While other “Collection-like” types established the terms

  • size()
  • remove()
  • removeAll()

… this class declares

There is no good reason for using other terms in this context. The same applies to Observer.update(), which should really be called notify(), an otherwise established term in JDK APIs

Rule violation: Spring. Most of it

Spring has really gotten popular in the days when J2EE was weird, slow, and cumbersome. Think about EJB 2.0… There may be similar opinions on Spring out there, which are off-topic for this post. Here’s how Spring violates this concrete rule. A couple of random examples where Spring fails to establish strong terms, and uses long concatenations of meaningless, inconcise words instead:

Apart from “feeling” like a horrible API (to me), here’s some more objective analysis:

  • What’s the difference between a Creator and a Factory
  • What’s the difference between a Source and a Provider?
  • What’s the non-subtle difference between an Advisor and a Provider?
  • What’s the non-subtle difference between a Discoverer and a Provider?
  • Is an Advisor related to an AspectJAdvice?
  • Is it a ScanningCandidate or a CandidateComponent?
  • What’s a TargetSource? And how would it be different from a SourceTarget if not a SourceSource or my favourite: A SourceSourceTargetProviderSource?

Gary Fleming commented on my previous blog post about Spring’s funny class names:

I’d be willing to bet that a Markov-chain generated class name (based on Spring Security) would be indistinguishable from the real thing.

Back to more seriousness…

Rule #2: Apply symmetry to term combinations

Once you’ve established strong terms, you will start combining them. When you look at the JDK’s Collection APIs, you will notice the fact that they are symmetric in a way that they’ve established the terms add(), remove(), contains(), and all, before combining them symmetrically:

Now, the Collection type is a good example where an exception to this rule may be acceptable, when a method doesn’t “pull its own weight”. This is probably the case for retainAll(Collection<?>), which doesn’t have an equivalent retain(E) method. It might just as well be a regular violation of this rule, though.

Rule violation: Map

This rule is violated all the time, mostly because of some methods not pulling their own weight (which is ultimately a matter of taste). With Java 8’s defender methods, there will no longer be any excuse of not adding default implementations for useful utility methods that should’ve been on some types. For instance: Map. It violates this rule a couple of times:

Observe also, that there is no point of using the term Set in the method names. The method signature already indicates that the result has a Set type. It would’ve been more consistent and symmetric if those methods would’ve been named keys(), values(), entries(). (On a side-note, Sets and Lists are another topic that I will soon blog about, as I think those types do not pull their own weight either)

At the same time, the Map interface violates this rule by providing

Besides, establishing the term clear() instead of reusing removeAll() with no arguments is unnecessary. This applies to all Collection API members. In fact, the clear() method also violates rule #1. It is not immediately obvious, if clear does anything subtly different from remove when removing collection elements.

Rule #3: Add convenience through overloading

There is mostly only one compelling reason, why you would want to overload a method: Convenience. Often you want to do precisely the same thing in different contexts, but constructing that very specific method argument type is cumbersome. So, for convenience, you offer your API users another variant of the same method, with a “friendlier” argument type set. This can be observed again in the Collection type. We have:

Another example is the Arrays utility class. We have:

Overloading is mostly used for two reasons:

  1. Providing “default” argument behaviour, as in Collection.toArray()
  2. Supporting several incompatible, yet “similar” argument sets, as in Arrays.copyOf()

Other languages have incorporated these concepts into their language syntax. Many languages (e.g. PL/SQL) formally support named default arguments. Some languages (e.g. JavaScript) don’t even care how many arguments there really are. And another, new JVM language called Ceylon got rid of overloading by combining the support for named, default arguments with union types. As Ceylon is a statically typed language, this is probable the most powerful approach of adding convenience to your API.

Rule violation: TreeSet

It is hard to find a good example of a case where this rule is violated in the JDK. But there is one: the TreeSet and TreeMap. Their constructors are overloaded several times. Let’s have a look at these two constructors:

The latter “cleverly” adds some convenience to the first in that it extracts a well-known Comparator from the argument SortedSet to preserve ordering. This behaviour is quite different from the compatible (!) first constructor, which doesn’t do an instanceof check of the argument collection. I.e. these two constructor calls result in different behaviour:

SortedSet<Object> original = // [...]

// Preserves ordering:
new TreeSet<Object>(original);

// Resets ordering:
new TreeSet<Object>((Collection<Object>) original);

These constructors violate the rule in that they produce completely different behaviour. They’re not just mere convenience.

Rule #4: Consistent argument ordering

Be sure that you consistently order arguments of your methods. This is an obvious thing to do for overloaded methods, as you can immediately see how it is better to always put the array first and the int after in the previous example from the Arrays utility class:

But you will quickly notice that all methods in that class will put the array being operated on first. Some examples:

Rule violation: Arrays

The same class also “subtly” violates this rule in that it puts optional arguments in between other arguments, when overloading methods. For instance, it declares

When the latter should’ve been fill(Object[], Object, int, int). This is a “subtle” rule violation, as you may also argue that those methods in Arrays that restrict an argument array to a range will always put the array and the range argument together. In that way, the fill() method would again follow the rule as it provides the same argument order as copyOfRange(), for instance:

You will never be able to escape this problem if you heavily overload your API. Unfortunately, Java doesn’t support named parameters, which helps formally distinguishing arguments in a large argument list, as sometimes, large argument lists cannot be avoided.

Rule violation: String

Another case of a rule violation is the String class:

The problems here are:

  • It is hard to immediately understand the difference between the two methods, as the optional boolean argument is inserted at the beginning of the argument list
  • It is hard to immediately understand the purpose of every int argument, as there are many arguments in a single method

Rule #5: Establish return value types

This may be a bit controversial as people may have different views on this topic. No matter what your opinion is, however, you should create a consistent, regular API when it comes to defining return value types. An example rule set (on which you may disagree):

  • Methods returning a single object should return null when no object was found
  • Methods returning several objects should return an empty List, Set, Map, array, etc. when no object was found (never null)
  • Methods should only throw exceptions in case of an … well, an exception

With such a rule set, it is not a good practice to have 1-2 methods lying around, which:

  • … throw ObjectNotFoundExceptions when no object was found
  • … return null instead of empty Lists

Rule violation: File

File is an example of a JDK class that violates many rules. Among them, the rule of regular return types. Its File.list() Javadoc reads:

An array of strings naming the files and directories in the directory denoted by this abstract pathname. The array will be empty if the directory is empty. Returns null if this abstract pathname does not denote a directory, or if an I/O error occurs.

So, the correct way to iterate over file names (if you’re doing defensive programming) is:

String[] files = file.list();

// You should never forget this null check!
if (files != null) {
    for (String file : files) {
        // Do things with your file
    }
}

Of course, we could argue that the Java 5 expert group could’ve been nice with us and worked that null check into their implementation of the foreach loop. Similar to the missing null check when switching over an enum (which should lead to the default: case). They’ve probably preferred the “fail early” approach in this case.

The point here is that File already has sufficient means of checking if file is really a directory (File.isDirectory()). And it should throw an IOException if something went wrong, instead of returning null. This is a very strong violation of this rule, causing lots of pain at the call-site… Hence:

NEVER return null when returning arrays or collections!

Rule violation: JPA

An example of how JPA violates this rule is the way how entities are retrieved from the EntityManager or from a Query:

As NoResultException is a RuntimeException this flaw heavily violates the Principle of Least Astonishment, as you might stay unaware of this difference until runtime!

IF you insist on throwing NoResultExceptions, make them checked exceptions as client code MUST handle them

Conclusion and further reading

… or rather, further watching. Have a look at Josh Bloch’s presentation on API design. He agrees with most of my claims, around 0:30:30

Another useful example of such a web page is the “Java API Design Checklist” by The Amiable API:
Java API Design Checklist

The Golden Rules of Code Documentation

Here’s another topic that is highly subjective, that leads to heated discussions, to religious wars and yet, there’s no objective right or wrong.

A previous post on my blog was reblogged to my blogging partner JavaCodeGeeks. The amount of polarised ranting this blog provoked on JCG is hilarious. Specifically, I like the fact that people tend to claim dogmatic things like:

If you need comments to clarify code, better think how to write code differently, so it is more understandable. You do not need yet another language (comments) to mess with the primary language (code).

Quite obviously, this person has written 1-2 “Hello world” applications, where this obviously holds true. My answer to that was:

How would you write this business logic down into code, such that you can live without comments?

A stock exchange order of clearing type code 27 needs to be grouped with all other subsequent orders of type code 27 (if and only if they have a rounding lot below 0.01), before actually unloading them within a time-frame of at most 35 seconds (fictional example in a real-life application).

Sure. Code can communicate “what” it does. But only comments can communicate “why” it does it! “why” is a broader truth that simply cannot be expressed in code. It involves requirements, feelings, experience, etc. etc.

So it’s time for me to write up another polarising blog post leading to (hopefully!) more heated discussions! It is about:

The Golden Rules of Code Documentation

Good documentation adds readability, transparency, stability, and trustworthiness to your application and/or API. But what is good documentation? What are constituents of good documentation?

Code is documentation

First off, indeed, code is your most significant documentation. Code holds the ultimate truth about your software. All other ways of describing what code does are only approximations for those who

  • Don’t know the code (someone else wrote it)
  • Don’t have time to read the code (it’s too complex)
  • Don’t want to read the code (who wants to read Hibernate or Xerces code to understand what’s going on??)
  • Don’t have access to the code (although they could still decompile it)

For all others, code is documentation. So, obviously, code should be written in a way that documents its purpose. So don’t write clever code, write elegant code. Here’s a good example of how to not document “purpose” (except for the few Perl native speakers):

`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`

Taken from:
http://fwebde.com/programming/write-unreadable-code/

Apparently, this prints “Just another Perl hacker.”. I certainly won’t execute this on my machine, though. Don’t blame me for any loss of data ;-)

API is documentation

While API is still code, it is that part of the code that is exposed to most others. It should thus be:

  • Very simple
  • Very concise

Simplicity is king, of course. Conciseness, however, is not exactly the same thing. It can still be simple to use an API which isn’t concise. I’d consider using Spring’s J2eeBasedPreAuthenticatedWebAuthenticationDetailsSource simple. You configure it, you inject it, done. But the name hardly indicates conciseness.

This isn’t just about documentation, but about API design in general. It should be very easy to use your API, because then, your API clearly communicates its intent. And communicating one’s intent is documentation.

Good design (and thus documentation) rules to reach conciseness are these:

  • Don’t let methods with more than 3 arguments leak into your public API.
  • Don’t let methods / types with more than 3 words in their names leak into your public API.

Best avoid the above. If you cannot avoid such methods, keep things private. These methods are not reusable and thus not worth documenting in an API.

API should be documented in words

As soon as code “leaks” into the public API, it should be documented in human-readable words. True, java.util.List.add() is already quite concise. It clearly communicates its intent. But how does it behave and why? An extract from the Javadoc:

Lists that support this operation may place limitations on what elements may be added to this list. In particular, some lists will refuse to add null elements, and others will impose restrictions on the type of elements that may be added. List classes should clearly specify in their documentation any restrictions on what elements may be added.

So, there are some well-known lists, that “refuse to add null elements” there may be “restrictions on what elements may be added”. This can’t be understood from the API’s method signature only – unless you refuse to create a concise signature.

Tracking tools are documentation

Tracking tools are your human interface to your stakeholders. These help you discuss things and provide some historicised argumentation about why code is ultimately written the way it is. Keep things DRY, here. Recognise duplicates and try to keep only one simple and concise ticket per issue.

When modifying your code in a not-so-obvious way (because your stakeholders have not-so-obvious requirements), add a short comment to the relevant code section, referencing the tracking ID:

// [#1296] FOR UPDATE is simulated in some dialects
// using ResultSet.CONCUR_UPDATABLE
if (forUpdate && 
    !asList(CUBRID, SQLSERVER).contains(context.getDialect())) {

Yes, the code itself already explains that the subsequent section is executed only in forUpdate queries and only for the CUBRID and SQLSERVER dialects. But why? A future developer will gladly read up all they can find about issue #1296. If it is relevant, you should reference this ticket ID in:

  • Mailing lists
  • Source code
  • API documentation
  • Version control checkin comments
  • Stack Overflow questions
  • All sorts of other searchable documents
  • etc.

Version control is documentation

This part of the documentation is awesome! It documents change. In large projects, you may still be able to reconstruct why a co-worker who has long ago left the company did some weird change that you don’t understand right now. It is thus important to also include the aforementioned ticket ID in the change.

So, follow this rule: Is the change non-trivial (fixed spelling, fixed indentation, renamed local variable, etc.)? Then create a ticket and document this change with a ticket ID in your commit. Creating and referencing that ticket costs you only 1 minute, but it’ll save a future coworker hours of investigation!

Version numbering is documentation

A simple and concise version numbering system will help your users understand, which version they should upgrade to. A good example of how to do this correctly is semantic versioning. The golden rules here are to use an [X].[Y].[Z] versioning scheme that can be summarised as follows:

  • If a patch release includes bugfixes, performance improvements and API-irrelevant new features, [Z] is incremented by one.
  • If a minor release includes backwards-compatible, API-relevant new features, [Y] is incremented by one and [Z] is reset to zero.
  • If a major release includes backwards-incompatible, API-relevant new features, [X] is incremented by one and [Y], [Z] are reset to zero.

Follow these rules strictly, to communicate the change scope between your released versions.

Where things go wrong

Now here’s where it starts getting emotional…

Forget UML for documentation!

Don’t manually do big UML diagrams. Well, do them. They might help you understand / explain things to others. Create ad-hoc UML diagrams for a meeting, or informal UML diagrams for a high-level tutorial. Generate UML diagrams from relevant parts of your code (or entity diagrams from your database), but don’t consider them as a central part of your code documentation. No one will ever manually update UML diagrams with 100s of classes and 1000s of relations in them.

An exception to this rule may be UML-based model-driven architectures, where the UML is really part of the code, not the documentation.

Forget MS Word or HTML for documentation (if you can)!

Keep your documentation close to the code. It is almost impossible without an extreme amount of discipline, to keep external documentation in-sync with the actual code and/or API. If you can, auto-generate external documentation from the one in your code, to keep things DRY. But if you can avoid it, don’t write up external documentation. It’s hardly ever accurate.

Of course, you can’t always avoid external documentation. Sometimes, you need to write manuals, tutorials, how-tos, best practices, etc. Just beware that those documents are almost impossible to keep in-sync with the “real truth”: Your code.

Forget writing documentation early!

Your API will evolve. Hardly anyone writes APIs that last forever, like the Java APIs. So don’t spend all that time thinking about how to eternally link class A with type B and algorithm C. Write code, document those parts of the code that leak into the API, reference ticket IDs from your code / commits

Forget documenting boilerplate code!

Getters and setters, for instance. They usually don’t do more than getting and setting. If they don’t, don’t document it, because boring documentation gets stale and thus wrong. How many times have you refactored a property (and thus the getter/setter name), but not the Javadoc? Exactly. No one updates boilerplate API documentation.

/**
 * Returns the id
 *
 * @return The id
 */
public int getId() {
    return id;
}

Aaah, the ID! Surprise surprise.

Forget documenting trivial code!

Don’t do this:

// Check if we still have work
if (!jobs.isEmpty()) {

    // Get the next job for execution
    Job job = jobs.pollFirst();

    // ... and execute it
    job.execute();
}

Duh. That code is already simple and concise, as we’ve seen before. It needs no comments at all:

if (!jobs.isEmpty()) {
    Job job = jobs.pollFirst();
    job.execute();
}

TL;DR: Keep things simple and concise

Create good documentation:

  • by keeping documentation simple and concise.
  • by keeping documentation close to the code and close to the API, which are the ultimate truths of your application.
  • by keeping your documentation DRY.
  • by making documentation available to others, through a ticketing system, version control, semantic versioning.
  • by referencing ticket IDs throughout your available media.
  • by forgetting about “external” documentation, as long as you can.

Applications, APIs, libraries that provide you with good documentation will help you create better software, because well-documented applications, APIs, libraries are better software, themselves. Critically check your stack and try to avoid those parts that are not well-documented.

Defensive API evolution with Java interfaces

API evolution is something absolutely non-trivial. Something that only few have to deal with. Most of us work on internal, proprietary APIs every day. Modern IDEs ship with awesome tooling to factor out, rename, pull up, push down, indirect, delegate, infer, generalise our code artefacts. These tools make refactoring our internal APIs a piece of cake.

But some of us work on public APIs, where the rules change drastically. Public APIs, if done properly, are versioned. Every change – compatible or incompatible – should be published in a new API version. Most people will agree that API evolution should be done in major and minor releases, similar to what is specified in semantic versioning. In short: Incompatible API changes are published in major releases (1.0, 2.0, 3.0), whereas compatible API changes / enhancements are published in minor releases (1.0, 1.1, 1.2).

If you’re planning ahead, you’re going to foresee most of your incompatible changes a long time before actually publishing the next major release. A good tool in Java to announce such a change early is deprecation.

Interface API evolution

Now, deprecation is a good tool to indicate that you’re about to remove a type or member from your API. What if you’re going to add a method, or a type to an interface’s type hierarchy? This means that all client code implementing your interface will break – at least as long as Java 8’s defender methods aren’t introduced yet. There are several techniques to circumvent / work around this problem:

1. Don’t care about it

Yes, that’s an option too. Your API is public, but maybe not so much used. Let’s face it: Not all of us work on the JDK / Eclipse / Apache / etc codebases.

If you’re friendly, you’re at least going to wait for a major release to introduce new methods. But you can break the rules of semantic versioning if you really have to – if you can deal with the consequences of getting a mob of angry users.

Note, though, that other platforms aren’t as backwards-compatible as the Java universe (often by language design, or by language complexity). E.g. with Scala’s various ways of declaring things as implicit, your API can’t always be perfect.

2. Do it the Java way

The “Java” way is not to evolve interfaces at all. Most API types in the JDK have been the way they are today forever. Of course, this makes APIs feel quite “dinosaury” and adds a lot of redundancy between various similar types, such as StringBuffer and StringBuilder, or Hashtable and HashMap.

Note that some parts of Java don’t adhere to the “Java” way. Most specifically, this is the case for the JDBC API, which evolves according to the rules of section #1: “Don’t care about it”.

3. Do it the Eclipse way

Eclipse’s internals contain huge APIs. There are a lot of guidelines how to evolve your own APIs (i.e. public parts of your plugin), when developing for / within Eclipse. One example about how the Eclipse guys extend interfaces is the IAnnotationHover type. By Javadoc contract, it allows implementations to also implement IAnnotationHoverExtension and IAnnotationHoverExtension2. Obviously, in the long run, such an evolved API is quite hard to maintain, test, and document, and ultimately, hard to use! (consider ICompletionProposal and its 6 (!) extension types)

4. Wait for Java 8

In Java 8, you will be able to make use of defender methods. This means that you can provide a sensible default implementation for your new interface methods as can be seen in Java 1.8’s java.util.Iterator (an extract):

public interface Iterator<E> {

    // These methods are kept the same:
    boolean hasNext();
    E next();

    // This method is now made "optional" (finally!)
    public default void remove() {
        throw new UnsupportedOperationException("remove");
    }

    // This method has been added compatibly in Java 1.8
    default void forEach(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        while (hasNext())
            consumer.accept(next());
    }
}

Of course, you don’t always want to provide a default implementation. Often, your interface is a contract that has to be implemented entirely by client code.

5. Provide public default implementations

In many cases, it is wise to tell the client code that they may implement an interface at their own risk (due to API evolution), and they should better extend a supplied abstract or default implementation, instead. A good example for this is java.util.List, which can be a pain to implement correctly. For simple, not performance-critical custom lists, most users probably choose to extend java.util.AbstractList instead. The only methods left to implement are then get(int) and size(), The behaviour of all other methods can be derived from these two:

class EmptyList<E> extends AbstractList<E> {
    @Override
    public E get(int index) {
        throw new IndexOutOfBoundsException("No elements here");
    }

    @Override
    public int size() {
        return 0;
    }
}

A good convention to follow is to name your default implementation AbstractXXX if it is abstract, or DefaultXXX if it is concrete

6. Make your API very hard to implement

Now, this isn’t really a good technique, but just a probable fact. If your API is very hard to implement (you have 100s of methods in an interface), then users are probably not going to do it. Note: probably. Never underestimate the crazy user. An example of this is jOOQ’s org.jooq.Field type, which represents a database field / column. In fact, this type is part of jOOQ’s internal domain specific language, offering all sorts of operations and functions that can be performed upon a database column.

Of course, having so many methods is an exception and – if you’re not designing a DSL – is probably a sign of a bad overall design.

7. Add compiler and IDE tricks

Last but not least, there are some nifty tricks that you can apply to your API, to help people understand what they ought to do in order to correctly implement your interface-based API. Here’s a tough example, that slaps the API designer’s intention straight into your face. Consider this extract of the org.hamcrest.Matcher API:

public interface Matcher<T> extends SelfDescribing {

    // This is what a Matcher really does.
    boolean matches(Object item);
    void describeMismatch(Object item, Description mismatchDescription);

    // Now check out this method here:

    /**
     * This method simply acts a friendly reminder not to implement 
     * Matcher directly and instead extend BaseMatcher. It's easy to 
     * ignore JavaDoc, but a bit harder to ignore compile errors .
     *
     * @see Matcher for reasons why.
     * @see BaseMatcher
     * @deprecated to make
     */
    @Deprecated
    void _dont_implement_Matcher___instead_extend_BaseMatcher_();
}

“Friendly reminder”, come on. ;-)

Other ways

I’m sure there are dozens of other ways to evolve an interface-based API. I’m curious to hear your thoughts!

JavaBeans™ should be extended to reduce bloat

JavaBeans™ has been around for a long time in the Java world. At some point of time, people realised that the concept of getters and setters was good to provide some abstraction over “object properties”, which should not be accessed directly. A typical “bean” would look like this:

public class MyBean {
    private int myProperty;

    public int getMyProperty() {
        return myProperty;
    }

    public void setMyProperty(int myProperty) {
        this.myProperty = myProperty;
    }
}

In various expression languages and other notations, you could then access “myProperty” using a simple property notation, which is good:

// The below would resolve to myBean.getMyProperty()
myBean.myProperty

// This could resolve to myBean.setMyProperty(5)
myBean.myProperty = 5

Critique on Java properties

Other languages, such as C# even allow to inline such property expressions in regular C# code, in order to call getters and setters. Why not Java?

Getter and Setter naming

Why do I have to use those bloated “get”/”is” and “set” prefixes every time I want to manipulate object properties? Besides, the case of the first letter of the property changes, too. If you want to perform a case-sensitive search on all usage of a property, you will have to write quite a regular expression to do so

Setter returning void

Returning void is one of the biggest reasons Java generates so much bloat at API call sites. Since the early days of Java, method chaining was a wide-spread practice. No one would like to miss StringBuilder’s (or StringBuffer’s) chainable append() methods. They’re very useful. Why doesn’t the Java compiler allow to re-access the property container after calling a setter?

A better Java

In other words, this API:

public interface API {
    void oneMethod();
    void anotherMethod();
    void setX(int x);
    int  getX();
}

Should be usable as such:

API api = ...
int x = api.oneMethod()     // Returning void should in fact "return" api
           .anotherMethod() // Returning void should in fact "return" api
           .x;              // Getter access, if x is not accessible

Let’s make this a JSR!

The depths of Java: API leak exposed through covariance

Java can be very tricky some times, especially in API design. Let’s have a look at a very interesting showcase. jOOQ strongly separates API from implementation. All API is in the org.jooq package, and public. Most implementation is in the org.jooq.impl package and package-private. Only factories and some dedicated base implementations are public. This allows for very powerful package-level encapsulation, exposing mostly only interfaces to jOOQ users.

A simplified example of package-level encapsulation

Here’s roughly how jOOQ models SQL tables. The (overly simplified) API:

package org.jooq;

/**
 * A table in a database
 */
public interface Table {

  /**
   * Join two tables
   */
  Table join(Table table);
}

And two (overly simplified) implementation classes:

package org.jooq.impl;

import org.jooq.Table;

/**
 * Base implementation
 */
abstract class AbstractTable implements Table {

  @Override
  public Table join(Table table) {
    return null;
  }
}

/**
 * Custom implementation, publicly exposed to client code
 */
public class CustomTable extends AbstractTable {
}

How the internal API is exposed

Let’s assume that the internal API does some tricks with covariance:

abstract class AbstractTable implements Table, InteralStuff {

  // Note, this method returns AbstractTable, as it might
  // prove to be convenient to expose some internal API
  // facts within the internal API itself
  @Override
  public AbstractTable join(Table table) {
    return null;
  }

  /**
   * Some internal API method, also package private
   */
  void doThings() {}
  void doMoreThings() {

    // Use the internal API
    join(this).doThings();
  }
}

This looks all safe at the first sight, but is it? AbstractTable is package-private, but CustomTable extends it and inherits all of its API, including the covariant method override of “AbstractTable join(Table)”. What does that result in? Check out the following piece of client code

package org.jooq.test;

import org.jooq.Table;
import org.jooq.impl.CustomTable;

public class Test {
  public static void main(String[] args) {
    Table joined = new CustomTable();

    // This works, no knowledge of AbstractTable exposed to the compiler
    Table table1 = new CustomTable();
    Table join1 = table1.join(joined);

    // This works, even if join exposes AbstractTable
    CustomTable table2 = new CustomTable();
    Table join2 = table2.join(joined);

    // This doesn't work. The type AbstractTable is not visible
    Table join3 = table2.join(joined).join(joined);
    //            ^^^^^^^^^^^^^^^^^^^ This cannot be dereferenced

    // ... so hide these implementation details again
    // The API flaw can be circumvented with casting
    Table join4 = ((Table) table2.join(joined)).join(joined);
  }
}

Conclusion

Tampering with visibilities in class hierarchies can be dangerous. Beware of the fact that API methods declared in interfaces are always public, regardless of any covariant implementations that involve non-public artefacts. This can be quite annoying for API users when not properly dealt with by API designers.

Fixed in the next version of jOOQ :-)

The good API design

I’ve stumbled upon a nice checklist wrapping up API design guidelines. An extract:

  1. Favor placing API and implementation into separate packages
  2. Favor placing APIs into high-level packages and implementation into lower-level packages
  3. Consider breaking up large APIs into several packages
  4. Consider putting API and implementation packages into separate Java archives
  5. Avoid (minimize) internal dependencies on implementation classes in APIs
  6. Avoid unnecessary API fragmentation
  7. Do not place public implementation classes in the API package
  8. Do not create dependencies between callers and implementation classes
  9. Do not place unrelated APIs into the same package
  10. Do not place API and SPI into the same package
  11. Do not move or rename the package of an already released public API

See the full checklist here:

http://theamiableapi.com/2012/01/16/java-api-design-checklist/

A neater way to use reflection in Java

Reflection in Java really feels awkward. The java.lang.reflect API is very powerful and complete, and in that sense also very verbose. Unlike in most scripting languages, there is no convenient way to access methods and fields dynamically using reflection. By convenient, I mean things like this

// PHP
$method = 'my_method';
$field  = 'my_field';

// Dynamically call a method
$object->$method();

// Dynamically access a field
$object->$field;

Or even better

// JavaScript
var method   = 'my_method';
var field    = 'my_field';

// Dynamically call a function
object[method]();

// Dynamically access a field
object[field];

For us Java guys, this is something we can only dream of. We would write this

String method = "my_method";
String field  = "my_field";

// Dynamically call a method
object.getClass().getMethod(method).invoke(object);

// Dynamically access a field
object.getClass().getField(field).get(object);

Obviously, this doesn’t take care of NullPointerExceptions, InvocationTargetExceptions, IllegalAccessExceptions, IllegalArgumentExceptions, SecurityExceptions, primitive types, etc. In an enterprise world, things need to be safe and secure, and the Java architects have thought of all possible problems that could arise when using reflection. But in many cases, we know what we’re doing and we don’t care about most of those features. We’d prefer the less verbose way.

That’s why I have created another sibling in the jOO* family: jOOR (Java Object Oriented Reflection). While this is not a killer libary, it might be useful for 1-2 developers out there looking for a simple and fluent solution. Here’s an example I have recently encountered on stack overflow, where jOOR might fit in just perfectly:

// Classic example of reflection usage
try {
  Method m1 = department.getClass().getMethod("getEmployees");
  Employee employees = (Employee[]) m1.invoke(department);

  for (Employee employee : employees) {
    Method m2 = employee.getClass().getMethod("getAddress");
    Address address = (Address) m2.invoke(employee);

    Method m3 = address.getClass().getMethod("getStreet");
    Street street = (Street) m3.invoke(address);

    System.out.println(street);
  }
}

// There are many checked exceptions that you are likely to ignore anyway 
catch (Exception ignore) {

  // ... or maybe just wrap in your preferred runtime exception:
  throw new RuntimeException(e);
}

And the same example using jOOR:

Employee[] employees = on(department).call("getEmployees").get();

for (Employee employee : employees) {
  Street street = on(employee).call("getAddress").call("getStreet").get();
  System.out.println(street);
}

See the full Stack Overflow question here:

http://stackoverflow.com/questions/4385003/java-reflection-open-source/8672186

Another example:

String world = 
on("java.lang.String")  // Like Class.forName()
 .create("Hello World") // Call the most specific matching constructor
 .call("substring", 6)  // Call the most specific matching method
 .call("toString")      // Call toString()
 .get()                 // Get the wrapped object, in this case a String

Get jOOR for free here:

http://code.google.com/p/joor/