How to Design a Good, Regular API

People have strong opinions on how to design a good API. Consequently, there are lots of pages and books on the web explaining how to do it. This article will focus on a particular aspect of good APIs: Regularity. Regularity is what happens when you follow the “Principle of Least Astonishment”. This principle holds true no matter what kind of personal taste and style you would otherwise like to put into your API. It is thus one of the most important features of a good API. The following are a couple of things to keep in mind when designing a “regular” API:

Rule #1: Establish strong terms

If your API grows, the same terms will be used over and over again. For instance, some actions will come in several flavours, resulting in various classes / types / methods that differ only subtly in behaviour. The fact that they’re similar should be reflected by their names. Names should use strong terms. Take JDBC for instance. No matter how you execute a Statement, you will always use the term execute to do it. For instance, you will call any of these methods:
  • execute(String)
  • executeBatch()
  • executeQuery(String)
  • executeUpdate(String)
In a similar fashion, you will always use the term close to release resources, no matter which resource you’re releasing. For instance, you will call:
  • Connection.close()
  • Statement.close()
  • ResultSet.close()
As a matter of fact, close is such a strong and established term in the JDK that it has led to the interfaces java.io.Closeable (since Java 1.5) and java.lang.AutoCloseable (since Java 1.7), which generally establish a contract of releasing resources.

Rule violation: Observable

This rule is violated a couple of times in the JDK. For instance, in the java.util.Observable class. While other “Collection-like” types established the terms
  • size()
  • remove()
  • removeAll()
… this class declares
  • countObservers()
  • deleteObserver(Observer)
  • deleteObservers()
There is no good reason for using other terms in this context. The same applies to Observer.update(), which should really be called notify(), an otherwise established term in JDK APIs.

Rule violation: Spring. Most of it

Spring really became popular in the days when J2EE was weird, slow, and cumbersome. Think about EJB 2.0… There may be similar opinions on Spring out there, which are off-topic for this post. Here’s how Spring violates this concrete rule. A couple of random examples where Spring fails to establish strong terms, and uses long concatenations of meaningless, imprecise words instead:
  • AbstractBeanFactoryBasedTargetSourceCreator
  • AspectJAdviceParameterNameDiscoverer
  • BeanFactoryTransactionAttributeSourceAdvisor
  • ClassPathScanningCandidateComponentProvider
Apart from “feeling” like a horrible API (to me), here’s some more objective analysis:
  • What’s the difference between a Creator and a Factory?
  • What’s the difference between a Source and a Provider?
  • What’s the non-subtle difference between an Advisor and a Provider?
  • What’s the non-subtle difference between a Discoverer and a Provider?
  • Is an Advisor related to an AspectJAdvice?
  • Is it a ScanningCandidate or a CandidateComponent?
  • What’s a TargetSource? And how would it be different from a SourceTarget if not a SourceSource or my favourite: A SourceSourceTargetProviderSource?
Gary Fleming commented on my previous blog post about Spring’s funny class names:
I’d be willing to bet that a Markov-chain generated class name (based on Spring Security) would be indistinguishable from the real thing.
Back to more seriousness…
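
Before moving on, Rule #1 can be sketched in code. This is a minimal sketch and not taken from any real library: the types Session and Channel are hypothetical, and the only real API relied upon is java.lang.AutoCloseable. Both types release their resources through the same strong term, close(), so both work with try-with-resources and nobody has to guess between close(), dispose(), or release().

```java
import java.util.ArrayList;
import java.util.List;

class StrongTermsDemo {

    // Hypothetical resource type, for illustration only.
    static final class Session implements AutoCloseable {
        private final List<String> log;

        Session(List<String> log) {
            this.log = log;
        }

        // Reuses the established JDK term: close()
        @Override
        public void close() {
            log.add("close session");
        }
    }

    // Second hypothetical resource type using the very same term.
    static final class Channel implements AutoCloseable {
        private final List<String> log;

        Channel(List<String> log) {
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close channel");
        }
    }

    // Records what happens; resources are closed in reverse declaration order.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Session session = new Session(log);
             Channel channel = new Channel(log)) {
            log.add("work");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because both types share one term and one contract, client code treats them uniformly, which is the point of establishing strong terms.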

Rule #2: Apply symmetry to term combinations

Once you’ve established strong terms, you will start combining them. When you look at the JDK’s Collection APIs, you will notice that they are symmetric: they establish the terms add(), remove(), contains(), and all, before combining them symmetrically:
  • add(E)
  • addAll(Collection<? extends E>)
  • remove(Object)
  • removeAll(Collection<?>)
  • contains(Object)
  • containsAll(Collection<?>)
Now, the Collection type is a good example where an exception to this rule may be acceptable, when a method doesn’t “pull its own weight”. This is probably the case for retainAll(Collection<?>), which doesn’t have an equivalent retain(E) method. It might just as well be a regular violation of this rule, though.

Rule violation: Map

This rule is violated all the time, mostly because of some methods not pulling their own weight (which is ultimately a matter of taste). With Java 8’s defender methods (default methods), there will no longer be any excuse for not adding default implementations for useful utility methods that should’ve been on some types. For instance: Map. It violates this rule a couple of times:
  • keySet()
  • values()
  • entrySet()
Observe also that there is no point in using the term Set in the method names. The method signature already indicates that the result has a Set type. It would’ve been more consistent and symmetric if those methods had been named keys(), values(), entries(). (On a side-note, Sets and Lists are another topic that I will soon blog about, as I think those types do not pull their own weight either.)

At the same time, the Map interface violates this rule by providing
  • put(K, V) and also putAll(Map)
  • remove(Object), but no removeAll(Collection<?>)
Besides, establishing the term clear() instead of reusing removeAll() with no arguments is unnecessary. This applies to all Collection API members. In fact, the clear() method also violates rule #1. It is not immediately obvious whether clear does anything subtly different from remove when removing collection elements.
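
To make the symmetry idea more tangible, here is a minimal sketch of a container that first establishes three terms and then combines each of them with “All” through default methods. The names Bag and ListBag are hypothetical and not part of the JDK. They only illustrate the rule and are not a proposal for how Collection or Map should have been written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class SymmetryDemo {
    public static void main(String[] args) {
        Bag<String> bag = new ListBag<>();
        bag.addAll(Arrays.asList("a", "b", "c"));
        bag.removeAll(Arrays.asList("a", "b"));
        System.out.println(bag.containsAll(Arrays.asList("c")));
    }
}

// Hypothetical container type, for illustration only.
interface Bag<E> {

    // The three established terms...
    boolean add(E element);
    boolean remove(E element);
    boolean contains(E element);

    // ...each combined symmetrically with "All" as a default method.
    default boolean addAll(Collection<? extends E> elements) {
        boolean changed = false;
        for (E element : elements) {
            changed |= add(element);
        }
        return changed;
    }

    default boolean removeAll(Collection<? extends E> elements) {
        boolean changed = false;
        for (E element : elements) {
            changed |= remove(element);
        }
        return changed;
    }

    default boolean containsAll(Collection<? extends E> elements) {
        for (E element : elements) {
            if (!contains(element)) {
                return false;
            }
        }
        return true;
    }
}

// Simple list-backed implementation; only the single-element methods are needed.
final class ListBag<E> implements Bag<E> {
    private final List<E> items = new ArrayList<>();

    @Override
    public boolean add(E element) {
        return items.add(element);
    }

    @Override
    public boolean remove(E element) {
        return items.remove(element);
    }

    @Override
    public boolean contains(E element) {
        return items.contains(element);
    }
}
```

An implementor only writes the single-element methods, and every “All” variant comes for free, so there is little excuse left for an asymmetric API.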

Rule #3: Add convenience through overloading

There is mostly only one compelling reason why you would want to overload a method: Convenience. Often you want to do precisely the same thing in different contexts, but constructing that very specific method argument type is cumbersome. So, for convenience, you offer your API users another variant of the same method, with a “friendlier” argument type set. This can be observed again in the Collection type. We have:
  • toArray(), which is a convenient overload of…
  • toArray(T[])
Another example is the Arrays utility class. We have:
  • copyOf(T[], int), which is an incompatible overload of…
  • copyOf(boolean[], int), and of…
  • copyOf(int[], int)
Overloading is mostly used for two reasons:
  1. Providing “default” argument behaviour, as in Collection.toArray()
  2. Supporting several incompatible, yet “similar” argument sets, as in Arrays.copyOf()
Other languages have incorporated these concepts into their language syntax. Many languages (e.g. PL/SQL) formally support named default arguments. Some languages (e.g. JavaScript) don’t even care how many arguments there really are. And another, new JVM language called Ceylon got rid of overloading by combining the support for named, default arguments with union types. As Ceylon is a statically typed language, this is probably the most powerful approach to adding convenience to your API.

Rule violation: TreeSet

It is hard to find a good example of a case where this rule is violated in the JDK. But there is one: the TreeSet and TreeMap. Their constructors are overloaded several times. Let’s have a look at these two constructors:
  • TreeSet(Collection<? extends E>)
  • TreeSet(SortedSet<E>)
The latter “cleverly” adds some convenience to the first in that it extracts a well-known Comparator from the argument SortedSet to preserve ordering. This behaviour is quite different from the compatible (!) first constructor, which always uses the elements’ natural ordering, even if the argument collection happens to be a SortedSet. I.e. these two constructor calls result in different behaviour:

SortedSet<Object> original = // [...]

// Preserves ordering:
new TreeSet<Object>(original);

// Resets ordering:
new TreeSet<Object>((Collection<Object>) original);

These constructors violate the rule in that they produce completely different behaviour. They’re not just mere convenience.
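
The difference can be observed with a small, self-contained sketch. It only uses java.util types. The class name TreeSetOverloadDemo and its helper methods are made up for this illustration. A set sorted in reverse order is copied once through each constructor, and the first element of each copy shows which ordering was applied.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

class TreeSetOverloadDemo {

    // A set with a non-natural ordering: "c", "b", "a"
    static SortedSet<String> reversed() {
        SortedSet<String> set = new TreeSet<String>(Comparator.<String>reverseOrder());
        set.add("a");
        set.add("b");
        set.add("c");
        return set;
    }

    // Static type SortedSet selects TreeSet(SortedSet<E>): the Comparator is kept.
    static String firstViaSortedSetConstructor() {
        return new TreeSet<String>(reversed()).first();
    }

    // Static type Collection selects TreeSet(Collection<? extends E>): natural ordering.
    static String firstViaCollectionConstructor() {
        return new TreeSet<String>((Collection<String>) reversed()).first();
    }

    public static void main(String[] args) {
        System.out.println(firstViaSortedSetConstructor());  // c
        System.out.println(firstViaCollectionConstructor()); // a
    }
}
```

Note that the same object is passed in both cases. Only the static type of the argument expression decides which ordering the copy gets.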

Rule #4: Consistent argument ordering

Be sure that you consistently order the arguments of your methods. This is an obvious thing to do for overloaded methods, as you can immediately see how it is better to always put the array first and the int after in the previous example from the Arrays utility class:
  • copyOf(T[], int), which is an incompatible overload of…
  • copyOf(boolean[], int)
  • copyOf(int[], int)
But you will quickly notice that all methods in that class put the array being operated on first. Some examples:
  • binarySearch(Object[], Object)
  • copyOfRange(T[], int, int)
  • fill(Object[], Object)
  • sort(T[], Comparator<? super T>)

Rule violation: Arrays

The same class also “subtly” violates this rule in that it puts optional arguments in between other arguments when overloading methods. For instance, it declares
  • fill(Object[], Object)
  • fill(Object[], int, int, Object)
when the latter should’ve been fill(Object[], Object, int, int). This is a “subtle” rule violation, as you may also argue that those methods in Arrays that restrict an argument array to a range will always put the array and the range argument together. In that way, the fill() method would again follow the rule as it provides the same argument order as copyOfRange(), for instance:
  • fill(Object[], int, int, Object)
  • copyOfRange(T[], int, int)
You will never be able to escape this problem if you heavily overload your API. Unfortunately, Java doesn’t support named parameters, which would help to formally distinguish arguments in a large argument list, as sometimes large argument lists cannot be avoided.

Rule violation: String

Another case of a rule violation is the String class:
  • regionMatches(int, String, int, int)
  • regionMatches(boolean, int, String, int, int)
The problems here are:
  • It is hard to immediately understand the difference between the two methods, as the optional boolean argument is inserted at the beginning of the argument list
  • It is hard to immediately understand the purpose of every int argument, as there are many arguments in a single method
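
The following sketch shows how these signatures read at the call site. It only calls real JDK methods (Arrays.fill(), Arrays.copyOfRange(), String.regionMatches()). The class name ArgumentOrderDemo and the sample values are assumptions made for this illustration.

```java
import java.util.Arrays;

class ArgumentOrderDemo {

    // Array first, then the range, then the value: the value moved to the end.
    static int[] filledRange() {
        int[] data = new int[5];
        Arrays.fill(data, 1, 3, 7);
        return data;
    }

    // Same "array, from, to" prefix as the ranged fill() above.
    static int[] copiedRange() {
        return Arrays.copyOfRange(filledRange(), 1, 3);
    }

    // The optional boolean is inserted in front of all other arguments.
    static boolean matchesIgnoringCase() {
        return "Hello World".regionMatches(true, 6, "WORLD", 0, 5);
    }

    // Without the boolean, the comparison is case-sensitive.
    static boolean matchesCaseSensitive() {
        return "Hello World".regionMatches(6, "WORLD", 0, 5);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(filledRange()));
        System.out.println(Arrays.toString(copiedRange()));
        System.out.println(matchesIgnoringCase());
        System.out.println(matchesCaseSensitive());
    }
}
```

Reading the two regionMatches() calls without an IDE, it takes a moment to work out which int is the offset into which string, which is exactly the problem described above.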

Rule #5: Establish return value types

This may be a bit controversial as people may have different views on this topic. No matter what your opinion is, however, you should create a consistent, regular API when it comes to defining return value types. An example rule set (on which you may disagree):
  • Methods returning a single object should return null when no object was found
  • Methods returning several objects should return an empty List, Set, Map, array, etc. when no object was found (never null)
  • Methods should only throw exceptions in case of an … well, an exception
With such a rule set, it is not a good practice to have 1-2 methods lying around, which:
  • … throw ObjectNotFoundExceptions when no object was found
  • … return null instead of empty Lists
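
The example rule set above could look like this in code. This is only a sketch under the assumption that you adopt exactly these three rules. UserDirectory and its methods are hypothetical names, not an existing API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical lookup type, for illustration only.
final class UserDirectory {
    private final Map<Integer, String> namesById = new LinkedHashMap<>();

    // Exceptions are reserved for exceptional situations, such as invalid input.
    void register(int id, String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        namesById.put(id, name);
    }

    // Single object: returns null when nothing was found.
    String findName(int id) {
        return namesById.get(id);
    }

    // Several objects: returns an empty list when nothing was found, never null.
    List<String> findNamesStartingWith(String prefix) {
        List<String> result = new ArrayList<>();
        for (String name : namesById.values()) {
            if (name.startsWith(prefix)) {
                result.add(name);
            }
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        UserDirectory directory = new UserDirectory();
        directory.register(1, "Ada");
        directory.register(2, "Alan");

        System.out.println(directory.findName(99));
        // No null check needed before iterating:
        for (String name : directory.findNamesStartingWith("Z")) {
            System.out.println(name);
        }
    }
}
```

Whichever rules you pick, the value lies in every method of the API following the same ones.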

Rule violation: File

File is an example of a JDK class that violates many rules. Among them, the rule of regular return types. Its File.list() Javadoc reads:
An array of strings naming the files and directories in the directory denoted by this abstract pathname. The array will be empty if the directory is empty. Returns null if this abstract pathname does not denote a directory, or if an I/O error occurs.
So, the correct way to iterate over file names (if you’re doing defensive programming) is:

String[] names = file.list();

// You should never forget this null check!
if (names != null) {
    for (String name : names) {
        // Do things with your file name
    }
}

Of course, we could argue that the Java 5 expert group could’ve been nice with us and worked that null check into their implementation of the foreach loop. Similar to the missing null check when switching over an enum (which should lead to the default: case). They’ve probably preferred the “fail early” approach in this case.

The point here is that File already has sufficient means of checking if file is really a directory (File.isDirectory()). And it should throw an IOException if something went wrong, instead of returning null. This is a very strong violation of this rule, causing lots of pain at the call-site… Hence: NEVER return null when returning arrays or collections!

Rule violation: JPA

An example of how JPA violates this rule is the way entities are retrieved from the EntityManager or from a Query:
  • EntityManager.find() methods return null if no entity could be found
  • Query.getSingleResult() throws a NoResultException if no entity could be found
As NoResultException is a RuntimeException, this flaw heavily violates the Principle of Least Astonishment, as you might stay unaware of this difference until runtime! If you insist on throwing NoResultExceptions, make them checked exceptions, as client code MUST handle them.
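
Coming back to File.list(): if you are stuck with such an API, one option is to hide the irregular return value behind a small helper. The sketch below assumes that an IOException is an acceptable way to report “not a directory or I/O error”. Directories.listNames() is a hypothetical helper, not a JDK method.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, for illustration only.
final class Directories {

    private Directories() {
    }

    // Never returns null: an empty directory yields an empty array,
    // anything else that File.list() reports as null becomes an IOException.
    static String[] listNames(File directory) throws IOException {
        String[] names = directory.list();
        if (names == null) {
            throw new IOException("Not a directory, or an I/O error occurred: " + directory);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Call sites can iterate without a null check.
        for (String name : listNames(new File("."))) {
            System.out.println(name);
        }
    }
}
```

The null check still exists, but it now lives in exactly one place instead of at every call site.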

Conclusion and further reading

… or rather, further watching. Have a look at Josh Bloch’s presentation on API design. He agrees with most of my claims, around 0:30:30.
Another useful example of such a web page is the “Java API Design Checklist” by The Amiable API.

11 thoughts on “How to Design a Good, Regular API”

  1. Wow, this is a great article. One of those that you have to read several times and meditate on to get the most out of them. I was pondering on the case of method overloading, and I think that besides all this convenience that you mention, this feature is sometimes used to overcome the problem of multiple dispatch. Since Java does not have multimethods, the use of overloading with a visitor pattern helps to implement a similar idiom in terms of functionality. Do you think this could be classified as a convenience idiom as well?

    Now that I mention it, I am intrigued about whether Ceylon has multimethods since they got rid of overloading. I’ll have to take a look, that’ll be interesting :-)

    Also, in point #5, about return types, I started thinking about how useful the new Option class in Java 8 will be to overcome the problems inherent to returning null or something. Do you think as well that this is also a good way to improve API designs dealing with this kind of return values?

    1. Thanks for the nice words! I hope you didn’t meditate too long ;-)

      This article has been on my mind for a long time. Finally, when I had recently discovered Ceylon’s regularity in language syntax, I had to write it down. Stay tuned for an analysis / rant about language regularity and how Java fails miserably in this.

      About your feedback:

      Since Java does not have multimethods, the use of overloading with a visitor pattern helps to implement a similar idiom in terms of functionality. Do you think this could be classified as a convenience idiom as well?

      Yes, that is another use case for overloading. However, I personally think that the visitor pattern is one of the biggest anti-patterns to ever apply. If this topic interests you, I can suggest this article I had written recently. Apart from personal taste, that idiom doesn’t seem to match what I’ve written, i.e. multimethods are probably more than mere convenience, as each method is expected to behave very differently. Looking forward to a blog post of yours regarding multimethods and the Ceylon language, though!

      Do you think as well that this [Option types] is also a good way to improve API designs dealing with this kind of return values?

      I suspect you mean this new java.util.Optional type that Oracle has stolen from languages like Scala? I’m not quite sure yet, how this will be adopted. Many objects are really “optional” in a way that null references are perfectly meaningful. I think that the extra boilerplate code could prevent a broad adoption of this type. Ceylon, again, has an interesting solution for this, allowing to decorate a type with a question-mark to indicate that it is optional. You can then make null-safe method calls on such a type.

      But again, I’m not quite sure if these things pull their own weight… With a good API, you hardly ever run into a NullPointerException

      1. I will be looking forward to reading your article on Ceylon. I had been hearing about it, but haven’t had time to look into it in detail, so it’ll be great to get some perspective on the subject.

        Also it will definitely be interesting to write an article on multimethods. That’s a great suggestion and I will definitely look into that in the coming days.

        Regarding the use of Optional I think it even predates Scala (2003). In functional languages like ML (1973) there is no way to express the concept of null, instead, one must use a datatype constructor that represents the absence of a value. In SML this is called an Option, whose value could be SOME ‘a or NONE. I believe Haskell (1991) has something similar with a datatype called Maybe, which has two constructors Just ‘a or Nothing.

        I have also read of a similar idiom in a book named Refactoring to Patterns by Joshua Kerievsky where he suggests a pattern that he calls the Null Object pattern. Not exactly the same as that from the functional world, but quite similar in its final purpose.

        But I think you’re right, even with the new Optional object, we still have to check whether it contains something or not, which does not prevent the writing of the check. On the contrary, it enforces it, and I think this is what allegedly should prevent the NPE, but at the cost of having to write these checks over and over again (I assumed this is what you meant when you mention the boilerplate code). Perhaps in the future we can get some syntactic sugar that allows us to write the check using a ? (question mark) and the compiler can expand it to check if the optional contains something or not. That’d be great.

        1. I see, you’ve done your research about “Optional” :-) It looks like this has been an eternal debate between language designers…

          I like SQL’s NULL which is an actual “unknown” instance of the target type. I.e. a CAST(NULL AS INTEGER) is really an INTEGER instance, not the absence of such an instance. This is similar to Java’s Double.NaN. On the other hand, having an “unknown” instance for every type makes things quite complicated, specifically for boolean algebra, as all truth tables now have to deal with TRUE, FALSE, and UNKNOWN.

          (I assumed this is what you meant when you mention the boilerplate code)

          Yes. We’ll be writing stuff like:

          // source() returns Optional<T>
          T value = source().get();
          T value = source().orElse(defaultT);
          

          It can be useful if the expert group adds enough convenience to the Optional type. Right now, it looks pretty scarce. But being able to do things like orElse(), orElseThrow() doesn’t look so wrong in some circumstances.

          Perhaps in the future we can get some syntactic sugar that allows us to write the check using a ? (question mark) and the compiler can expand it to check if the optional contains something or not. That’d be great.

          I’m sure that was one of the driving forces for actually adding that type. Another driving force was certainly the projected (but postponed) value type improvement, which should bring primitives and wrappers closer together. Note the existence of OptionalDouble, OptionalInt, OptionalLong. Having a “standard” API for reference and primitive types may certainly help in the future.

  2. Nice post! I would argue that every developer who is designing a Java library should have read “Effective Java” from Joshua Bloch before. It is a seminal book which covers these topics in detail. If all Sun/Oracle developers had read it, then the Java APIs would be a much nicer tool to use.

    1. Thanks! I agree that reading “Effective Java” or also Erich Gamma’s “Design Patterns” is a good thing. However, it takes time and experience to really appreciate their assembled knowledge and findings, and to truly understand the rationale behind it. It is always easier to criticise and recognise anti-patterns than to put knowledge into action. As far as I’m concerned, I feel like I’m still not quite there yet…

    2. … forgot to mention. Another thing that I’m really worried about time and again is the unjustified success of Spring whose popularity makes it really hard for Java newbies to assess what is really a good API.

      Spring had filled a gap, yes. But it filled it with mediocrity, in my opinion.
