How to Design a Good, Regular API


People have strong opinions on how to design a good API. Consequently, there are lots of pages and books in the web, explaining how to do it. This article will focus on a particular aspect of good APIs: Regularity. Regularity is what happens when you follow the “Principle of Least Astonishment“. This principle holds true no matter what kinds of personal taste and style you would like to put into your API, otherwise. It is thus one of the most important features of a good API.

The following are a couple of things to keep in mind when designing a “regular” API:

Rule #1: Establish strong terms

If your API grows, there will be repetitive use of the same terms, over and over again. For instance, some actions will be come in several flavours resulting in various classes / types / methods, that differ only subtly in behaviour. The fact that they’re similar should be reflected by their names. Names should use strong terms. Take JDBC for instance. No matter how you execute a Statement, you will always use the term execute to do it. For instance, you will call any of these methods:

In a similar fashion, you will always use the term close to release resources, no matter which resource you’re releasing. For instance, you will call:

As a matter of fact, close is such a strong and established term in the JDK, that it has lead to the interfaces java.io.Closeable (since Java 1.5), and java.lang.AutoCloseable (since Java 1.7), which generally establish a contract of releasing resources.

Rule violation: Observable

This rule is violated a couple of times in the JDK. For instance, in the java.util.Observable class. While other “Collection-like” types established the terms

  • size()
  • remove()
  • removeAll()

… this class declares

There is no good reason for using other terms in this context. The same applies to Observer.update(), which should really be called notify(), an otherwise established term in JDK APIs

Rule violation: Spring. Most of it

Spring has really gotten popular in the days when J2EE was weird, slow, and cumbersome. Think about EJB 2.0… There may be similar opinions on Spring out there, which are off-topic for this post. Here’s how Spring violates this concrete rule. A couple of random examples where Spring fails to establish strong terms, and uses long concatenations of meaningless, inconcise words instead:

Apart from “feeling” like a horrible API (to me), here’s some more objective analysis:

  • What’s the difference between a Creator and a Factory
  • What’s the difference between a Source and a Provider?
  • What’s the non-subtle difference between an Advisor and a Provider?
  • What’s the non-subtle difference between a Discoverer and a Provider?
  • Is an Advisor related to an AspectJAdvice?
  • Is it a ScanningCandidate or a CandidateComponent?
  • What’s a TargetSource? And how would it be different from a SourceTarget if not a SourceSource or my favourite: A SourceSourceTargetProviderSource?

Gary Fleming commented on my previous blog post about Spring’s funny class names:

I’d be willing to bet that a Markov-chain generated class name (based on Spring Security) would be indistinguishable from the real thing.

Back to more seriousness…

Rule #2: Apply symmetry to term combinations

Once you’ve established strong terms, you will start combining them. When you look at the JDK’s Collection APIs, you will notice the fact that they are symmetric in a way that they’ve established the terms add(), remove(), contains(), and all, before combining them symmetrically:

Now, the Collection type is a good example where an exception to this rule may be acceptable, when a method doesn’t “pull its own weight”. This is probably the case for retainAll(Collection<?>), which doesn’t have an equivalent retain(E) method. It might just as well be a regular violation of this rule, though.

Rule violation: Map

This rule is violated all the time, mostly because of some methods not pulling their own weight (which is ultimately a matter of taste). With Java 8’s defender methods, there will no longer be any excuse of not adding default implementations for useful utility methods that should’ve been on some types. For instance: Map. It violates this rule a couple of times:

Observe also, that there is no point of using the term Set in the method names. The method signature already indicates that the result has a Set type. It would’ve been more consistent and symmetric if those methods would’ve been named keys(), values(), entries(). (On a side-note, Sets and Lists are another topic that I will soon blog about, as I think those types do not pull their own weight either)

At the same time, the Map interface violates this rule by providing

Besides, establishing the term clear() instead of reusing removeAll() with no arguments is unnecessary. This applies to all Collection API members. In fact, the clear() method also violates rule #1. It is not immediately obvious, if clear does anything subtly different from remove when removing collection elements.

Rule #3: Add convenience through overloading

There is mostly only one compelling reason, why you would want to overload a method: Convenience. Often you want to do precisely the same thing in different contexts, but constructing that very specific method argument type is cumbersome. So, for convenience, you offer your API users another variant of the same method, with a “friendlier” argument type set. This can be observed again in the Collection type. We have:

Another example is the Arrays utility class. We have:

Overloading is mostly used for two reasons:

  1. Providing “default” argument behaviour, as in Collection.toArray()
  2. Supporting several incompatible, yet “similar” argument sets, as in Arrays.copyOf()

Other languages have incorporated these concepts into their language syntax. Many languages (e.g. PL/SQL) formally support named default arguments. Some languages (e.g. JavaScript) don’t even care how many arguments there really are. And another, new JVM language called Ceylon got rid of overloading by combining the support for named, default arguments with union types. As Ceylon is a statically typed language, this is probable the most powerful approach of adding convenience to your API.

Rule violation: TreeSet

It is hard to find a good example of a case where this rule is violated in the JDK. But there is one: the TreeSet and TreeMap. Their constructors are overloaded several times. Let’s have a look at these two constructors:

The latter “cleverly” adds some convenience to the first in that it extracts a well-known Comparator from the argument SortedSet to preserve ordering. This behaviour is quite different from the compatible (!) first constructor, which doesn’t do an instanceof check of the argument collection. I.e. these two constructor calls result in different behaviour:

SortedSet<Object> original = // [...]

// Preserves ordering:
new TreeSet<Object>(original);

// Resets ordering:
new TreeSet<Object>((Collection<Object>) original);

These constructors violate the rule in that they produce completely different behaviour. They’re not just mere convenience.

Rule #4: Consistent argument ordering

Be sure that you consistently order arguments of your methods. This is an obvious thing to do for overloaded methods, as you can immediately see how it is better to always put the array first and the int after in the previous example from the Arrays utility class:

But you will quickly notice that all methods in that class will put the array being operated on first. Some examples:

Rule violation: Arrays

The same class also “subtly” violates this rule in that it puts optional arguments in between other arguments, when overloading methods. For instance, it declares

When the latter should’ve been fill(Object[], Object, int, int). This is a “subtle” rule violation, as you may also argue that those methods in Arrays that restrict an argument array to a range will always put the array and the range argument together. In that way, the fill() method would again follow the rule as it provides the same argument order as copyOfRange(), for instance:

You will never be able to escape this problem if you heavily overload your API. Unfortunately, Java doesn’t support named parameters, which helps formally distinguishing arguments in a large argument list, as sometimes, large argument lists cannot be avoided.

Rule violation: String

Another case of a rule violation is the String class:

The problems here are:

  • It is hard to immediately understand the difference between the two methods, as the optional boolean argument is inserted at the beginning of the argument list
  • It is hard to immediately understand the purpose of every int argument, as there are many arguments in a single method

Rule #5: Establish return value types

This may be a bit controversial as people may have different views on this topic. No matter what your opinion is, however, you should create a consistent, regular API when it comes to defining return value types. An example rule set (on which you may disagree):

  • Methods returning a single object should return null when no object was found
  • Methods returning several objects should return an empty List, Set, Map, array, etc. when no object was found (never null)
  • Methods should only throw exceptions in case of an … well, an exception

With such a rule set, it is not a good practice to have 1-2 methods lying around, which:

  • … throw ObjectNotFoundExceptions when no object was found
  • … return null instead of empty Lists

Rule violation: File

File is an example of a JDK class that violates many rules. Among them, the rule of regular return types. Its File.list() Javadoc reads:

An array of strings naming the files and directories in the directory denoted by this abstract pathname. The array will be empty if the directory is empty. Returns null if this abstract pathname does not denote a directory, or if an I/O error occurs.

So, the correct way to iterate over file names (if you’re doing defensive programming) is:

String[] files = file.list();

// You should never forget this null check!
if (files != null) {
    for (String file : files) {
        // Do things with your file
    }
}

Of course, we could argue that the Java 5 expert group could’ve been nice with us and worked that null check into their implementation of the foreach loop. Similar to the missing null check when switching over an enum (which should lead to the default: case). They’ve probably preferred the “fail early” approach in this case.

The point here is that File already has sufficient means of checking if file is really a directory (File.isDirectory()). And it should throw an IOException if something went wrong, instead of returning null. This is a very strong violation of this rule, causing lots of pain at the call-site… Hence:

NEVER return null when returning arrays or collections!

Rule violation: JPA

An example of how JPA violates this rule is the way how entities are retrieved from the EntityManager or from a Query:

As NoResultException is a RuntimeException this flaw heavily violates the Principle of Least Astonishment, as you might stay unaware of this difference until runtime!

IF you insist on throwing NoResultExceptions, make them checked exceptions as client code MUST handle them

Conclusion and further reading

… or rather, further watching. Have a look at Josh Bloch’s presentation on API design. He agrees with most of my claims, around 0:30:30

Another useful example of such a web page is the “Java API Design Checklist” by The Amiable API:
Java API Design Checklist

Inadvertent Recursion Protection with Java ThreadLocals


Now here’s a little trick for those of you hacking around with third-party tools, trying to extend them without fully understanding them (yet!). Assume the following situation:

  • You want to extend a library that exposes a hierarchical data model (let’s assume you want to extend Apache Jackrabbit)
  • That library internally checks access rights before accessing any nodes of the content repository
  • You want to implement your own access control algorithm
  • Your access control algorithm will access other nodes of the content repository
  • … which in turn will again trigger access control
  • … which in turn will again access other nodes of the content repository

… Infinite recursion, possibly resulting in a StackOverflowError, if you’re not recursing breadth-first.

Now, you have two options:

  1. Take the time, sit down, understand the internals, and do it right. You probably shouldn’t recurse into your access control once you’ve reached your own extension. In the case of extending Jackrabbit, this would be done by using a System Session to further access nodes within your access control algorithm. A System Session usually bypasses access control.
  2. Be impatient, wanting to get results quickly, and prevent recursion with a trick

Of course, you really should opt for option 1. But who has the time to understand everything? 😉

Here’s how to implement that trick.

/**
 * This thread local indicates whether you've
 * already started recursing with level 1
 */
static final ThreadLocal<Boolean> RECURSION_CONTROL 
    = new ThreadLocal<Boolean>();

/**
 * This method executes a delegate in a "protected"
 * mode, preventing recursion. If a inadvertent 
 * recursion occurred, return a default instead
 */
public static <T> T protect(
        T resultOnRecursion, 
        Protectable<T> delegate) 
throws Exception {

    // Not recursing yet, allow a single level of
    // recursion and execute the delegate once
    if (RECURSION_CONTROL.get() == null) {
        try {
            RECURSION_CONTROL.set(true);
            return delegate.call();
        }
        finally {
            RECURSION_CONTROL.remove();
        }
    }

    // Abort recursion and return early
    else {
        return resultOnRecursion;
    }
}

/**
 * An API to wrap your code with
 */
public interface Protectable<T> {
    T call() throws Exception;
}

This works easily as can be seen in this usage example:

public static void main(String[] args) 
throws Exception {
    protect(null, new Protectable<Void>() {
        @Override
        public Void call() throws Exception {

            // Recurse infinitely
            System.out.println("Recursing?");
            main(null);

            System.out.println("No!");
            return null;
        }
    });
}

The recursive call to the main() method will be aborted by the protect method, and return early, instead of executing call().

This idea can also be further elaborated by using a Map of ThreadLocals instead, allowing for specifying various keys or contexts for which to prevent recursion. Then, you could also put an Integer into the ThreadLocal, incrementing it on recursion, allowing for at most N levels of recursion.

static final ThreadLocal<Integer> RECURSION_CONTROL 
    = new ThreadLocal<Integer>();

public static <T> T protect(
        T resultOnRecursion, 
        Protectable<T> delegate)
throws Exception {
    Integer level = RECURSION_CONTROL.get();
    level = (level == null) ? 0 : level;

    if (level < 5) {
        try {
            RECURSION_CONTROL.set(level + 1);
            return delegate.call();
        }
        finally {
            if (level > 0)
                RECURSION_CONTROL.set(level - 1);
            else
                RECURSION_CONTROL.remove();
        }
    }
    else {
        return resultOnRecursion;
    }
}

But again. Maybe you should just take a couple of minutes more and learn about how the internals of your host library really work, and get things right from the beginning… As always, when applying tricks and hacks! 🙂

Java Streams Preview vs .Net LINQ


I’ve started following this very promising blog by the “Geeks From Paradise”. Apart from the fact that I’m a bit envious of geeks living in Costa Rica, this comparison of the upcoming Java 8 Streams API with various of .NET’s LINQ API capabilities is a very interesting read.

A preview of what you’ll find there (just one of 19 examples):

LINQ

List<string> nameList1 = new List(){ 
  "Anders", "David", "James",
  "Jeff", "Joe", "Erik" };
nameList1.Select(c => "Hello! " + c).ToList()
         .ForEach(c => Console.WriteLine(c));

Java Streams

List<String> nameList1 = asList(
  "Anders", "David", "James",
  "Jeff", "Joe", "Erik");
nameList1.stream()
     .map(c -> "Hello! " + c)
     .forEach(System.out::println);

Read the full blog post here:
Java Streams Preview vs .Net High-Order Programming with LINQ

Will Java add LINQ to EL 3.0 in JSR-341?


This fact has somehow slipped by me unnoticed so far: As the JSR-341 websites claim, Java is going to add full .NET-Style LINQ support to its expression language 3.0!

While the JSR-341 website doesn’t explicitly mention these feature additions to the expression language, a lot of details can be seen here:
http://java.net/projects/el-spec/pages/CollectionOperations

This is very interesting, and raises a lot of questions:

  • Is Microsoft on board – i.e. are they part of the expert group?
  • Will Microsoft surrender / sell their LINQ-related patents to Oracle?
  • … or will Oracle challenge those patents?
  • Why would LINQ be added to the EL before it is added to the platform?

Anyway, the mailing lists don’t seem very active and the draft specs is still in a rather early stage. We’ll see where this goes, but this is clearly an addition that I will follow closely in the future!

See also my previous blog post on the subject

Java Collections API Quirks


So we tend to think we’ve seen it all, when it comes to the Java Collections API. We know our ways around Lists, Sets, Maps, Iterables, Iterators. We’re ready for Java 8’s Collections API enhancements.

But then, every once in a while, we stumble upon one of these weird quirks that originate from the depths of the JDK and its long history of backwards-compatibility. Let’s have a look at unmodifiable collections

Unmodifiable Collections

Whether a collection is modifiable or not is not reflected by the Collections API. There is no immutable List, Set or Collection base type, which mutable subtypes could extend. So, the following API doesn’t exist in the JDK:

// Immutable part of the Collection API
public interface Collection {
  boolean  contains(Object o);
  boolean  containsAll(Collection<?> c);
  boolean  isEmpty();
  int      size();
  Object[] toArray();
  <T> T[]  toArray(T[] array);
}

// Mutable part of the Collection API
public interface MutableCollection 
extends Collection {
  boolean  add(E e);
  boolean  addAll(Collection<? extends E> c);
  void     clear();
  boolean  remove(Object o);
  boolean  removeAll(Collection<?> c);
  boolean  retainAll(Collection<?> c);
}

Now, there are probably reasons, why things hadn’t been implemented this way in the early days of Java. Most likely, mutability wasn’t seen as a feature worthy of occupying its own type in the type hierarchy. So, along came the Collections helper class, with useful methods such as unmodifiableList(), unmodifiableSet(), unmodifiableCollection(), and others. But beware when using unmodifiable collections! There is a very strange thing mentioned in the Javadoc:

The returned collection does not pass the hashCode and equals operations through to the backing collection, but relies on Object’s equals and hashCode methods. This is necessary to preserve the contracts of these operations in the case that the backing collection is a set or a list.

“To preserve the contracts of these operations”. That’s quite vague. What’s the reasoning behind it? A nice explanation is given in this Stack Overflow answer here:

An UnmodifiableList is an UnmodifiableCollection, but the same is not true in reverse — an UnmodifiableCollection that wraps a List is not an UnmodifiableList. So if you compare an UnmodifiableCollection that wraps a List a with an UnmodifiableList that wraps the same List a, the two wrappers should not be equal. If you just passed through to the wrapped list, they would be equal.

While this reasoning is correct, the implications may be rather unexpected.

The bottom line

The bottom line is that you cannot rely on Collection.equals(). While List.equals() and Set.equals() are well-defined, don’t trust Collection.equals(). It may not behave meaningfully. Keep this in mind, when accepting a Collection in a method signature:

public class MyClass {
  public void doStuff(Collection<?> collection) {
    // Don't rely on collection.equals() here!
  }
}

Hibernate, and Querying 50k Records. Too Much?


An interesting read, showing that Hibernate can quickly get to its limits when it comes to querying 50k records – a relatively small number of records for a sophisticated database:
http://koenserneels.blogspot.ch/2013/03/bulk-fetching-with-hibernate.html

Of course, Hibernate can generally deal with such situations, but you have to start tuning Hibernate, digging into its more advanced features. Makes one think whether second-level caching, flushing, evicting, and all that stuff should really be the default behaviour…?

JUnit’s Evolving Structure


This is an interesting read:
http://edmundkirwan.com/general/junit.html

It’s an analysis about how JUnit gradually and organically evolved into a heavily entangled set of inter-dependent modules