The Lame Side of Java’s Backwards-Compatibility

Java is a very backwards-compatible language. Very as in very very very. It is so backwards compatible, we still have tons of deprecated code that was deprecated in the JDK 1.1. For example, most of the java.util.Date and java.util.Calendar API. Some may argue that it would’ve been easier to deprecate the classes altogether…

But things don’t get better as we’re approaching Java 8. Please, observe with me with a mixture of intrigue and disgust what is going to be added to the JDBC 4.2 specs:

“large”. As in “We should’ve made that a long instead of an int from the very beginning”. Luckily, Java 8 also introduces defender methods, such that the additions were done backwards-compatibly.

I wonder how many other places in the JDK should now have duplicate methods using the “large” term, because in the beginning, people chose int over long, when most processors were still 32bit, and it really did make a difference.

Also, I wonder what’ll happen when we run out of 64bit space in the year 2139, as mankind will reach the outer skirts of milky way. In order to be able to write the occasional planet-migration script, we’ll have to add things like executeHugeUpdate() to the JDBC specs in Java 11 – if we’re optimistic that Java 11 will be shipped by then 😉

For more info, you can see the up-to-date OpenJDK source code here:
http://hg.openjdk.java.net/lambda/lambda/jdk/file/tip/src/share/classes/java/sql/Statement.java

How to Design a Good, Regular API

People have strong opinions on how to design a good API. Consequently, there are lots of pages and books in the web, explaining how to do it. This article will focus on a particular aspect of good APIs: Regularity. Regularity is what happens when you follow the “Principle of Least Astonishment“. This principle holds true no matter what kinds of personal taste and style you would like to put into your API, otherwise. It is thus one of the most important features of a good API.

The following are a couple of things to keep in mind when designing a “regular” API:

Rule #1: Establish strong terms

If your API grows, there will be repetitive use of the same terms, over and over again. For instance, some actions will be come in several flavours resulting in various classes / types / methods, that differ only subtly in behaviour. The fact that they’re similar should be reflected by their names. Names should use strong terms. Take JDBC for instance. No matter how you execute a Statement, you will always use the term execute to do it. For instance, you will call any of these methods:

In a similar fashion, you will always use the term close to release resources, no matter which resource you’re releasing. For instance, you will call:

As a matter of fact, close is such a strong and established term in the JDK, that it has lead to the interfaces java.io.Closeable (since Java 1.5), and java.lang.AutoCloseable (since Java 1.7), which generally establish a contract of releasing resources.

Rule violation: Observable

This rule is violated a couple of times in the JDK. For instance, in the java.util.Observable class. While other “Collection-like” types established the terms

  • size()
  • remove()
  • removeAll()

… this class declares

There is no good reason for using other terms in this context. The same applies to Observer.update(), which should really be called notify(), an otherwise established term in JDK APIs

Rule violation: Spring. Most of it

Spring has really gotten popular in the days when J2EE was weird, slow, and cumbersome. Think about EJB 2.0… There may be similar opinions on Spring out there, which are off-topic for this post. Here’s how Spring violates this concrete rule. A couple of random examples where Spring fails to establish strong terms, and uses long concatenations of meaningless, inconcise words instead:

Apart from “feeling” like a horrible API (to me), here’s some more objective analysis:

  • What’s the difference between a Creator and a Factory
  • What’s the difference between a Source and a Provider?
  • What’s the non-subtle difference between an Advisor and a Provider?
  • What’s the non-subtle difference between a Discoverer and a Provider?
  • Is an Advisor related to an AspectJAdvice?
  • Is it a ScanningCandidate or a CandidateComponent?
  • What’s a TargetSource? And how would it be different from a SourceTarget if not a SourceSource or my favourite: A SourceSourceTargetProviderSource?

Gary Fleming commented on my previous blog post about Spring’s funny class names:

I’d be willing to bet that a Markov-chain generated class name (based on Spring Security) would be indistinguishable from the real thing.

Back to more seriousness…

Rule #2: Apply symmetry to term combinations

Once you’ve established strong terms, you will start combining them. When you look at the JDK’s Collection APIs, you will notice the fact that they are symmetric in a way that they’ve established the terms add(), remove(), contains(), and all, before combining them symmetrically:

Now, the Collection type is a good example where an exception to this rule may be acceptable, when a method doesn’t “pull its own weight”. This is probably the case for retainAll(Collection<?>), which doesn’t have an equivalent retain(E) method. It might just as well be a regular violation of this rule, though.

Rule violation: Map

This rule is violated all the time, mostly because of some methods not pulling their own weight (which is ultimately a matter of taste). With Java 8’s defender methods, there will no longer be any excuse of not adding default implementations for useful utility methods that should’ve been on some types. For instance: Map. It violates this rule a couple of times:

Observe also, that there is no point of using the term Set in the method names. The method signature already indicates that the result has a Set type. It would’ve been more consistent and symmetric if those methods would’ve been named keys(), values(), entries(). (On a side-note, Sets and Lists are another topic that I will soon blog about, as I think those types do not pull their own weight either)

At the same time, the Map interface violates this rule by providing

Besides, establishing the term clear() instead of reusing removeAll() with no arguments is unnecessary. This applies to all Collection API members. In fact, the clear() method also violates rule #1. It is not immediately obvious, if clear does anything subtly different from remove when removing collection elements.

Rule #3: Add convenience through overloading

There is mostly only one compelling reason, why you would want to overload a method: Convenience. Often you want to do precisely the same thing in different contexts, but constructing that very specific method argument type is cumbersome. So, for convenience, you offer your API users another variant of the same method, with a “friendlier” argument type set. This can be observed again in the Collection type. We have:

Another example is the Arrays utility class. We have:

Overloading is mostly used for two reasons:

  1. Providing “default” argument behaviour, as in Collection.toArray()
  2. Supporting several incompatible, yet “similar” argument sets, as in Arrays.copyOf()

Other languages have incorporated these concepts into their language syntax. Many languages (e.g. PL/SQL) formally support named default arguments. Some languages (e.g. JavaScript) don’t even care how many arguments there really are. And another, new JVM language called Ceylon got rid of overloading by combining the support for named, default arguments with union types. As Ceylon is a statically typed language, this is probable the most powerful approach of adding convenience to your API.

Rule violation: TreeSet

It is hard to find a good example of a case where this rule is violated in the JDK. But there is one: the TreeSet and TreeMap. Their constructors are overloaded several times. Let’s have a look at these two constructors:

The latter “cleverly” adds some convenience to the first in that it extracts a well-known Comparator from the argument SortedSet to preserve ordering. This behaviour is quite different from the compatible (!) first constructor, which doesn’t do an instanceof check of the argument collection. I.e. these two constructor calls result in different behaviour:

SortedSet<Object> original = // [...]

// Preserves ordering:
new TreeSet<Object>(original);

// Resets ordering:
new TreeSet<Object>((Collection<Object>) original);

These constructors violate the rule in that they produce completely different behaviour. They’re not just mere convenience.

Rule #4: Consistent argument ordering

Be sure that you consistently order arguments of your methods. This is an obvious thing to do for overloaded methods, as you can immediately see how it is better to always put the array first and the int after in the previous example from the Arrays utility class:

But you will quickly notice that all methods in that class will put the array being operated on first. Some examples:

Rule violation: Arrays

The same class also “subtly” violates this rule in that it puts optional arguments in between other arguments, when overloading methods. For instance, it declares

When the latter should’ve been fill(Object[], Object, int, int). This is a “subtle” rule violation, as you may also argue that those methods in Arrays that restrict an argument array to a range will always put the array and the range argument together. In that way, the fill() method would again follow the rule as it provides the same argument order as copyOfRange(), for instance:

You will never be able to escape this problem if you heavily overload your API. Unfortunately, Java doesn’t support named parameters, which helps formally distinguishing arguments in a large argument list, as sometimes, large argument lists cannot be avoided.

Rule violation: String

Another case of a rule violation is the String class:

The problems here are:

  • It is hard to immediately understand the difference between the two methods, as the optional boolean argument is inserted at the beginning of the argument list
  • It is hard to immediately understand the purpose of every int argument, as there are many arguments in a single method

Rule #5: Establish return value types

This may be a bit controversial as people may have different views on this topic. No matter what your opinion is, however, you should create a consistent, regular API when it comes to defining return value types. An example rule set (on which you may disagree):

  • Methods returning a single object should return null when no object was found
  • Methods returning several objects should return an empty List, Set, Map, array, etc. when no object was found (never null)
  • Methods should only throw exceptions in case of an … well, an exception

With such a rule set, it is not a good practice to have 1-2 methods lying around, which:

  • … throw ObjectNotFoundExceptions when no object was found
  • … return null instead of empty Lists

Rule violation: File

File is an example of a JDK class that violates many rules. Among them, the rule of regular return types. Its File.list() Javadoc reads:

An array of strings naming the files and directories in the directory denoted by this abstract pathname. The array will be empty if the directory is empty. Returns null if this abstract pathname does not denote a directory, or if an I/O error occurs.

So, the correct way to iterate over file names (if you’re doing defensive programming) is:

String[] files = file.list();

// You should never forget this null check!
if (files != null) {
    for (String file : files) {
        // Do things with your file
    }
}

Of course, we could argue that the Java 5 expert group could’ve been nice with us and worked that null check into their implementation of the foreach loop. Similar to the missing null check when switching over an enum (which should lead to the default: case). They’ve probably preferred the “fail early” approach in this case.

The point here is that File already has sufficient means of checking if file is really a directory (File.isDirectory()). And it should throw an IOException if something went wrong, instead of returning null. This is a very strong violation of this rule, causing lots of pain at the call-site… Hence:

NEVER return null when returning arrays or collections!

Rule violation: JPA

An example of how JPA violates this rule is the way how entities are retrieved from the EntityManager or from a Query:

As NoResultException is a RuntimeException this flaw heavily violates the Principle of Least Astonishment, as you might stay unaware of this difference until runtime!

IF you insist on throwing NoResultExceptions, make them checked exceptions as client code MUST handle them

Conclusion and further reading

… or rather, further watching. Have a look at Josh Bloch’s presentation on API design. He agrees with most of my claims, around 0:30:30

Another useful example of such a web page is the “Java API Design Checklist” by The Amiable API:
Java API Design Checklist

A Typesafety Comparison of SQL Access APIs

SQL is a very expressive and distinct language. It is one of the few declarative languages which are used by a broad audience in everyday work. As a declarative language, SQL allows to specify what we’re expecting as output, not how this output should be produced. As a side-effect of this, ad-hoc record data types are created by every statement. An example:

-- A (id: integer, title: varchar) type is created
SELECT id, title
FROM book;

The above statement generates a cursor whose records have a well-defined record type with these properties:

  • The degree of the record is 2
  • The column names are id and title
  • The column types are integer and varchar
  • The column id can be accessed at index 1. The column title can be accessed at index 2

In other words, SQL records combine features from records (access by name) and tuples (access by index). They can be seen like typesafe associative “map-arrays”, where map keys are formally bound to array indexes and their associated key/index type.

Another, more complex example shows how these ad-hoc record types can be reused within a SQL statement!

-- A (c: bigint) type is created
SELECT count(*) c
FROM book

-- A (..: integer, ..: integer) type is created and compared with...
WHERE (author_id, language_id) IN (

  -- ... another, compatible (..: integer, ..: integer) type
  SELECT a.id, a.language_id
  FROM author a
)

This query counts books written by authors in their native language.

In the above example, the projected record type is a bit simpler. It contains only one column. The interesting part is the row value expression IN comparison predicate, which compares two compatible (integer, integer) types. In SQL, you can typesafely create ad-hoc record types and immediately compare them with other ad-hoc record types. In these comparisons, column names are not important, but column indexes (and associated column types) are.

Comparing various SQL access APIs

The previous examples show how SQL allows for the formal declaration of record types including record degree, column names, column indexes, column types. While SQL is very expressive in that matter, many client languages accessing SQL are less expressive. When comparing expressiveness and typesafety, two features should be taken into consideration:

  1. Are the records produced into the client language typesafe?
  2. Are the SQL statements produced from the client language typesafe and syntax-safe?

Let’s have a look at various accessing techniques, and how expressive they are in terms of the above typesafety requirements:

JDBC: Least typesafety

JDBC offers the least expressiveness and typesafety. This isn’t surprising, as JDBC is a very low-level API. It offers:

  1. No typesafety whatsoever when accessing result records.
  2. No typesafety or syntax-safety whatsoever when producing SQL statements.

Here is an example:

PreparedStatement stmt = null;
ResultSet rs = null;

try {

  // SQL statements are just strings. Constructing them is not
  // typesafe or syntax-safe
  stmt = connection.prepareStatement(
    "SELECT id, title FROM book WHERE id = ?");

  // Bind values are set by index. There is no typesafety or
  // "index safety"
  stmt.setInt(1, 15);

  rs = stmt.executeQuery();
  while (rs.next()) {

    // There is no typesafety or "index safety" when accessing
    // result record values
    System.out.println(
      "ID: " + rs.getInt(1) + ", TITLE: " + rs.getString(2));
  }
}
finally {
  closeSafely(stmt, rs);
}

Now, this wasn’t surprising. JDBC makes up for the lack of typesafety by being absolutely general. It is possible to implement a JDBC driver for any type of relational database, no matter what kinds of SQL and JDBC features they really support.

JPA: Some typesafety

JPA has implemented quite a bit of typesafety mostly on top of JPQL, but also slightly on top of SQL. With JPA, you can have:

  1. Some typesafety when accessing records.
  2. Some typesafety and syntax-safety when producing JPQL statements through the CriteriaQuery API (not SQL statements).

Record access typesafety can be guaranteed when you project the outcome of your statements onto your JPA-annotated entities. While the mapping itself isn’t really typesafe, the outcome is, as a Java class is the closest match to a SQL record. A Java class, much like a SQL record, has:

  • A degree, expressed in the number of properties
  • Column names, expressed as property names
  • Column types, expressed as property types
  • But: No column indexes. Properties have no explicit order

JPA record mapping has additional features that exceed the expressiveness of SQL, as “flat”, tabular result sets can be mapped onto object hierarchies. In any case, you will have to create one record / entity type per query to profit from this typesafety. If you’re not projecting all columns from every table, but ad-hoc records (including values derived from functions), you will lose this typesafety again.

When it comes to statement typesafety, JPA offers the CriteriaQuery API to produce typesafe JPQL statements. The CriteriaQuery API is often criticised for its verboseness and for the fact that resulting client code is hard to read. Here is an example taken from the CriteriaQuery API docs:

CriteriaQuery<String> q = cb.createQuery(String.class);
Root<Order> order = q.from(Order.class);
q.select(order.get("shippingAddress").<String>get("state"));
 
CriteriaQuery<Product> q2 = cb.createQuery(Product.class);
q2.select(q2.from(Order.class)
            .join("items")
            .<Item,Product>join("product"));

It can be seen that there is only a limited amount of typesafety in the above query construction:

  • Columns are accessed by string literals, such as "shippingAddress".
  • Generic entity types are not really checked. The <Item,Product> generic parameters might as well be wrong.

Of course, there are more typesafe API parts in JPA’s CriteriaQuery API. Using those API parts quickly lead to the aforementioned verbosity, though, as can be seen in this Stack Overflow question, or in the Java EE 6 Tutorials.

LINQ: Much typesafety (in .NET)

LINQ goes very far in offering typesafety in both dimensions:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing LINQ-to-SQL statements (not SQL statements).

As LINQ is formally integrated into various .NET languages, it has the advantage of being able to produce formally defined record types, directly into the target language (e.g. C#). Not only can typesafe records be produced, the LINQ-to-SQL statement is formally verified by the compiler as well. An example

// Typesafe renaming (aliasing with "AS" in SQL)
From p In db.Products
// Typesafe (named!) variable binding
Where p.UnitsInStock <= ReorderLevel AndAlso Not p.Discontinued
// The typesafe projection will produce a Products record
Select p

Another example from Stack Overflow can be seen here:

// Producing a C# tuple
var r = from u in db.Users
        join s in db.Staffs on u.Id equals s.UserId
        select new Tuple<User, Staff>(u, s);

// Producing an anonymous record type
var r = from u in db.Users
    select new { u.Name, 
                 u.Address,
                 ...,
                 (from s in db.Staffs 
                  select s.Password where u.Id == s.UserId) 
               };

LINQ has many obvious advantages when it comes to typesafety. In the case of LINQ, this comes at the price of losing actual SQL expressivity and syntax, as LINQ-to-SQL is not really SQL (just as JPQL is not really SQL either). The SQL querying API is partially shared with other, heterogeneous querying targets, such as LINQ-to-Entities, LINQ-to-Collections, LINQ-to-XML. This will reduce LINQ’s feature scope (see also a previous blog post, and I will soon blog about this again).

But C# offers all typesafety aspects that a SQL record offers as well: degree, column name (anonymous types), column index (tuples), column types (both types and tuples).

SLICK: Much typesafety (in Scala)

SLICK has been inspired by LINQ, and can thus offer a lot of typesafety as well. It offers:

  1. Much typesafety when accessing tuples (not records).
  2. Much typesafety when producing SLICK statements (not SQL statements).

SLICK takes advantage of Scala’s integrated tuple expressions. This is best shown by example:

// "for" is the "entry-point" to the DSL
val q = for {

    // FROM clause   WHERE clause
    c <- Coffees     if c.supID === 101

// SELECT clause and projection to a tuple
} yield (c.name, c.price)

The above example shows that the projection onto a (String, Int) tuple is done typesafely by the yield method. At the same time, the whole query expression is formally validated by the compiler, as SLICK makes heavy use of Scala’s language features in order to introduce an internal DSL for querying. Much more than LINQ, SLICK has a unique syntax that doesn’t remind of SQL any more. It is not obvious how subqueries, complex joins, grouping and aggregation can be expressed.

jOOQ: Much typesafety

jOOQ is mainly inspired by SQL itself and embraces all the features that SQL offers. It has thus:

  1. Much typesafety when accessing records or tuples.
  2. Much typesafety when producing SQL statements.

jOOQ offers similar capabilities as JPA when it comes to mapping SQL result sets onto records, although JPA’s mapping type hierarchies are not supported by jOOQ. But jOOQ also allows for typesafe tuple access, the way SLICK has implemented it. Ad-hoc records produced by arbitrary query projections will maintain their various column types through generic Record1<T1>, Record2<T1, T2>, Record3<T1, T2, T3>, … record types. Unlike in Java, this can be leveraged extensively in Scala, where these typesafe Record[N] types can be used just like Scala’s tuples.

On the other hand, just like LINQ-to-SQL, which has formally integrated querying as a first-class citizen into .NET languages, jOOQ allows for heavy type-checking and syntax-checking, when writing SQL statements in Java.

In SQL, you can typesafely write things like:

SELECT * FROM t WHERE (t.a, t.b) = (1, 2)
SELECT * FROM t WHERE (t.a, t.b) OVERLAPS (date1, date2)
SELECT * FROM t WHERE (t.a, t.b) IN (SELECT x, y FROM t2)
UPDATE t SET (a, b) = (SELECT x, y FROM t2 WHERE ...)
INSERT INTO t (a, b) VALUES (1, 2)

In jOOQ 3.0, you can (also typesafely!) write

select().from(t).where(row(t.a, t.b).eq(1, 2));
// Type-check here: ----------------->  ^^^^
 
select().from(t).where(row(t.a, t.b).overlaps(date1, date2));
// Type-check here: ------------------------> ^^^^^^^^^^^^
 
select().from(t).where(row(t.a, t.b).in(select(t2.x, t2.y).from(t2)));
// Type-check here: -------------------------> ^^^^^^^^^^
 
update(t).set(row(t.a, t.b), select(t2.x, t2.y).where(...));
// Type-check here: --------------> ^^^^^^^^^^

insertInto(t, t.a, t.b).values(1, 2);
// Type-check here: ---------> ^^^^

This also applies for existing API, which doesn’t involve row value expressions:

select().from(t).where(t.a.eq(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^
 
select().from(t).where(t.a.eq(any(select(t2.x).from(t2)));
// Type-check here: -------------------> ^^^^
 
select().from(t).where(t.a.in(select(t2.x).from(t2));
// Type-check here: ---------------> ^^^^

select(t1.a, t1.b).from(t1).union(select(t2.a, t2.b).from(t2));
// Type-check here: -------------------> ^^^^^^^^^^

jOOQ is not SQL, but unlike other attempts of introducing SQL as an internal domain-specific language into host languages like Java, Scala, C#, jOOQ looks very much like SQL thanks to its unique fluent API technique, which informally follows an underlying BNF notation.

Even if Java offers less expressiveness than other languages like C# or Scala, jOOQ probably comes closest to both result record typesafety and SQL syntax safety in the Java world.

Defensive API evolution with Java interfaces

API evolution is something absolutely non-trivial. Something that only few have to deal with. Most of us work on internal, proprietary APIs every day. Modern IDEs ship with awesome tooling to factor out, rename, pull up, push down, indirect, delegate, infer, generalise our code artefacts. These tools make refactoring our internal APIs a piece of cake.

But some of us work on public APIs, where the rules change drastically. Public APIs, if done properly, are versioned. Every change – compatible or incompatible – should be published in a new API version. Most people will agree that API evolution should be done in major and minor releases, similar to what is specified in semantic versioning. In short: Incompatible API changes are published in major releases (1.0, 2.0, 3.0), whereas compatible API changes / enhancements are published in minor releases (1.0, 1.1, 1.2).

If you’re planning ahead, you’re going to foresee most of your incompatible changes a long time before actually publishing the next major release. A good tool in Java to announce such a change early is deprecation.

Interface API evolution

Now, deprecation is a good tool to indicate that you’re about to remove a type or member from your API. What if you’re going to add a method, or a type to an interface’s type hierarchy? This means that all client code implementing your interface will break – at least as long as Java 8’s defender methods aren’t introduced yet. There are several techniques to circumvent / work around this problem:

1. Don’t care about it

Yes, that’s an option too. Your API is public, but maybe not so much used. Let’s face it: Not all of us work on the JDK / Eclipse / Apache / etc codebases.

If you’re friendly, you’re at least going to wait for a major release to introduce new methods. But you can break the rules of semantic versioning if you really have to – if you can deal with the consequences of getting a mob of angry users.

Note, though, that other platforms aren’t as backwards-compatible as the Java universe (often by language design, or by language complexity). E.g. with Scala’s various ways of declaring things as implicit, your API can’t always be perfect.

2. Do it the Java way

The “Java” way is not to evolve interfaces at all. Most API types in the JDK have been the way they are today forever. Of course, this makes APIs feel quite “dinosaury” and adds a lot of redundancy between various similar types, such as StringBuffer and StringBuilder, or Hashtable and HashMap.

Note that some parts of Java don’t adhere to the “Java” way. Most specifically, this is the case for the JDBC API, which evolves according to the rules of section #1: “Don’t care about it”.

3. Do it the Eclipse way

Eclipse’s internals contain huge APIs. There are a lot of guidelines how to evolve your own APIs (i.e. public parts of your plugin), when developing for / within Eclipse. One example about how the Eclipse guys extend interfaces is the IAnnotationHover type. By Javadoc contract, it allows implementations to also implement IAnnotationHoverExtension and IAnnotationHoverExtension2. Obviously, in the long run, such an evolved API is quite hard to maintain, test, and document, and ultimately, hard to use! (consider ICompletionProposal and its 6 (!) extension types)

4. Wait for Java 8

In Java 8, you will be able to make use of defender methods. This means that you can provide a sensible default implementation for your new interface methods as can be seen in Java 1.8’s java.util.Iterator (an extract):

public interface Iterator<E> {

    // These methods are kept the same:
    boolean hasNext();
    E next();

    // This method is now made "optional" (finally!)
    public default void remove() {
        throw new UnsupportedOperationException("remove");
    }

    // This method has been added compatibly in Java 1.8
    default void forEach(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        while (hasNext())
            consumer.accept(next());
    }
}

Of course, you don’t always want to provide a default implementation. Often, your interface is a contract that has to be implemented entirely by client code.

5. Provide public default implementations

In many cases, it is wise to tell the client code that they may implement an interface at their own risk (due to API evolution), and they should better extend a supplied abstract or default implementation, instead. A good example for this is java.util.List, which can be a pain to implement correctly. For simple, not performance-critical custom lists, most users probably choose to extend java.util.AbstractList instead. The only methods left to implement are then get(int) and size(), The behaviour of all other methods can be derived from these two:

class EmptyList<E> extends AbstractList<E> {
    @Override
    public E get(int index) {
        throw new IndexOutOfBoundsException("No elements here");
    }

    @Override
    public int size() {
        return 0;
    }
}

A good convention to follow is to name your default implementation AbstractXXX if it is abstract, or DefaultXXX if it is concrete

6. Make your API very hard to implement

Now, this isn’t really a good technique, but just a probable fact. If your API is very hard to implement (you have 100s of methods in an interface), then users are probably not going to do it. Note: probably. Never underestimate the crazy user. An example of this is jOOQ’s org.jooq.Field type, which represents a database field / column. In fact, this type is part of jOOQ’s internal domain specific language, offering all sorts of operations and functions that can be performed upon a database column.

Of course, having so many methods is an exception and – if you’re not designing a DSL – is probably a sign of a bad overall design.

7. Add compiler and IDE tricks

Last but not least, there are some nifty tricks that you can apply to your API, to help people understand what they ought to do in order to correctly implement your interface-based API. Here’s a tough example, that slaps the API designer’s intention straight into your face. Consider this extract of the org.hamcrest.Matcher API:

public interface Matcher<T> extends SelfDescribing {

    // This is what a Matcher really does.
    boolean matches(Object item);
    void describeMismatch(Object item, Description mismatchDescription);

    // Now check out this method here:

    /**
     * This method simply acts a friendly reminder not to implement 
     * Matcher directly and instead extend BaseMatcher. It's easy to 
     * ignore JavaDoc, but a bit harder to ignore compile errors .
     *
     * @see Matcher for reasons why.
     * @see BaseMatcher
     * @deprecated to make
     */
    @Deprecated
    void _dont_implement_Matcher___instead_extend_BaseMatcher_();
}

“Friendly reminder”, come on. 😉

Other ways

I’m sure there are dozens of other ways to evolve an interface-based API. I’m curious to hear your thoughts!

Bloated JavaBeans™, Part II – or Don’t Add “Getters” to Your API

I have recently blogged about an idea how JavaBeans™ could be extended to reduce the bloat created by this widely-accepted convention in the Java world.

That article was reblogged on DZone and got quite controversial feedback here (like most ideas that try to get some fresh ideas into the Java world):
http://java.dzone.com/articles/javabeans™-should-be-extended.

I want to revisit one of the thoughts I had in that article, which was given a bit less attention, namely:

Getter and Setter naming

Why do I have to use those bloated “get”/”is” and “set” prefixes every time I want to manipulate object properties? Besides, the case of the first letter of the property changes, too. If you want to perform a case-sensitive search on all usage of a property, you will have to write quite a regular expression to do so

I’m specifically having trouble understanding why we should use getters all over the place. Getters / setters are a convention to provide abstraction over property access. I.e., you’d typically write silly things like this, all the time:

public class MyBean {
    private int myProperty;

    public int getMyProperty() {
        return myProperty;
    }

    public void setMyProperty(int myProperty) {
        this.myProperty = myProperty;
    }
}

OK. Let’s accept that this appears to be our everyday life as a Java developer, writing all this bloat, instead of using standard keywords or annotations. I’m talking about standards, not proprietary things like Project Lombok. With facts of life accepted, let’s have a look at java.io.File for more details. To me, this is a good example where JavaBean-o-mania™ went quite wrong. Why? Check out this source code extract:

public class File {

    // This is the only relevant internal property. It would be "final"
    // if it wasn't set by serialisation magic in readObject()
    private String path;

    // Here are some arbitrary actions that you can perform on this file.
    // Usually, verbs are used as method names for actions. Good:
    public boolean delete();
    public void deleteOnExit();
    public boolean mkdir();
    public boolean renameTo(File dest);

    // Now the fun starts!
    // Here is the obvious "getter" as understood by JavaBeans™
    public String getPath();

    // Here are some additional "getters" that perform some transformation
    // on the underlying property, before returning it
    public String getName();
    public String getParent();
    public File getParentFile();
    public String getPath();

    // But some of these "transformation-getters" use "to", rather than
    // "get". Why "toPath()" but not "toParentFile()"? How to distinguish
    // "toPath()" and "getPath()"?
    public Path toPath();
    public URI toURI();

    // Here are some "getters" that aren't really getters, but retrieve
    // their information from the underlying file
    public long getFreeSpace();
    public long getTotalSpace();
    public long getUsableSpace();

    // But some of the methods qualifying as "not-really-getters" do not
    // feature the "get" action keyword, duh...
    public long lastModified();
    public long length();

    // Now, here's something. "Setters" that don't set properties, but
    // modify the underlying file. A.k.a. "not-really-setters"
    public boolean setLastModified(long time);
    public boolean setReadable(boolean readable);
    public boolean setWritable(boolean writable);

    // Note, of course, that it gets more confusing when you look at what
    // seem to be the "not-really-getters" for the above
    public long lastModified();
    public boolean canRead();
    public boolean canWrite();
}

Confused? Yes. But we have all ended up doing things this way, one time or another. jOOQ is no different, although, future versions will fix this.

How to improve things

Not all libraries and APIs are flawed in this way. Java has come a long way and has been written by many people with different views on the subject. Besides, Java is extremely backwards-compatible, so I don’t think that the JDK, if written from scratch, would still suffer from “JavaBean-o-mania™” as much. So here are a couple of rules that could be followed in new APIs, to get things a bit cleaned up:

  1. First off, decide whether your API will mainly be used in a Spring-heavy or JSP/JSF-heavy environment or any other environment that uses JavaBeans™-based expression languages, where you actually WANT to follow the standard convention. In that case, however, STRICTLY follow the convention and don’t name any information-retrieval method like this: “File.length()”. If you’re follwoing this paradigm, ALL of your methods should start with a verb, never with a noun / adjective
  2. The above applies to few libraries, hence you should probably NEVER use “get” if you want to access an object which is not a property. Just use the property name (noun, adjective). This will look much leaner at the call-site, specifically if your library is used in languages like Scala. In that way, “File.length()” was a good choice, just as “Enum.values()”, rather than “File.getLength()” or “Enum.getValues()”.
  3. You should probably ALSO NOT use “get” / “set” if you want to access properties. Java can easily separate the namespace for property / method names. Just use the property name itself in the getter / setter, like this:
    public class MyBean {
        private int myProperty;
    
        public int myProperty() {
            return myProperty;
        }
    
        public void myProperty(int myProperty) {
            this.myProperty = myProperty;
        }
    }
    

    Think about the first rule again, though. If you want to configure your bean with Spring, you may have no choice. But if you don’t need Spring, the above will have these advantages:

    • Your getters, setters, and properties have exactly the same name (and case of the initial letter). Text-searching across the codebase is much easier
    • The getter looks just like the property itself in languages like Scala, where these are equivalent expressions thanks to language syntax sugar: “myBean.myProperty()” and “myBean.myProperty”
    • Getter and setter are right next to each other in lexicographic ordering (e.g. in your IDE’s Outline view). This makes sense as the property itself is more interesting than the non-action of “getting” and “setting”
    • You never have to worry about whether to choose “get” or “is”. Besides, there are a couple of properties, where “get” / “is” are inappropriate anyway, e.g. whenever “has” is involved -> “getHasChildren()” or “isHasChildren()”? Meh, name it “hasChildren()” !! “setHasChildren(true)” ? No, “hasChildren(true)” !!
    • You can follow simple naming rules: Use verbs in imperative form to perform actions. Use nouns, adjectives or verbs in third person form to access objects / properties. This rule already proves that the standard convention is flawed. “get” is an imperative form, whereas “is” is a third person form.
  4. Consider returning “this” in the setter. Some people just like method-chaining:
        public MyBean myProperty(int myProperty) {
            this.myProperty = myProperty;
            return this;
        }
    
        // The above allows for things like
        myBean.myProperty(1).myOtherProperty(2).andThen(3);
    

    Alternatively, return the previous value, e.g:

        public int myProperty(int myProperty) {
            try {
                return this.myProperty;
            }
            finally {
                this.myProperty = myProperty;
            }
        }
    

    Make up your mind and choose one of the above, keeping things consistent across your API. In most cases, method chaining is less useful than an actual result value.

    Anyway, having “void” as return type is a waste of API scope. Specifically, consider Java 8’s lambda syntax for methods with / without return value (taken from Brian Goetz’s state of the lambda presentations):

    // Aaaah, Callables without curly braces nor semi-colons
    blocks.filter(b -> b.getColor() == BLUE);
    
    // Yuck! Blocks with curly braces and an extra semi-colon!
    blocks.forEach(b -> { b.setColor(RED); });
    
    // In other words, following the above rules, you probably
    // prefer to write:
    blocks.filter(b -> b.color() == BLUE)
          .forEach(b -> b.color(RED));
    

    Thinking about this now might be a decisive advantage of your API over your competition once Java 8 goes live (for those of us who maintain a public API).

  5. Finally, DO use “get” and “set” where you really want to emphasise the semantic of the ACTIONS called “getting” and “setting”. This includes getting and setting objects on types like:
    • Lists
    • Maps
    • References
    • ThreadLocals
    • Futures
    • etc…

    In all of these cases, “getting” and “setting” are actions, not property access. This is why you should use a verb like “get”, “set”, “put” and many others.

Summary

Be creative when designing an API. Don’t strictly follow the boring rules imposed by JavaBeans™ and Spring upon an entire industry. More recent JDK APIs and also well-known APIs by Google / Apache make little use of “get” and “set” when accessing objects / properties. Java is a static, type-safe language. Expression languages and configuration by injection are an exception in our every day work. Hence we should optimise our API for those use cases that we deal with the most. Better, if Spring adapt their ways of thinking to nice, lean, beautiful and fun APIs rather than forcing the Java world to bloat their APIs with boring stuff like getters and setters!

Row value expressions and the BETWEEN predicate

Now this is a simple example of how SQL clause simulation can get nasty if you want to make use of some more advanced SQL clauses that aren’t supported in all databases. Consider the following predicate and equivalent transformations thereof:

The BETWEEN predicate

The BETWEEN predicate is a convenient form of expressing the fact that one expression A should be in BETWEEN two other expressions B and C. This predicate was defined already in §8.4 of SQL-1992, and then refined in SQL-1999 (adding ASYMMETRIC/SYMMETRIC):

8.3 <between predicate>

Function
    Specify a range comparison.

Format
    <between predicate> ::=
        <row value expression> [ NOT ] BETWEEN 
          [ ASYMMETRIC | SYMMETRIC ]
          <row value expression> AND <row value expression>

While ASYMMETRIC is just a verbose way of expressing the default behaviour of the BETWEEN predicate, SYMMETRIC has the useful property of indicating that the order of B and C is irrelevant. Knowing this, the following transformations can be established:

BETWEEN predicate transformations

The following statements are all equivalent:

A BETWEEN SYMMETRIC B AND C
(A BETWEEN B AND C) OR (A BETWEEN C AND B)
(A >= B AND A <= C) OR (A >= C AND A <= B)

While this is still somewhat readable, try adding row value expressions:

-- The original statement
(A1, A2) BETWEEN SYMMETRIC (B1, B2) AND (C1, C2)

-- Transforming away BETWEEN SYMMETRIC
   (     (A1, A2) >= (B1, B2) 
     AND (A1, A2) <= (C1, C2) )
OR (     (A1, A2) >= (C1, C2) 
     AND (A1, A2) <= (B1, B2) )

-- Transforming away the row value expressions
   (     ((A1 > B1) OR (A1 = B1 AND A2 > B2) OR (A1 = B1 AND A2 = B2))
     AND ((A1 < C1) OR (A1 = C1 AND A2 < C2) OR (A1 = C1 AND A2 = C2)) )
OR (     ((A1 > C1) OR (A1 = C1 AND A2 > C2) OR (A1 = C1 AND A2 = C2))
     AND ((A1 < B1) OR (A1 = B1 AND A2 < B2) OR (A1 = B1 AND A2 = B2)) )

In the lowest expression, some parts could’ve been factored out for “simplicity”. The example is just to give you a picture of what the BETWEEN [SYMMETRIC] predicate really does to row value expressions.

Native SQL support for row value expressions and BETWEEN SYMMETRIC

Here’s a comprehensive list of the 14 SQL dialects supported by jOOQ, and what is natively supported by them:

Database BETWEEN SYMMETRIC RVE = RVE RVE < RVE RVE BETWEEN
CUBRID [1] no yes no no
DB2 no yes yes yes
Derby no no no no
Firebird no no no no
H2 [2] no yes yes yes
HSQLDB yes yes yes yes
Ingres yes no no no
MySQL no yes yes no
Oracle no yes no no
Postgres yes yes yes yes
SQL Server no no no no
SQLite no no no no
Sybase ASE no no no no
Sybase SQL Anywhere no no no no

Explanation:

  • The BETWEEN SYMMETRIC column indicates, whether the database supports the SYMMETRIC keyword in general
  • The RVE = RVE column indicates, whether the database supports row value expressions in general (e.g. in equal comparison predicates)
  • The RVE < RVE column indicates, whether the database supports “ordering” comparison predicates (<, <=, >, >=) along with row value expressions
  • The RVE BETWEEN column indicates, whether the database supports the BETWEEN predicates along with row value expressions

Footnotes:

  • [1]: CUBRID doesn’t really support row value expressions. What looks like a RVE is in fact a SET in CUBRID
  • [2]: H2 doesn’t really support row value expressions. What looks like a RVE is in fact an ARRAY in H2

ElSql, a new external SQL DSL for Java

Stephen Colebourne who is frequently commenting on the lambda-dev and other Java 8 mailing lists, has recently published an idea he has been having for a while: ElSql, a new external SQL DSL for Java.

An example SQL statement is given on the blog posts or on GitHub:

 @NAME(SelectBlogs)
   @PAGING(:paging_offset,:paging_fetch)
     SELECT @INCLUDE(CommonFields)
     FROM blogs
     WHERE id = :id
       @AND(:date)
         date > :date
       @AND(:active)
         active = :active
     ORDER BY title, author
 @NAME(CommonFields)
   title, author, content

As can be seen, this is almost mere string-based SQL, enhanced with some “hooks” for later use in Java. While Stephen’s idea here is to keep it simple (much simpler than jOOQ) it shows some useful applications of allowing to write external DSLs which resemble SQL even more than jOOQ.

DSL authoring support is becoming a more and more interesting topic in various platforms. Eclipse is developing Xtext, Scala is experimenting with even more powerful Macros. While Xtext doesn’t allow to mix the DSL grammar with Java, Scala’s Macros do precisely that. They allow for simple compile-time checked Scala syntax extension, if you have a powerful enough CPU to compile such source code. Where this could go for jOOQ was discussed on this blog before

For more details, read about ElSql on Stephen’s or the OpenGamma blog: