The Parameterless Generic Method Antipattern


A very interesting question was posted to Stack Overflow and reddit just recently about Java generics. Consider the following method:

<X extends CharSequence> X getCharSequence() {
    return (X) "hello";
}

While the unsafe cast seems a bit wonky, and you might guess there’s something wrong here, you can still go ahead and compile the following assignment in Java 8:

Integer x = getCharSequence();

This is obviously wrong, because Integer is final, and there is thus no possible Integer subtype that can also implement CharSequence. Yet, Java’s generic type system doesn’t care about classes being final, and it thus infers the intersection type Integer & CharSequence for X prior to upcasting that type back to Integer. From a compiler perspective, all is fine. At runtime, however, the assignment fails with a ClassCastException.
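The same failure can be reproduced with a target type that isn’t even final. The following is only a sketch (the class name and the assignmentFails() helper are made up for illustration); it uses StringBuilder, which is a perfectly legal CharSequence, so the compiler has no reason at all to complain:

```java
class ReturnTypeOnlyGenericDemo {

    // The method from the text: generic on the return type only.
    @SuppressWarnings("unchecked")
    static <X extends CharSequence> X getCharSequence() {
        // Unchecked cast: after erasure, nothing is verified here.
        return (X) "hello";
    }

    // Hypothetical helper: returns true if the assignment fails at runtime.
    static boolean assignmentFails() {
        try {
            // Compiles without complaint: X is inferred as StringBuilder.
            // The compiler inserts a cast to StringBuilder at this call site.
            StringBuilder sb = getCharSequence();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String ok = getCharSequence();          // works by luck: the value is a String
        System.out.println(ok);                 // hello
        System.out.println(assignmentFails());  // true
    }
}
```

The exception is raised at the call site, not inside getCharSequence(), which is what makes this kind of bug hard to track down.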

While the above seems “obviously fishy”, the real problem lies elsewhere.

It is (almost) never correct for a method to be generic on the return type only

There are exceptions to this rule. Those exceptions are methods like:

class Collections {
    public static <T> List<T> emptyList() { ... }
}

This method has no parameters, and yet it returns a generic List<T>. Why can it guarantee correctness, regardless of the concrete inference for <T>? Because of its semantics. Regardless of whether you’re looking for an empty List<String> or an empty List<Integer>, it is possible to provide the same implementation for any of these T, despite erasure, because of the emptiness (and immutability!) semantics.
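A small sketch of why this is safe (the class and the rejectsAdd() helper are hypothetical names): no element can ever be read from the list, and none can be added, so no value of the “wrong” type can ever be observed through it.

```java
import java.util.Collections;
import java.util.List;

class EmptyListDemo {

    // Hypothetical helper: returns true if the list refuses modification.
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Two different inferences for <T>, same parameterless method.
        List<String> strings = Collections.emptyList();
        List<Integer> integers = Collections.emptyList();

        System.out.println(strings.isEmpty() && integers.isEmpty()); // true
        System.out.println(rejectsAdd(strings));                     // true
    }
}
```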

Another exception is builders, such as javax.persistence.criteria.CriteriaBuilder.Coalesce<T>, which is created from a generic, parameterless method:

<T> Coalesce<T> coalesce();

Builder methods are methods that construct initially empty objects. Emptiness is key, here.

For most other methods, however, this is not true, including the above getCharSequence() method. The only guaranteed correct return value for this method is null:

<X extends CharSequence> X getCharSequence() {
    return null;
}

… because in Java, null is the value that can be assigned (and cast) to any reference type. But that’s not the intention of the author of this method.

Think in terms of functional programming

Methods are functions (mostly), and as such, are expected not to have any side-effects. A parameterless function should always return the very same return value. Just like emptyList() does.

But in fact, these methods aren’t parameterless. They do have a type parameter <T>, or <X extends CharSequence>. Again, because of generic type erasure, this parameter “doesn’t really count” in Java, because short of reification, it cannot be introspected from within the method / function.
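One way to make the type parameter “count” is to pass it explicitly as a Class<X> argument. This is a sketch of a hypothetical variant, not the method from the original question; it merely shows that once the type is a real parameter, the method can check it:

```java
class TypeTokenDemo {

    // Hypothetical variant: the caller hands in the type as a real argument.
    static <X extends CharSequence> X getCharSequence(Class<X> type) {
        // Checked cast: fails right here if "hello" is not an instance of type.
        return type.cast("hello");
    }

    // Hypothetical helper: returns true if the call fails for the given type.
    static boolean failsFor(Class<? extends CharSequence> type) {
        try {
            getCharSequence(type);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(getCharSequence(String.class)); // hello
        System.out.println(failsFor(CharSequence.class));  // false
        System.out.println(failsFor(StringBuilder.class)); // true
    }
}
```

The method is no longer generic on the return type only, and the failure now happens inside the method rather than at some distant assignment.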

So, remember this:

It is (almost) never correct for a method to be generic on the return type only

Most importantly, if your use-case is simply to avoid a pre-Java 5 cast, like:

Integer integer = (Integer) getCharSequence();

Want to find offending methods in your code?

I’m using Guava to scan the class path, you might use something else. This snippet will produce all the generic, parameterless methods on your class path:

import java.lang.reflect.Method;
import java.util.Comparator;
import java.util.stream.Stream;

import com.google.common.reflect.ClassPath;

public class Scanner {

    public static void main(String[] args) throws Exception {
        ClassPath
           .from(Thread.currentThread().getContextClassLoader())
           .getTopLevelClasses()
           .stream()
           .filter(info -> !info.getPackageName().startsWith("slick")
                        && !info.getPackageName().startsWith("scala"))
           .flatMap(info -> {
               try {
                   return Stream.of(info.load());
               }
               catch (Throwable ignore) {
                   return Stream.empty();
               }
           })
           .flatMap(c -> {
               try {
                   return Stream.of(c.getMethods());
               }
               catch (Throwable ignore) {
                   return Stream.<Method> of();
               }
           })
           .filter(m -> m.getTypeParameters().length > 0 && m.getParameterCount() == 0)
           .sorted(Comparator.comparing(Method::toString))
           .map(Method::toGenericString)
           .forEach(System.out::println);
    }
}

Would We Still Criticise Checked Exceptions, If Java had a Better try-catch Syntax?


In the context of a previous blog post about JUnit 5, Maaartinus, one of our readers, has brought up a very interesting idea:

The only problem with try-catch is its verbosity, which is something I can live with (IMHO a lone catch would do better, the implicit try would apply to all preceding code in the block; just syntactic sugar)

Huh!

Imagine a world where the following is valid Java code:

{
    something();
}
catch (Exception e) {
    /* All exceptions from the above block */
}

Likewise:

{
    something();
}
finally {
    /* Clean up after the previous block */
}

In other languages, this is implemented exactly as such. Take PL/SQL, for instance. An ordinary block looks like this:

BEGIN
  SOMETHING();
END;

Replace curly braces by BEGIN and END keywords, and you have exactly the same thing. Now, if SOMETHING raises an exception, in PL/SQL, we can append an EXCEPTION block, which does exactly the same thing as catch in Java:

BEGIN
  SOMETHING();
EXCEPTION
  WHEN OTHERS THEN NULL;
END;

Indeed, in these very trivial cases, the try keyword seems optional, just as there is no such keyword in PL/SQL. We don’t really need it, as the scope of the catch and/or finally blocks is very well defined (at first sight; there might be caveats, of course).

So what? We’ve saved 3 characters…

In these trivial cases, we’re not gaining a lot from the “improved” syntax. But what about many other cases where the notoriously verbose try { ... } catch { ... } syntax might be getting on our nerves…? Again, in PL/SQL, whenever you’re using a block using BEGIN .. END, you can automatically profit from optionally adding an EXCEPTION block.

Without thinking this through thoroughly though (whew, some English language usage!), this could add immense syntactic value to Java. For instance:

lambdas

// Better:
Consumer<String> consumer = string -> {
    something();
}
catch (Exception e) {
    /* still part of the consumer */
}

// Instead of:
Consumer<String> consumer = string -> {
    try {
        something();
    }
    catch (Exception e) {
        /* still part of the consumer */
    }
}

Would that have prevented long discussions about checked exceptions in lambdas and in the Stream API?
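Until such a syntax exists, the nesting can at least be moved out of the lambda with a small helper. The following is a sketch only; ThrowingConsumer and handling() are hypothetical names, not part of the JDK:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class UncheckedLambdas {

    // Hypothetical functional interface: like Consumer, but may throw checked exceptions.
    @FunctionalInterface
    interface ThrowingConsumer<T> {
        void accept(T t) throws Exception;
    }

    // Hypothetical helper: the try-catch is written once, here, instead of in every lambda.
    static <T> Consumer<T> handling(ThrowingConsumer<T> body, Consumer<Exception> handler) {
        return t -> {
            try {
                body.accept(t);
            } catch (Exception e) {
                handler.accept(e);
            }
        };
    }

    // Stand-in for the something() call from the text; throws on empty input.
    static void something(String s) throws Exception {
        if (s.isEmpty())
            throw new Exception("empty string");
    }

    // Runs something() for each input and collects the messages of caught exceptions.
    static List<String> run(List<String> inputs) {
        List<String> errors = new ArrayList<>();
        Consumer<String> consumer =
            handling(UncheckedLambdas::something, e -> errors.add(e.getMessage()));
        inputs.forEach(consumer);
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "", "b"))); // [empty string]
    }
}
```

This doesn’t remove the try-catch, it only hides it, which is exactly the kind of workaround the proposed syntax would make unnecessary.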

loops

// Better:
for (String string : strings) {
    something();
}
catch (Exception e) {
    /* still part of the loop's iteration */
}

// Instead of:
for (String string : strings) {
    try {
        something();
    }
    catch (Exception e) {
        /* still part of the loop's iteration */
    }
}

Again, tons of syntactic value here!

if / else

For consistency reasons, although this might appear a bit esoteric to people used to Java code. But let’s think out of the box, and admit the following!

// Better:
if (check) {
    something();
}
catch (Exception e) {
    /* still part of the if branch */
}
else {
    somethingElse();
}
catch (Exception e) {
    /* still part of the else branch */
}

// Instead of:
if (check) {
    try {
        something();
    }
    catch (Exception e) {
        /* still part of the if branch */
    }
}
else {
    try {
        somethingElse();
    }
    catch (Exception e) {
        /* still part of the else branch */
    }
}

Huh!

method bodies

Last but not least, method bodies would be the ultimate entities profiting from this additional syntax sugar. If you’re admitting that the curly braces in methods are nothing but mandatory blocks (or mandatory BEGIN .. END constructs), then you could have:

// Better:
public void method() {
    something();
}
catch (Exception e) {
    /* still part of the method body */
}

// Instead of:
public void method() {
    try {
        something();
    }
    catch (Exception e) {
        /* still part of the method body */
    }
}

This is particularly useful for (static) initialisers, where exception handling is always a pain, as there is no way to specify a throws clause in a (static) initialiser (might be a good opportunity to fix that!)

class Something {
    
    // Better:
    static {
        something();
    }
    catch (Exception e) {
        /* still part of the initialiser body */
    }

    // Instead of:
    static {
        try {
            something();
        }
        catch (Exception e) {
            /* still part of the initialiser body */
        }
    }
}

Take this one step further

Of course, we wouldn’t stop here. We’d also get rid of the very peculiar requirement of putting curly braces after catch (or finally). Once we have established the above, how about also allowing:

// Better:
something();
    catch (SQLException e)
        log.info(e);
    catch (IOException e)
        log.warn(e);
    finally
        close();

// Instead of:
try {
    something();
}
catch (SQLException e) {
    log.info(e);
}
catch (IOException e) {
    log.warn(e);
}
finally {
    close();
}

Now, make exception blocks expressions, rather than statements, and suddenly, Java starts to look an awful lot like all those cool languages. Like Scala. Or Kotlin.

Conclusion

Of course, the “old” syntax would still be possible. For instance, when using the try-with-resources statement, it is inevitable. But the big advantage of such syntax sugar is that in cases when we have to handle exceptions (namely checked exceptions), the pain would be lessened a bit, as we could do so without nesting blocks several levels deep. Perhaps, with this syntax, we would no longer criticise checked exceptions at all?

Very interesting ideas, thanks again, Maaartinus, for sharing.

What are your thoughts?

jOOQ 4.0’s New API Will Use Annotations Only for Truly Declarative Java/SQL Programming


SQL is the only really popular and mature 4GL (Fourth Generation Programming Language). I.e. it is the only popular declarative language.

At the same time, SQL has proven that Turing completeness is not reserved to lesser languages like C, C++, or Java. Since SQL:1999 and its recursive common table expressions, SQL can be safely considered “Turing complete”. This means that any program can be written in SQL. Don’t believe it? Take, for instance, this SQL Mandelbrot set calculation as can be seen in this Stack Overflow question.

[Image: Mandelbrot set]

Source: User Elie on http://stackoverflow.com/q/314864/521799

Wonderful! No more need for procedural, and object oriented cruft.

How we’ve been wrong so far…

At Data Geekery (the company behind jOOQ), we love SQL. And we love Java. But one thing has always bothered us in the past. Java is not really a purely declarative language. A lot of Java language constructs are real anti patterns for the enlightened declarative programmer. For instance:

// This is bad
for (String string : strings)
    System.out.println(string);

// This is even worse
try {
    someSQLStatements();
}
catch (SQLException e) {
    someRecovery();
}

The imperative style of the above code is hardly ever useful. Programmers need to tediously tell the Java compiler and the JVM what algorithm they meant to implement, down to the single statement, when in reality, using the JIT and other advanced optimisation techniques, they don’t really have to.

Luckily, there are annotations

Since Java 5, however, there have been farsighted people in expert groups who have added a powerful new concept to the Java language: Annotations (more info here). At first, experiments were made with only a handful of limited-use annotations, like:

  • @Override
  • @SuppressWarnings

But then, even more farsighted people proceeded to combine these annotations to form completely declarative things like a component:

@Path("/MonsterRest")
@Stateless
@WebServlet(urlPatterns = "/MonsterServlet")
@Entity
@Table(name = "MonsterEntity")
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
@NamedQuery(name = "findAll", query = "SELECT c FROM Book c")
public class Book extends HttpServlet {
 
    // ======================================
    // =             Attributes             =
    // ======================================
 
    @Id
    @GeneratedValue
    private Long id;
    private String isbn;
    private Integer nbOfPage;
    private Boolean illustrations;
    private String contentLanguage;
    @Column(nullable = false)
    @Size(min = 5, max = 50)
    @XmlElement(nillable = false)
    private String title;
    private Float price;
    @Column(length = 2000)
    @Size(max = 2000)
    private String description;
    @ElementCollection
    @CollectionTable(name = "tags")
    private List<String> tags = new ArrayList<>();

Look at this beauty. Credits to Antonio Goncalves

However, we still think that there is a lot of unnecessary object oriented bloat in the above. Luckily, recent innovations that make Java annotations Turing complete (or even sentient?) will now finally allow us to improve upon this situation, specifically for jOOQ, which aims to model the declarative SQL language in Java. Finally, annotations are a perfect fit!

Those innovations are:

These innovations allow us to completely re-implement the entire jOOQ 4.0 API in order to allow for users writing SQL as follows:

@Select({
    @Column("FIRST_NAME"),
    @Column("LAST_NAME")
})
@From(
    table = @Table("AUTHOR"),
    join = @Join("BOOK"),
    predicate = @On(
        left = @Column("AUTHOR.ID"),
        op = @Eq,
        right = @Column("BOOK.AUTHOR_ID")
    )
)
@Where(
    predicate = @Predicate(
        left = @Column("BOOK.TITLE"),
        op = @Like,
        right = @Value("%Annotations in a Nutshell%")
    )
)
class SQLStatement {}

Just like JPA, this makes jOOQ now fully transparent and declarative, by using annotations. Developers will now be able to completely effortlessly translate their medium to highly complex SQL queries into the exact equivalent in jOOQ annotations.

Don’t worry, we’ll provide migration scripts to upgrade your legacy jOOQ 3.x application to 4.0. A working prototype is on the way and is expected to be released soon. Early adopter feedback is very welcome, so stay tuned for more exciting SQL goodness!

Watch Out For Recursion in Java 8’s [Primitive]Stream.iterate()


An interesting question by Tagir Valeev on Stack Overflow has recently caught my attention. To keep things short (read the question for details), while the following code works:

public static Stream<Long> longs() {
    return Stream.iterate(1L, i ->
        1L + longs().skip(i - 1L)
                    .findFirst()
                    .get());
}

longs().limit(5).forEach(System.out::println);

printing

1
2
3
4
5

The following, similar code won’t work:

public static LongStream longs() {
    return LongStream.iterate(1L, i ->
        1L + longs().skip(i - 1L)
                    .findFirst()
                    .getAsLong());
}

Causing a StackOverflowError.

Sure, this kind of recursive iteration is not optimal. It wasn’t prior to Java 8 and it certainly isn’t with the new APIs either. But one might think it should at least work, right? The reason why it doesn’t work is because of a subtle implementation difference between the two iterate() methods in Java 8. While the reference type stream’s Iterator first returns the seed and only then proceeds with iterating by applying the iteration function on the previous value:

final Iterator<T> iterator = new Iterator<T>() {
    @SuppressWarnings("unchecked")
    T t = (T) Streams.NONE;

    @Override
    public boolean hasNext() {
        return true;
    }

    @Override
    public T next() {
        return t = (t == Streams.NONE) ? seed : f.apply(t);
    }
};

This is not the case for the LongStream.iterate() version (and other primitive streams):

final PrimitiveIterator.OfLong iterator = new PrimitiveIterator.OfLong() {
    long t = seed;

    @Override
    public boolean hasNext() {
        return true;
    }

    @Override
    public long nextLong() {
        long v = t;
        t = f.applyAsLong(t);
        return v;
    }
};

Here, the iteration function is already applied one value in advance: each call to nextLong() returns the current value and eagerly computes the next one. This is usually not a problem, but can lead to:

  1. Optimisation issues when the iteration function is expensive
  2. Infinite recursions when the iterator is used recursively
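To make the difference measurable independently of the JDK version in use, the two strategies can be rebuilt as plain iterators and the function calls counted. This is a sketch modelled on the two snippets above, not the JDK source; the class and method names are made up:

```java
import java.util.Iterator;
import java.util.function.UnaryOperator;

class IterateStrategies {

    // Modelled on the primitive variant: the next value is computed eagerly.
    static <T> Iterator<T> prefetching(T seed, UnaryOperator<T> f) {
        return new Iterator<T>() {
            T t = seed;

            @Override
            public boolean hasNext() {
                return true;
            }

            @Override
            public T next() {
                T v = t;
                t = f.apply(t); // applied before anyone asked for the next value
                return v;
            }
        };
    }

    // Modelled on the reference type variant: the function is applied on demand only.
    static <T> Iterator<T> onDemand(T seed, UnaryOperator<T> f) {
        return new Iterator<T>() {
            boolean started;
            T t;

            @Override
            public boolean hasNext() {
                return true;
            }

            @Override
            public T next() {
                t = started ? f.apply(t) : seed;
                started = true;
                return t;
            }
        };
    }

    // Counts how often the iteration function runs to produce the given number of elements.
    static int callsNeeded(boolean prefetch, int elements) {
        int[] calls = { 0 };
        UnaryOperator<Long> f = i -> {
            calls[0]++;
            return i + 1L;
        };
        Iterator<Long> it = prefetch ? prefetching(1L, f) : onDemand(1L, f);
        for (int i = 0; i < elements; i++)
            it.next();
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println(callsNeeded(false, 1)); // 0
        System.out.println(callsNeeded(true, 1));  // 1
    }
}
```

Fetching just the first element already triggers one call in the prefetching variant. If that call recursively asks for the first element of another such iterator, the recursion never ends.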

As a workaround, it might be best to simply avoid recursion with this method in primitive type streams. Luckily, a fix in JDK 9 is already on its way (as a side effect for a feature enhancement):
https://bugs.openjdk.java.net/browse/JDK-8072727
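For this particular sequence, avoiding the recursion is trivial. A minimal sketch (the class name is made up), which should behave the same with either implementation of iterate():

```java
import java.util.stream.LongStream;

class NonRecursiveLongs {

    // Each value depends only on the previous one, so no recursive call is needed.
    static LongStream longs() {
        return LongStream.iterate(1L, i -> i + 1L);
    }

    public static void main(String[] args) {
        // Prints 1 to 5, one number per line.
        longs().limit(5).forEach(System.out::println);
    }
}
```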

Improve Your JUnit Experience with this Annotation


JUnit is probably part of 90% of all Java projects. And the exciting thing is, we’ll soon have JUnit 5 with Java 8 support. We’ve blogged about an improvement recently.

Back in JUnit 4 land, there’s this little trick that I can only recommend you put in all of your unit tests. Just add this little annotation here and you’ll be much happier:

@FixMethodOrder(MethodSorters.NAME_ASCENDING)
class MyTests {
    ...
}

What does it do? It’s simple. It fixes JUnit’s weird default of not defaulting to any testing order. Sure, not having any order in your tests might help accidentally discover some evil test inter-dependency. But usually, when you’re looking for individual tests and results, e.g. in your IDE, it’s just much much better to be able to visually scan the test suite and find the right method.

E.g. what do you prefer? This?

[Image: junit-better]

Or this?

[Image: junit-worse]

Exactly. Finally, a useful annotation. Just put the following everywhere and help make this a slightly better world:

@FixMethodOrder(MethodSorters.NAME_ASCENDING)
class MyTests {
    ...
}

Java A’s new Local-Variable Type Inference


News could hardly get more exciting than this, for a programming language aficionado!

There is now a JEP 286 for Local-Variable Type Inference with status “Candidate”. And a request for feedback by Brian Goetz, which I would love to invite you to participate in:
http://mail.openjdk.java.net/pipermail/platform-jep-discuss/2016-March/000037.html

Please do so, the survey remains open only from March 9 to March 16!

This is not yet a feature that will definitely be implemented. It might be implemented. Hence, there is no specific Java version yet, which is why I name the Java version “A” (for Awesome).

What is local-variable type inference and why is it good?

Let’s have a look at a feature that various other languages have had for quite a while. In this blog post, I’d like to discuss the general idea, not the possibly specific implementation that might be planned for Java, as that would be too early, and I certainly don’t have the big picture of how this fits into Java.

In Java, as well as in some other languages, types are always declared explicitly and verbosely. For instance, you write things like:

// Java 5 and 6
List<String> list = new ArrayList<String>();

// Java 7
List<String> list = new ArrayList<>();

Notice how in Java 7, some syntax sugar was added via the useful diamond operator <>. It helps remove unnecessary redundancy in the Java way, i.e. by applying “target-typing”, which means the type is defined by the “target”. Possible targets are:

  • Local variable declarations
  • Method arguments (both from the outside and from the inside of the method)
  • Class members

Since in many cases, the target type MUST be declared explicitly (method arguments, class members), Java’s approach makes a lot of sense. In the case of local variables, however, the target type doesn’t really need to be declared. Since the type definition is bound to a very local scope, from which it cannot escape, it may well be inferred by the compiler without the source code ever being explicit about it, from the “source type”. This means, we will be able to do things like:

// Java A as suggested in the JEP

// infers ArrayList<String>
var list = new ArrayList<String>();

// infers Stream<String>
val stream = list.stream();

In the above example var stands for a mutable (non-final) local variable, whereas val stands for an immutable (final) local variable. Notice how the type of list was never really needed, just as when we write the following, where the type is already inferred today:

stream = new ArrayList<String>().stream();
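For readers on JDK 10 or later, where the var form of this proposal is available (val was not adopted), the inference can be tried out directly. The following is a sketch assuming such a JDK; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.stream.Collectors;

class VarDemo {

    static String joined() {
        // Inferred as ArrayList<String>, not merely List<String>.
        var list = new ArrayList<String>();

        // ensureCapacity() exists on ArrayList only, so this line compiles
        // only because the inferred static type is ArrayList<String>.
        list.ensureCapacity(16);
        list.add("a");
        list.add("b");

        // Inferred as Stream<String>.
        var stream = list.stream();
        return stream.collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(joined()); // a,b
    }
}
```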

This will work no different from lambda expressions, where we already have this kind of type inference in Java 8:

List<String> list = new ArrayList<>();

// infers String
list.forEach(s -> {
    System.out.println(s);
});

Think of lambda arguments as local variables. An alternative syntax for such a lambda expression might have been:

List<String> list = new ArrayList<>();

// infers String
list.forEach((val s) -> {
    System.out.println(s);
});

Other languages have this, but is it good?

Among these other languages: C# and Scala and JavaScript, if you will 😉. YAGNI is probably a common reaction to this feature. For most people, it’s mere convenience to be able not to type all types all the time. Some people might prefer to see the type explicitly written down, when reading code. Especially, when you have a complex Java 8 Stream processing pipeline, it can get hard to track all the types that are inferred along the way. An example of this can be seen in our article about jOOλ’s window function support:

BigDecimal currentBalance = new BigDecimal("19985.81");
 
Seq.of(
    tuple(9997, "2014-03-18", new BigDecimal("99.17")),
    tuple(9981, "2014-03-16", new BigDecimal("71.44")),
    tuple(9979, "2014-03-16", new BigDecimal("-94.60")),
    tuple(9977, "2014-03-16", new BigDecimal("-6.96")),
    tuple(9971, "2014-03-15", new BigDecimal("-65.95")))
.window(Comparator
    .comparing((Tuple3<Integer, String, BigDecimal> t) 
        -> t.v1, reverseOrder())
    .thenComparing(t -> t.v2), Long.MIN_VALUE, -1)
.map(w -> w.value().concat(
     currentBalance.subtract(w.sum(t -> t.v3)
                              .orElse(BigDecimal.ZERO))
));

The above implements a running total calculation that yields:

+------+------------+--------+----------+
|   v0 | v1         |     v2 |       v3 |
+------+------------+--------+----------+
| 9997 | 2014-03-18 |  99.17 | 19985.81 |
| 9981 | 2014-03-16 |  71.44 | 19886.64 |
| 9979 | 2014-03-16 | -94.60 | 19815.20 |
| 9977 | 2014-03-16 |  -6.96 | 19909.80 |
| 9971 | 2014-03-15 | -65.95 | 19916.76 |
+------+------------+--------+----------+

While the Tuple3 type needs to be declared because of the existing Java 8’s limited type inference capabilities (see also this article on generalized target type inference), are you able to track all the other types? Can you easily predict the result? Some people prefer the short style, others claim:

On the other hand, do you like to manually write down a type like Tuple3<Integer, String, BigDecimal>? Or, when working with jOOQ, which of the following versions of the same code do you prefer?

// Explicit typing
// ----------------------------------------
for (Record3<String, Integer, Date> record : ctx
    .select(BOOK.TITLE, BOOK.ID, BOOK.MODIFIED_AT)
    .from(BOOK)
    .where(TITLE.like("A%"))
) {
    // Do things with record
    String title = record.value1();
}

// "Don't care" typing
// ----------------------------------------
for (Record record : ctx
    .select(BOOK.TITLE, BOOK.ID, BOOK.MODIFIED_AT)
    .from(BOOK)
    .where(TITLE.like("A%"))
) {
    // Do things with record
    String title = record.getValue(0, String.class);
}

// Implicit typing
// ----------------------------------------
for (val record : ctx
    .select(BOOK.TITLE, BOOK.ID, BOOK.MODIFIED_AT)
    .from(BOOK)
    .where(TITLE.like("A%"))
) {
    // Do things with record
    String title = record.value1();
}

I’m sure that few of you would really like to explicitly write down the whole generic type, but if your compiler can still remember the thing, that would be awesome, wouldn’t it? And it’s an opt-in feature. You can always revert to explicit type declarations.

Edge-cases with use-site variance

There are some things that are not possible without this kind of type inference, and they’re related to use-site variance and the specifics of generics as implemented in Java. With use-site variance and wildcards, it is possible to construct “dangerous” types that cannot be assigned to anything because they’re undecidable. For details, please read Ross Tate’s paper on Taming Wildcards in Java’s Type System.

Use-site variance is also a pain when exposed from method return types, as can be seen in some libraries that either:

  • Didn’t care about this pain they’re inflicting on their users
  • Didn’t find a better solution as Java doesn’t have declaration-site variance
  • Were oblivious to this issue

An example:

interface Node {
    void add(List<? extends Node> children);
    List<? extends Node> children();
}

Imagine a tree data structure library, where tree nodes return lists of their children. A technically correct children type would be List<? extends Node> because the children are Node subtypes, and it is perfectly OK to use a Node subtype list.

Accepting this type in the add() method is great from an API design perspective. It allows people to add a List<LeafNode>, for instance. Returning it from children() is horrible, though, because the only options are now:

// Raw type. meh
List children = parent.children();

// Wild card. meh
List<?> children = parent.children();

// Full type declaration. Yuk
List<? extends Node> children = parent.children();

With JEP 286, we might be able to work around all of this and have this nice fourth option:

// Awesome. The compiler knows it's 
// List<? extends Node>
val children = parent.children();
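On JDK 10 or later, this can be tried with var in place of val. The following is a sketch under that assumption; SimpleNode and childNames() are hypothetical and only exist to make the Node interface from above runnable:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VarianceDemo {

    // The interface from the text.
    interface Node {
        void add(List<? extends Node> children);
        List<? extends Node> children();
    }

    // Hypothetical minimal implementation.
    static class SimpleNode implements Node {
        private final String name;
        private final List<Node> children = new ArrayList<>();

        SimpleNode(String name) {
            this.name = name;
        }

        @Override
        public void add(List<? extends Node> nodes) {
            children.addAll(nodes);
        }

        @Override
        public List<? extends Node> children() {
            return children;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    static List<String> childNames(Node parent) {
        // No wildcard type spelled out; elements can still be read as Node.
        var children = parent.children();

        List<String> names = new ArrayList<>();
        for (Node child : children)
            names.add(child.toString());
        return names;
    }

    public static void main(String[] args) {
        Node parent = new SimpleNode("root");

        // A List<SimpleNode> is accepted thanks to the wildcard in add().
        List<SimpleNode> leaves = Arrays.asList(new SimpleNode("a"), new SimpleNode("b"));
        parent.add(leaves);

        System.out.println(childNames(parent)); // [a, b]
    }
}
```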

Conclusion

Local Variable Type Inference is a hot topic. It’s entirely optional, we don’t need it. But it makes a lot of things much much easier, especially when working with tons of generics. We’ve seen that type inference is a killer feature when working with lambda expressions and complex Java 8 Stream transformations. Sure, it will be harder to track all the types across a long statement, but at the same time, if those types were spelled out, it would make the statement very unreadable (and often also very hard to write).

Type inference helps make developers more productive without giving up on type safety. It actually encourages type safety, because API designers are now less reluctant to expose complex generic types to their users, as users can use these types more easily (see again the jOOQ example).

In fact, this feature is already present in Java in various situations, just not when assigning a value to a local variable, giving it a name.

Whatever your opinion is: Do make sure to share it with the community and answer this survey:
http://mail.openjdk.java.net/pipermail/platform-jep-discuss/2016-March/000037.html

Looking forward to Java A where A stands for Awesome.

UI Developers! Choose Sensible Default Ordering!


Good decisions come from experience. Experience comes from making bad decisions.

― Mark Twain

Today, let’s look at one piece of experience and how we can turn that into good decisions when implementing UI logic. Please, all UI developers read this.

The bad decision

When UI developers display tabular data, it is very common for the table to offer sorting on each column. This is extremely useful, as it helps the user extract basic insight from the data by just performing a single click. Let’s look at an example where the choice of sorting default seemed to be correct at first, but was wrong later on:

Take Bing Webmaster tools for the jOOQ blog. When I go to the page traffic page to see the Bing traffic for the last month, I can see this:

[Image: bing-default]

The default table ordering is applied to one of the obvious columns: “Appeared in search”, descendingly. Personally, I might have preferred it to be applied to “Clicks from search” per default, but what’s important: The column is ordered descendingly. I really only care about our top 5 best blog posts. Not about the worst.

So, this is good. Let’s see what happens if I do click on “Clicks from search”, however:

[Image: bing-worst]

I get an overview of our worst-performing blog posts for that month. Yes, there are some posts that don’t attract any audience for an entire month from Bing. Bummer. (Let’s blame it on Bing’s popularity, not on our blog’s). But that’s not what I cared about. I wanted to see the inverse: The top performing blog posts. In order to see those, I have to click again on the column, to invert the sort order.

The experience

I see this behaviour all the time. UI developers mindlessly defaulting to the technical natural sort order on UI table widgets. In many cases, as a user, this is a frustrating experience, because:

  1. Heck, what did I do? Why is it displaying this data?
  2. Aaah, it is sorting ascendingly
  3. Crap, I have to click again

The cognitive dissonance between steps 1) and 2) shouldn’t be underestimated. Depending on the complexity of the task, or the data that is being displayed, a user might first be confused before they realise that the wanted behaviour is 2 clicks away, not 1. While it should be a technical detail that there are things like ascending and descending orderings, in UIs there is a third notion: That of natural ordering.

Why do developers get this wrong so often? Simple! Because there is also a technical natural ordering, and that’s almost always the ascending order. For instance, in Java, when you do this:

TreeSet<Integer> set = new TreeSet<>();
set.add(1);
set.add(12);
set.add(3);
System.out.println(set);

You’ll get the nicely sorted data as such:

[1, 3, 12]

The technical natural ordering depends on the “raw” data type only. In the case of Integer, this is simply the natural ascending integer number ordering.

The UI natural ordering, however, depends on the context of a data type. While a meaningless integer might still be sorted ascendingly, the previous count value (also an integer!) should be sorted descendingly by default.
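In the TreeSet example, switching from the technical to the UI natural ordering is a one-argument change. A minimal sketch, assuming the integers are counts such as clicks (the class and method names are made up):

```java
import java.util.Comparator;
import java.util.TreeSet;

class DescendingCounts {

    // For counts, the UI natural ordering is descending: highest value first.
    static TreeSet<Integer> byCountDescending(int... counts) {
        TreeSet<Integer> set = new TreeSet<>(Comparator.<Integer>reverseOrder());
        for (int count : counts)
            set.add(count);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(byCountDescending(1, 12, 3)); // [12, 3, 1]
    }
}
```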

The good decision

So, are there any rules with respect to the UI natural ordering? Intuition, yes! But also the following more concrete (and far from exhaustive) list of hints:

Data types and contexts in favour of ascending natural ordering

  • names: All sorts of names of things like people, cities, countries, etc. should be ordered ascendingly in their alphabetical (case-insensitive) order. That is how people skim phone books, that’s how they expect names to be ordered
  • phone numbers: (and other similar numbers) should be ordered like integers: ascendingly. But beware of their specific formatting. It is very likely that the special characters in US formats (like (555) 123-4567) shouldn’t matter when comparing numbers (e.g. with +1-800-1112222), or with numbers from other countries
  • row numbers: This is an obvious candidate for ascending ordering because the row number itself already yields an implicit order, by which it was calculated (see also our article on SQL ROW_NUMBER())
  • dates in the future: If you know your dates are in the future, then you should order them ascendingly, as users want to see the closest date first. Think of a calendar, for instance. Do you really want to display a date in the year 8375, just because you happen to celebrate your 6394th birthday? Probably not. But if in doubt, with dates, better sort them descendingly (see below), as you usually have most dates in the past and only few dates in the future.
  • aggregations: There are few aggregations that should be sorted ascendingly by default. One of them is SQL’s MIN() aggregation. If you’re really looking for the lowest value, the lowest value should be on top, followed by higher values. Other aggregations that are OK to default to ascending order are percentiles (e.g. the MEDIAN()), or standard deviations, or linear regression functions, because it is not clear whether the user cares about the highest or the lowest value. In this case, it is OK to simply default to the technical natural ordering. Most other aggregations, however, should be sorted descendingly (see below)
  • members of a period: A period is something like a year, month, week, day. Periods come with a finite, discrete number of members, such as day of year, week of year, month (for year), day of month (for month), weekday (for week), hour, minute, second (for day). The default ordering here is obvious: always ascendingly, in order of period traversal
  • money (price): No one wants to buy the most expensive flight! Obvious, right? But be careful. Prices are expressed in money, and money isn’t always best sorted ascendingly. If in doubt, order money owed to someone ascendingly, and money owned descendingly. What a difference a little "n" makes!
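
To make the first two hints more concrete, here is a minimal Java sketch of case-insensitive name ordering and of phone number ordering that ignores formatting characters. The class and member names (AscendingDefaults, BY_NAME, BY_PHONE, phoneAsNumber) are made up for illustration. The sketch also assumes that stripping all non-digits is good enough, which ignores country code handling entirely:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of any library.
class AscendingDefaults {

    // Names: ascending, case-insensitive alphabetical order.
    static final Comparator<String> BY_NAME = String.CASE_INSENSITIVE_ORDER;

    // Phone numbers: compare the digits only, as integers, so that
    // parentheses, spaces, dashes and "+" do not influence the order.
    static final Comparator<String> BY_PHONE =
        Comparator.comparing(AscendingDefaults::phoneAsNumber);

    // Assumption: a value without any digits is treated as zero.
    static BigInteger phoneAsNumber(String phone) {
        String digits = phone.replaceAll("\\D", "");
        return digits.isEmpty() ? BigInteger.ZERO : new BigInteger(digits);
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> values, Comparator<String> order) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(order);
        return copy;
    }
}
```

Note that String.CASE_INSENSITIVE_ORDER is not locale-aware. For names with accents or umlauts, a java.text.Collator is probably the better choice.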

Data types and contexts in favour of descending natural ordering

  • dates: This is a tricky one, but there are few occasions where dates really should be sorted ascendingly, so default to sorting them descendingly if most dates lie in the past. The rationale is that users want to see the closest date first, e.g. the most recent date of a bank account transaction.
  • aggregations: When you run SQL COUNT(*) or SUM() or AVG() or MAX() aggregations, users will really care about the highest values only, as we saw in our Bing Webmaster tools example. Please do sort these aggregations descendingly by default!
  • changes: If the change between your current value and e.g. last week’s value is the thing of interest (e.g. stock market, or again Bing Webmaster tools), then both orderings are interesting. The biggest winners / losers are both useful pieces of information. However, let’s stay positive here and order stuff descendingly by default in order to display the biggest winners first. We don’t want to be negative by default. Whether the change in percent or the absolute change is of interest is another story and depends on the domain.
  • file sizes: Probably, the user is looking for the biggest files – e.g. to see what to delete to save most disk space. Order descendingly by default. If in doubt, think of sizes as the COUNT(*) value of any content. And that should clearly be ordered descendingly.
  • booleans: When ordering a column that contains true/false information (e.g. E-Mail does or does not have any attachments), then the true information is usually more interesting. Since true = 1 and false = 0, order these columns descendingly by default.
  • money (balance): Unlike prices (money owed to someone), balances (money owned) should be ordered descendingly. We want to know how many billions we have. No one cares about their worst assets.
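
As a rough illustration of the boolean and file size hints, the following Java sketch sorts true values and big values first. The names DescendingDefaults, TRUE_FIRST and BIGGEST_FIRST are hypothetical. The sketch relies on the fact that Boolean's natural order puts false before true, which mirrors the true = 1, false = 0 reasoning above:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of any library.
class DescendingDefaults {

    // Boolean's natural order is false < true, so the reversed order
    // lists the "interesting" true values first.
    static final Comparator<Boolean> TRUE_FIRST =
        Comparator.<Boolean>naturalOrder().reversed();

    // File sizes (or COUNT(*)-like values): biggest value on top.
    static final Comparator<Long> BIGGEST_FIRST = Comparator.<Long>reverseOrder();

    // Returns a sorted copy and leaves the input list untouched.
    static <T> List<T> sorted(List<T> values, Comparator<? super T> order) {
        List<T> copy = new ArrayList<>(values);
        copy.sort(order);
        return copy;
    }
}
```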

Data types without ordering

There are some data types that simply shouldn’t be ordered. Don’t offer ordering by default on them: it might confuse the user and it might kill your server! These include:

  • URLs: In the case of Bing Webmaster tools, there is really no point in ordering URLs. I mean, the natural order would be http vs. https first, then the domain (but not from top level domain down to the irrelevant sharding identifier), then possibly the port (completely useless piece of information for the user), then the path (probably ordered by date in blogs, but pretty random otherwise). Ordering by URLs doesn’t add value, so don’t offer it. Caveat: If you display only a URL part (e.g. the domain name), ordering might make sense.
  • text: Now, plain text (e.g. E-Mail content) is really the very last thing you want to order. Most SQL databases don’t even allow for ordering CLOB content. This should be obvious, just don’t do it.
  • composite data: If data points are structured (like age and sex in one data point, in case the combination matters for your domain), they’re very hard to order correctly. Specifically, sex doesn’t have any non-technical order. If in doubt, don’t offer ordering, or decompose the data point.

Data types where sorting challenges mean that tables are the wrong tool

Some data types are tricky to sort by default. Mostly, this is because we’re dealing with discrete or continuous values that go in both directions of a “zero” value. E.g. numbers, percentages, dates (where zero=today):

  • dates: As we’ve seen above, dates are a bit of a tricky data type to sort in tables, as the user experience depends on whether dates are mostly in the past (like bank account transactions) or mostly in the future (like appointments), or both (like calendar entries). A much better UI widget for displaying data that extends both into the past and the future is .. well .. a timeline, which cannot be sorted by the user. Timelines are always ordered by date, displaying today’s date by default
  • percentages: If percentages are the most interesting data point in a data set (e.g. stock option changes), then chances are that the value 0.00% is the center of your data, e.g. in a winners/losers display widget. While it is the center of your data, it is not the center of interest. The most interesting values will still be the top winners and the top losers. This is hard to display with sorting only. Filtering (or pagination) will need to apply in order to remove the stocks that are in the middle
  • (approximate) search results: You don’t see any means of ordering Google search results, right? That’s because Google searches are approximate, i.e. their results are already ordered in terms of relevance. You usually don’t want to offer your users to re-order these results (at least not on the relevance scale). One exception might be ordering of exact search results by date (or something else), but this is really hard to get right from a UX perspective, as you risk displaying lots of irrelevant results based on their freshness.
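
The winners/losers idea from the percentages hint could be sketched as follows in Java. This is only one possible approach, assuming the changes are available as a plain list of percentages. The names WinnersAndLosers, topWinners and topLosers are invented for this example:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
class WinnersAndLosers {

    // The n biggest gains, biggest first. Values <= 0 are filtered out.
    static List<Double> topWinners(List<Double> changes, int n) {
        return changes.stream()
                      .filter(c -> c > 0)
                      .sorted(Comparator.reverseOrder())
                      .limit(n)
                      .collect(Collectors.toList());
    }

    // The n biggest losses, biggest loss first. Values >= 0 are filtered out.
    static List<Double> topLosers(List<Double> changes, int n) {
        return changes.stream()
                      .filter(c -> c < 0)
                      .sorted(Comparator.naturalOrder())
                      .limit(n)
                      .collect(Collectors.toList());
    }
}
```

Both lists drop the uninteresting values around 0.00%, which sorting alone cannot do.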

Situations where the above is not true

Now, the above is useful advice for making the right decision in the case of simple and homogeneous tables, like the one exposed in Bing’s Webmaster Tools (all columns are either unsortable (URL) or aggregations). If you display arbitrary data, then it might not be wise to apply these rules, as it will confuse the user if one column defaults to descending ordering and another defaults to ascending ordering. In that case, revert to sorting all columns ascendingly. The user will understand.

Conclusion

If you’re a UI developer, make natural ordering flags first class citizens of your software design. Pretty much every data type ships with an intuitive, and obvious value for default ordering, i.e. one of:

  • Ascending
  • Descending
  • No ordering

Every time you design a table, please do think of the above. It’s that little extra effort that will make your user interface much more meaningful. And, beware. This is really what you as a UI developer need to do. The backend developers operating on the database cannot specify this, because:

  • Databases contain raw, context-free data (e.g. of type NUMBER or VARCHAR)
  • UI ordering is not necessarily the same as SQL ordering
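
One way to make the three default ordering values a first class citizen is a small enum that each column type is tagged with. The following Java sketch is only an illustration: DefaultOrdering, ColumnKind and the chosen mapping are assumptions based on the hints in this article, not an existing API:

```java
import java.util.Comparator;
import java.util.Optional;

// Hypothetical enum modelling the three default ordering flags.
enum DefaultOrdering {
    ASCENDING, DESCENDING, NONE;

    // Returns the comparator to apply by default, or empty if the
    // column should not be sortable at all.
    <T extends Comparable<? super T>> Optional<Comparator<T>> comparator() {
        switch (this) {
            case ASCENDING:  return Optional.of(Comparator.<T>naturalOrder());
            case DESCENDING: return Optional.of(Comparator.<T>reverseOrder());
            default:         return Optional.empty();
        }
    }
}

// Hypothetical UI-level column kinds, mapped according to the lists above.
enum ColumnKind {
    NAME(DefaultOrdering.ASCENDING),
    ROW_NUMBER(DefaultOrdering.ASCENDING),
    PRICE(DefaultOrdering.ASCENDING),
    PAST_DATE(DefaultOrdering.DESCENDING),
    COUNT(DefaultOrdering.DESCENDING),
    FILE_SIZE(DefaultOrdering.DESCENDING),
    BALANCE(DefaultOrdering.DESCENDING),
    URL(DefaultOrdering.NONE),
    TEXT(DefaultOrdering.NONE);

    final DefaultOrdering defaultOrdering;

    ColumnKind(DefaultOrdering defaultOrdering) {
        this.defaultOrdering = defaultOrdering;
    }
}
```

The mapping lives in the UI layer because the database only knows the raw type (e.g. NUMBER), not whether the number is a price or a balance.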