Yak Shaving is a Good Way to Improve an API

Yak Shaving (uncountable):

  1. (idiomatic) Any apparently useless activity which, by allowing you to overcome intermediate difficulties, allows you to solve a larger problem.
  2. (idiomatic) A less useful activity done to consciously or unconsciously procrastinate about a larger but more useful task.

Both interpretations of the term Yak Shaving as explained by Wiktionary are absolutely accurate descriptions of most refactoring jobs. The Yak Shaving in refactoring itself can be described by this gif showing what happens when you want to change a light bulb:

light-bulb

However, when developing an API, it’s not such a bad idea to perform actual Yak Shaving (only the first interpretation, of course). Let’s look at an example why, from the daily work maintaining jOOQ.

The Task

For jOOQ 3.6, I wanted to implement a very simple feature. Feature #2639: Add stored procedure OUT values to DEBUG log output. This is not an important feature at all, but certainly very useful to a lot of jOOQ users. The idea is that every time you run a stored procedure with DEBUG logging activated, you’ll get the OUT parameters logged along with the procedure call. Here’s a visualisation:

debug-log-2

Now, the actual implementation would have been very easy. Just about 10 lines of code in the existing LoggerListener that already takes care of logging all the other things. But there were a couple of caveats, which reminded me of the above lightbulb changing gif:

The apparently useless activities

  1. There was no way to access the RETURN_VALUE meta information of a jOOQ Routine
  2. There was no easy way to access Routine IN and OUT values generically
  3. There was lifecycle event that modelled the moment when OUT parameters are fetched in jOOQ
  4. There was no way to format Routine OUT parameters in a nice way

Does this feel familiar? There is need for refactoring!

Now, this whole implementation is hidden in jOOQ’s internals. It wouldn’t matter too much for users, if this had been hacked together in one way or another. For instance, obviously the RETURN_VALUE meta information could be accessed through internal refactorings, the same is true for IN and OUT values. There are other lifecycle events that might have worked just as well, and formatting is easy to re-implement.

But this is a popular API that is used by many users who might profit from a cleaner solution. Thus, why don’t we simply refactor and implement:

  1. Add a public Routine.getReturnParameter() method
  2. Add public Routine.getValue() and setValue() methods
  3. Add ExecuteListener.outStart(ExecuteContext) and outEnd(ExecuteContext) to capture fetching of Routine OUT parameters
  4. Add Routine.outRecord() and Routine.inRecord() to view a Routine as a Record”

The thing is:

The API implementor is the first API consumer

It’s hard to foresee what API users really want. But if you’re implementing an API (or just a feature), and you discover that something is missing, always consider adding that missing thing to the public API. If it could be useful to yourself, internally, it could be even more useful to many others. This way, you turn one little nice feature into 5, amplifying the user love.

Don’t get me wrong. This doesn’t mean that every little piece of functionality needs to be exposed publicly, au contraire. But the fact that something is keeping you – as the maintainer from writing clean code might indicate that others implement the same hacky workarounds as you. And they won’t ask you explicitly for it!

Don’t believe it? Here’s an entirely subjective analysis of user feedback:

  • 0.2% – Hey, this is a cool product, I want to help the owner make it better, I’ll provide a very descriptive, constructive feature request and engage for the next 5 weeks to help implement it.
  • 0.8% – Whatever dudes. Make this work. Please.
  • 1.3% – Whatever dudes. Make this work. ASAP!
  • 4.0% – WTF is wrong with you guys? Didn’t you at least think about this once??
  • 4.7% – OK, I’m going to write this completely uninformed rant about this product now, which I hate so much. It makes my life completely miserable
  • 9.0% – Oh well, this doesn’t work. Let’s go home, it’s 17:00 anyways
  • 80.0% – Oh well, this didn’t work yesterday already. Let’s go home. It’s Friday, 16:00 anyways

Now, most of this list wasn’t meant entirely seriously, but you get the point. There may be those 0.2% of users / customers that love you and that actively engage with you. Others may still love you or at least like you, but they won’t engage. You have to guesstimate what they need.

So. Bottom line:

If you need it, they probably need it. Start Yak Shaving!

The Java Legacy is Constantly Growing

I’ve recently stumbled upon a very interesting caveat of the JDK APIs, the Class.getConstructors() method. Its method signature is this:

Constructor<?>[] getConstructors()

The interesting thing here is that Class.getConstructor(Class...) returns a Constructor<T>, with <T> being maintained:

Constructor<T> getConstructor(Class<?>... parameterTypes)

Why is there a difference, i.e. why doesn’t the first method return Constructor<T>[]?

Let’s consider the Javadoc:

Note that while this method returns an array of Constructor<T> objects (that is an array of constructors from this class), the return type of this method is Constructor<?>[] and not Constructor<T>[] as might be expected. This less informative return type is necessary since after being returned from this method, the array could be modified to hold Constructor objects for different classes, which would violate the type guarantees of Constructor<T>[].

59539500

That’s a tough one. Historically, here’s how this happened:

Java 1.0 / Oak: Arrays

In Java 1.0 (the immediate successor of the Oak programming language), arrays were already introduced. In fact, they have been introduced before the collections API, which was introduced in Java 1.2. Arrays suffer from all the problems that we know today, including them being covariant, which leads to a lot of problems at runtime, that cannot be checked at compile time:

Object[] objects = new String[1];
objects[0] = Integer.valueOf(1); // Ouch

Java 1.1: Reflection API

Short of a “decent” collections API, the only possible return type of the Class.getConstructors() method was Constructor[]. A reasonable decision at the time. Of course, you could do the same mistake as above:

Object[] objects = String.class.getConstructors();
objects[0] = Integer.valueOf(1); // Ouch

but in the addition to the above, you could also, rightfully, write this:

Constructor[] constructors  = String.class.getConstructors();
constructors[0] = Object.class.getConstructor();

// Muahahahahahahaha

Java 1.2: Collections API

Java has been backwards-compatible from the very early days, even from Oak onwards. There’s a very interesting piece of historic research about some of Oak’s backwards-compatibility having leaked into Java to this date in this Stack Overflow question.

While it would have been natural to design the reflection API using collections, now, it was already too late. A better solution might’ve been:

List getConstructors()

However, note that we didn’t have generics yet, so the array actually conveys more type information than the collection.

Java 1.5: Generics

In Java 5, the change from

Constructor[] getConstructors()

to

Constructor<?>[] getConstructors()

has been made for the reasons mentioned above. Now, the alternative API using a collection would definitely have been better:

List<Constructor<T>> getConstructors()

But the ship has sailed.

Java, the ugly wart

Java is full of these little caveats. They’re all documented in the Javadocs, and often on Stack Overflow. Just yesterday, we’ve documented a new caveat related to completely new API in Map and ConcurrentHashMap.

“Stewardship: the Sobering Parts,” a very good talk about all those caveats and how hard it is to maintain them by Brian Goetz can be seen here:

The summary of the talk:

When language designers talk about the language they're designing
When language designers talk about the language they’re designing

Avoid Recursion in ConcurrentHashMap.computeIfAbsent()

Sometimes we give terrible advice. Like in that article about how to use Java 8 for a cached, functional approach to calculating fibonacci numbers. As Matthias, one of our readers, noticed in the comments, the proposed algorithm may just never halt. Consider the following program:

public class Test {
    static Map<Integer, Integer> cache 
        = new ConcurrentHashMap<>();
 
    public static void main(String[] args) {
        System.out.println(
            "f(" + 25 + ") = " + fibonacci(25));
    }
 
    static int fibonacci(int i) {
        if (i == 0)
            return i;
 
        if (i == 1)
            return 1;
 
        return cache.computeIfAbsent(i, (key) -> {
            System.out.println(
                "Slow calculation of " + key);
 
            return fibonacci(i - 2) + fibonacci(i - 1);
        });
    }
}

It will run indefinitely at least on the following Java version:

C:\Users\Lukas>java -version
java version "1.8.0_40-ea"
Java(TM) SE Runtime Environment (build 1.8.0_40-ea-b23)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)

This is of course a “feature”. The ConcurrentHashMap.computeIfAbsent() Javadoc reads:

If the specified key is not already associated with a value, attempts to compute its value using the given mapping function and enters it into this map unless null. The entire method invocation is performed atomically, so the function is applied at most once per key. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this map.

The “must not” wording is a clear contract, which my algorithm violated, although not for the same concurrency reasons.

The Javadoc also reads:

Throws:

IllegalStateException – if the computation detectably attempts a recursive update to this map that would otherwise never complete

But that exception isn’t thrown. Neither is there any ConcurrentModificationException. Instead, the program just never halts.

The simplest use-site solution for this concrete problem would be to not use a ConcurrentHashMap, but just a HashMap instead:

static Map<Integer, Integer> cache = new HashMap<>();

Subtypes overriding super type contracts

The HashMap.computeIfAbsent() or Map.computeIfAbsent() Javadoc don’t forbid such recursive computation, which is of course ridiculous as the type of the cache is Map<Integer, Integer>, not ConcurrentHashMap<Integer, Integer>. It is very dangerous for subtypes to drastically re-define super type contracts (Set vs. SortedSet is greeting). It should thus be forbidden also in super types, to perform such recursion.

Further reference

While the contract issues are a matter of perception, the halting problem clearly is a bug. I’ve also documented this issue on Stack Overflow where Ben Manes gave an interesting answer leading to a previous (unresolved as of early 2015) bug report:

https://bugs.openjdk.java.net/browse/JDK-8062841

My own report (probably a duplicate of the above) was also accepted quickly, as:

https://bugs.openjdk.java.net/browse/JDK-8074374

While this is being looked at by Oracle, remember to:

Never recurse inside a ConcurrentHashMap.computeIfAbsent() method. And if you’re implementing collections and think it’s a good idea to write a possibly infinite loop, think again, and read our article:

Infinite Loops. Or: Anything that Can Possibly Go Wrong, Does)

Murphy is always right.

jOOQ – Ein alternativer Weg mit Java und SQL zu arbeiten

We’ve published an article in the German magazine www.java-aktuell.de, which is published by the iJUG e.V..

You can read and download the article free of charge from our blog!

In Java gibt es kein Standard-API, das die Ausdrucksstärke und Mächtigkeit von SQL direkt unterstützt. Alle Aufmerksamkeit ist auf objekt-relationales Mapping und andere höhere Abstraktionslevel gerichtet, beispielsweise OQL, HQL, JPQL, CriteriaQuery. jOOQ ist ein dual-lizenziertes Open-Source-Produkt, das diese Lücke füllt. Es implementiert SQL als typsichere domänen-spezifische Sprache direkt in Java und ist eine gute Wahl für Java-Applikationen, in denen SQL und herstellerspezifische Datenbankfunktionalität wichtig sind. Es zeigt, wie eine moderne domänenspezifische Sprache die Entwicklerproduktivität stark erhöhen kann, indem SQL direkt in Java eingebettet ist.