Java 8’s Method References Put Further Restrictions on Overloading


Method overloading has always been a topic that provokes mixed feelings. We’ve blogged about it, and about the caveats it introduces, a couple of times.

There are two main reasons why overloading is useful:

  1. To allow for defaulted arguments
  2. To allow for disjunct argument type alternatives

Both reasons are motivated simply by the wish to provide convenience for API consumers. Good examples are easy to find in the JDK:

Defaulted arguments

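public class Integer {
    public static int parseInt(String s) {
        // Delegate to the two-argument overload with the default radix
        return parseInt(s, 10);
    }

    public static int parseInt(String s, int radix) {
        // ... actual parsing logic elided ...
    }
}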

In the above example, the first parseInt() method is simply a convenience method for calling the second one with the most commonly used radix.

Disjunct argument type alternatives

Sometimes, similar behaviour can be achieved using different types of parameters, which mean similar things but which are not compatible in Java’s type system. For example, when constructing a String:

public class String {
    public static String valueOf(char c) {
        char data[] = {c};
        return new String(data, true);
    }

    public static String valueOf(boolean b) {
        return b ? "true" : "false";
    }

    // and many more...
}

As you can see, the behaviour of the same method is optimised depending on the argument type. This does not affect the “feel” of the method when reading or writing source code, as the semantics of the two valueOf() methods are the same.

Another use-case for this technique is when commonly used, similar but incompatible types need convenient conversion between each other. As an API designer, you don’t want to make your API consumer goof around with such tedious conversions. Instead, you offer:

public class IOUtils {
    public static void copy(InputStream input, OutputStream output);
    public static void copy(InputStream input, Writer output);
    public static void copy(InputStream input, Writer output, String encoding);
    public static void copy(InputStream input, Writer output, Charset encoding);
}

This is a nice example showing both defaulted parameters (optional encoding) and argument type alternatives (OutputStream vs. Writer, or String vs. Charset encoding representation).
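To see how such overloads are usually wired together, here is a minimal sketch. The Decoder class and its decode() methods are hypothetical and not taken from any real library; the sketch only assumes that the convenience overloads delegate to a single overload doing the actual work.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical utility for illustration only, not part of any real library.
final class Decoder {

    private Decoder() {}

    // Defaulted argument: the charset falls back to UTF-8.
    static String decode(byte[] bytes) {
        return decode(bytes, StandardCharsets.UTF_8);
    }

    // Argument type alternative: the charset is given by name.
    static String decode(byte[] bytes, String charsetName) {
        return decode(bytes, Charset.forName(charsetName));
    }

    // The single overload that does the actual work.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }
}
```

All three calls decode(data), decode(data, "UTF-8") and decode(data, StandardCharsets.UTF_8) end up in the same method, which is what keeps the “feel” of the API identical across overloads.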

Side-note

I suspect that the union type and defaulted argument ships sailed for Java a long time ago – while union types might be implemented as syntax sugar, defaulted arguments would be a beast to introduce into the JVM, as they would depend on named arguments, which the JVM does not support.

As displayed by the Ceylon language, these two features cover about 99% of all method overloading use-cases, which is why Ceylon can do completely without overloading – on top of the JVM!

Overloading is dangerous and unnecessary

The above examples show that overloading is essentially just a means to help humans interact with an API. For the runtime, there is no such thing as overloading. There are only different, unique method signatures to which calls are linked “statically” in byte code (give or take more recent opcodes like invokedynamic). But the point is, there’s no difference for the computer if the above methods are all called copy(), or if they had been called unambiguously m1(), m2(), m3(), and m4().
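The static linking described above can be made visible with a small sketch. OverloadDemo and describe() are hypothetical names; the point is only that the compiler picks the overload from the static type of the argument, not from its runtime type.

```java
// Hypothetical demo class for illustration only.
final class OverloadDemo {

    static String describe(Object o) {
        return "Object";
    }

    static String describe(String s) {
        return "String";
    }

    public static void main(String[] args) {
        // Runtime type is String, but the static type is Object.
        Object value = "hello";

        // The overload is chosen at compile time from the static type.
        System.out.println(describe(value));          // prints "Object"
        System.out.println(describe((String) value)); // prints "String"
    }
}
```

Renaming the two methods to describeObject() and describeString() would produce exactly the same calls in byte code, only without any resolution work left for the compiler or the reader.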

On the other hand, overloading is real in Java source code, and the compiler has to do a lot of work to find the most specific method, and otherwise apply the JLS’s complex overload resolution algorithm. Things get worse with each new Java language version. In Java 8, for instance, method references will add additional pain to API consumers, and require additional care from API designers. Consider the following example by Josh Bloch:

You can copy-paste the above code into Eclipse to verify the compilation error (note that not-up-to-date compilers may report type inference side-effects instead of the actual error). The compilation error reported by Eclipse for the following simplification:

static void pfc(List<Integer> x) {
    Stream<?> s = x.stream().map(Integer::toString);
}

… is

Ambiguous method reference: both toString() and 
toString(int) from the type Integer are eligible

Oops!

The above expression is ambiguous. It can mean either of the following two expressions:

// Instance method:
x.stream().map(i -> i.toString());

// Static method:
x.stream().map(i -> Integer.toString(i));

As can be seen, the ambiguity is immediately resolved by using lambda expressions rather than method references. Another way to resolve this ambiguity (towards the instance method) would be to use the super-type declaration of toString() instead, which is no longer ambiguous:

// Instance method:
x.stream().map(Object::toString);
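Put together, the three unambiguous alternatives can be sketched as follows. MethodRefDemo and its method names are made up for this illustration; each variant compiles and produces the same strings.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class for illustration only.
final class MethodRefDemo {

    private MethodRefDemo() {}

    // Lambda calling the instance method Integer.toString()
    static List<String> viaInstanceLambda(List<Integer> x) {
        return x.stream().map(i -> i.toString()).collect(Collectors.toList());
    }

    // Lambda calling the static method Integer.toString(int)
    static List<String> viaStaticLambda(List<Integer> x) {
        return x.stream().map(i -> Integer.toString(i)).collect(Collectors.toList());
    }

    // Method reference to the super-type declaration Object.toString()
    static List<String> viaObjectReference(List<Integer> x) {
        return x.stream().map(Object::toString).collect(Collectors.toList());
    }
}
```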

Conclusion

The conclusion here for API designers is very clear:

Method overloading has become an even more dangerous tool for API designers since Java 8

While the above isn’t really “severe”, API consumers will waste a lot of time overcoming this cognitive friction when their compilers reject seemingly correct code. One big takeaway from this example is:

Never mix similar instance and static method overloads

And in fact, this amplifies when your static method overload overloads a name from java.lang.Object, as we’ve explained in a previous blog post.

There’s a simple reason for the above rule. Because there are only two valid reasons for overloading (defaulted parameters and incompatible parameter alternatives), there is no point in providing a static overload for a method in the same class. A much better design (as exhibited by the JDK) is to have “companion classes” – similar to Scala’s companion objects. For instance:

// Instance logic
public interface Collection<E> {}
public class Object {}

// Utilities
public class Collections {}
public final class Objects {}

By changing the namespace for methods, overloading has been circumvented somewhat elegantly, and the previous problems would not have appeared.
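A minimal sketch of the companion class idea, using hypothetical Version and Versions types (modelled loosely on the Object / Objects pair): the instance method and the static utility share a name, but because they live in different classes, both method references resolve without ambiguity.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical value type: instance logic only.
final class Version {
    private final int major;
    private final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    String render() {
        return major + "." + minor;
    }
}

// Hypothetical companion class: static utilities only.
final class Versions {
    private Versions() {}

    // Null-tolerant variant of Version.render()
    static String render(Version v) {
        return v == null ? "unknown" : v.render();
    }
}

final class CompanionDemo {
    private CompanionDemo() {}

    // Refers to the instance method; Version declares no static render()
    static List<String> viaInstance(List<Version> versions) {
        return versions.stream().map(Version::render).collect(Collectors.toList());
    }

    // Refers to the static method; Versions declares no instance render()
    static List<String> viaCompanion(List<Version> versions) {
        return versions.stream().map(Versions::render).collect(Collectors.toList());
    }
}
```

Had the static render(Version) been declared in Version itself, Version::render would run into the same kind of ambiguity as Integer::toString.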

TL;DR: Avoid overloading unless the added convenience really adds value!

NULL is Not The Billion Dollar Mistake. A Counter-Rant


A short while ago, I gave this answer on Quora. The question was “What is the significance of NULL in SQL?” and most of the existing answers went on about citing C.J. Date or Tony Hoare and unanimously declared NULL as “evil”.

So, everyone rants about NULL all the time. Let me counter-rant.

Academics

Of course, academics like C.J. Date will rant about NULL (see Greg Kemnitz’s interesting answer on Quora). Let me remind you that C.J. Date also ranted about UNION ALL, as pure relational theory operates only on sets, not on bags (like SQL does). While in theory, sets are probably much purer than bags, in practice, bags are just very useful.

These people probably also still mourn over the fact that SQL (useful) won over QUEL (pure), and I don’t blame them. Theory is always more beautiful than the real world, which is exposed to real world requirements.

Purists

There are also other kinds of purists who will run about and educate everyone about their black/white opinions that leave no room for pragmatic “it depends…” approaches. I like to display this witty comic strip for such occasions: New intern knows best: GOTO. Purists like extreme abstraction when they describe their world, and such abstraction asks for very simple models, no complexity. NULL adds tremendous complexity to the SQL “model”, and thus does not fit their view.

Fact is: It depends

The only factual opinion ever is one where there’s no clear opinion. NULL is an incredibly useful value, and some representation of NULL is inevitable in all languages / models that want to model cardinalities of the form:

  • 0 or 1 (here’s where NULL is useful)
  • exactly 1 (here, you don’t need NULL)
  • 0 .. many (here, you don’t need NULL)

Functional programming languages like to make use of the Optional “monad” (see Mario Fusco’s excellent explanation of what a monad is) to model the 0 or 1 cardinality, but that’s just another way of modelling NULL. The (possibly) absent value. Perhaps, if you like to discuss style (then you should read this), NULL vs. Optional may matter to you, but they’re really exactly the same thing. We’ve just been shifting whitespace and curly braces.

The only way to really do without the absent value would be to disallow the optional cardinality and use 0 .. many instead, which would be much less descriptive.
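In Java terms, the three ways of modelling “0 or 1” discussed above might look like the following sketch. PhoneBook and its methods are hypothetical names chosen for this illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical class for illustration only.
final class PhoneBook {
    private final Map<String, String> numbers = new HashMap<>();

    void put(String name, String number) {
        numbers.put(name, number);
    }

    // "0 or 1" modelled with null
    String findOrNull(String name) {
        return numbers.get(name);
    }

    // "0 or 1" modelled with Optional: the same information in a different wrapper
    Optional<String> find(String name) {
        return Optional.ofNullable(numbers.get(name));
    }

    // "0 .. many": works, but the type no longer says "at most one"
    List<String> findAll(String name) {
        String number = numbers.get(name);
        return number == null
            ? Collections.<String>emptyList()
            : Collections.singletonList(number);
    }
}
```

The first two signatures tell the caller that there is at most one result; the third one hides that fact behind a collection.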

So, regardless of what purists or academics say about a perfect world, we engineers need potent tools that help us get our work done, and NULL (or “Optional”) is one of these potent tools that allow us to do so.

Caveat: SQL NULL is not an absent value

Now, the caveat with SQL’s NULL is that it doesn’t behave like an absent value. It is the UNKNOWN value as others have also explained. This subtle difference has severe impact on a variety of operations and predicates, which do not behave very intuitively if you’re not aware of this distinction. Some examples (and there are many many more):
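To illustrate the difference, here is a rough Java model of SQL’s three-valued logic. The Truth enum and sqlEquals() are hypothetical helpers written for this sketch, with Java’s null standing in for SQL NULL.

```java
// Hypothetical model of SQL's three-valued logic, for illustration only.
enum Truth {
    TRUE, FALSE, UNKNOWN;

    Truth not() {
        switch (this) {
            case TRUE:  return FALSE;
            case FALSE: return TRUE;
            default:    return UNKNOWN; // NOT UNKNOWN is still UNKNOWN
        }
    }

    Truth and(Truth other) {
        if (this == FALSE || other == FALSE) return FALSE;
        if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
        return TRUE;
    }

    Truth or(Truth other) {
        if (this == TRUE || other == TRUE) return TRUE;
        if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
        return FALSE;
    }

    // Models the SQL predicate "a = b": UNKNOWN as soon as either side is NULL.
    static Truth sqlEquals(Object a, Object b) {
        if (a == null || b == null) return UNKNOWN;
        return a.equals(b) ? TRUE : FALSE;
    }
}
```

With this model, Truth.sqlEquals(null, null) is UNKNOWN rather than TRUE, and negating it yields UNKNOWN again. Since a WHERE clause only keeps rows for which the predicate is TRUE, neither the comparison nor its negation matches a row where the value is NULL.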

Even though SQL specifies NULL as UNKNOWN, most people abuse SQL NULL to model the absent value instead, which works nicely in most cases until you run into a caveat. It turns out that the UNKNOWN value is even more useful than the absent value, as it allows for modelling things more descriptively. One might think that having two “special” values would solve these problems, as in JavaScript, which distinguishes between null (UNKNOWN) and undefined (absent).

JavaScript itself is a beacon of usefulness that is inversely proportional to its purity or beauty, so long story short:

Pick your favourite spot on the useful <-> pure scale

Programming languages and data models are always a tradeoff between purity and usefulness. Pick your favourite spot on that scale, but stop ranting about NULL being evil. Or as Simon Peyton Jones said:

Haskell is useless

What the sun.misc.Unsafe Misery Teaches Us


Oracle will remove the internal sun.misc.Unsafe class in Java 9. While most people are probably rather indifferent regarding this change, some other people – mostly library developers – are not. There have been a couple of recent articles in the blogosphere painting a dark picture of what this change will imply.

Maintaining a public API is extremely difficult, especially when the API is as popular as that of the JDK. There is simply (almost) no way to keep people from shooting themselves in the foot. Oracle (and previously Sun) have always declared the sun.* packages as internal and not to be used. Citing from the page called “Why Developers Should Not Write Programs That Call ‘sun’ Packages”:

The sun.* packages are not part of the supported, public interface.

A Java program that directly calls into sun.* packages is not guaranteed to work on all Java-compatible platforms. In fact, such a program is not guaranteed to work even in future versions on the same platform.

This disclaimer is just one out of many similar disclaimers and warnings. Whoever goes ahead and uses Unsafe does so … “unsafely”.

What do we learn from this?

The concrete solution to this misery is still open and under discussion. A good idea would be to provide a formal and public replacement before removing Unsafe, in order to allow for migration paths of the offending libraries.

But there’s a more important message to all of this. The message is:

When all you have is a hammer, every problem looks like a thumb

Translated to this situation: The hammer is Unsafe and given that it’s a very poor hammer, but the only option, well, library developers might just not have had much of a choice. They’re not really to blame. In fact, they took a gamble in one of the world’s most stable and backwards compatible software environments (= Java) and they fared extremely well for more than 10 years. Would you have made a different choice in a similar situation? Or, let me ask differently. Was betting on AWT or Swing a much safer choice at the time?

If something can somehow be used by someone, then it will be, no matter how obviously they’re gonna shoot themselves in the foot. The only way to currently write a library / API and really prevent users from accessing internals is to put everything in a single package and make everything package-private. This is what we’ve been doing in jOOQ from the beginning, knowing that jOOQ’s internals are extremely delicate and subject to change all the time.

For more details about this rationale, read also:

However, this solution has a severe drawback for those developing those internals. It’s a hell of a package with almost no structure. That makes development rather difficult.

What would be a better Java, then?

Java has always had an insufficient set of visibilities:

  • public
  • protected
  • default (package-private)
  • private

There should be a fifth visibility that behaves like public but prevents access from “outside” of a module. In a way, that’s between the existing public and default visibilities. Let’s call this the hypothetical module visibility.

In fact, not only should we be able to declare this visibility on a class or member, we should be able to govern module inter-dependencies on a top level, just like the Ceylon language allows us to do:

module org.hibernate "3.0.0.beta" {
    import ceylon.collection "1.0.0";
    import java.base "7";
    shared import java.jdbc "7";
}

This reads very similarly to OSGi’s bundle system, where bundles can be imported / exported, although the above module syntax is much much simpler than configuring OSGi.

A sophisticated module system would go even further. Not only would it match OSGi’s features, it would also match those of Maven. With the possibility of declaring dependencies on a Java language module basis, we might no longer need the XML-based Maven descriptors, as those could be generated from a simple module syntax (or Gradle, or ant/ivy).

And with all of this in place, classes like sun.misc.Unsafe could be declared as module-visible for only a few JDK modules – not the whole world. I’m sure the number of people abusing reflection to get a hold of those internals would decrease by 50%.
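For readers who haven’t seen it, this is roughly what “abusing reflection” looks like. The sketch uses a hypothetical Internals class rather than the real sun.misc.Unsafe, and assumes both classes live in the same module, so setAccessible() is permitted.

```java
import java.lang.reflect.Field;

// Hypothetical class with a private internal, for illustration only.
final class Internals {
    private static String theSecret = "internal state";

    private Internals() {}
}

final class ReflectionDemo {
    private ReflectionDemo() {}

    // Reads the private static field despite its declared visibility.
    static String peek() {
        try {
            Field field = Internals.class.getDeclaredField("theSecret");
            field.setAccessible(true); // switches off the language-level access check
            return (String) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The private modifier only stops the compiler, not a determined client, which is why a visibility enforced at the module level would be so valuable.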

Conclusion

I do hope that in a future Java, this Ceylon language feature (and also Fantom language feature, btw) will be incorporated into the Java language. A nice overview of Java 9 / Jigsaw’s modular encapsulation can be seen in this blog post:

The Features Project Jigsaw Brings To Java 9

Until then, if you’re an API designer, do know that all disclaimers won’t work. Your internal APIs will be used and abused by your clients. They’re part of your ordinary public API from day 1 after you publish them. It’s not your users’ fault. That’s how things work.