Divided we Stand: Optional

Our recent article “NULL is Not The Billion Dollar Mistake. A Counter-Rant” got us a lot of reads, controversial comments, and a 50/50 upvote / downvote ratio pretty much everywhere a blog post can be posted and voted on. This was expected.

Objectively, NULL is just a “special” value that has been implemented in a variety of languages and type systems, and in a variety of ways – including perhaps the set of natural numbers (a.k.a. “zero”, the original null – them Romans sure didn’t like that idea).

Or, as Charles Roth has put it adequately in the comments:

Chuckle. Occasionally a mathematics background comes in handy. Now we could argue about whether NULL was “invented” or “discovered”…

Now, Java’s null is a particularly obnoxious implementation of that “special value” for reasons like:

Compile-time typing vs. runtime typing

// We can assign null to any reference type
Object s = null;

// Yet, null is of no type at all
if (null instanceof Object)
    throw new Error("Will never happen");

The null literal is even more special

// Nothing can be put in this list, right?
List<?> list = new ArrayList<Void>();

// Yet, null can:
list.add(null);

Methods are present on the null literal

// This compiles, but does it run?
((Object) null).getClass();
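
The three quirks above can be checked in one small program. This is only a minimal sketch: the class name NullQuirks and its helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NullQuirks {

    // null can be assigned to any reference type, yet it is an instance of none
    static boolean nullIsInstanceOfObject() {
        Object s = null;
        return s instanceof Object;
    }

    // A List<?> accepts no element of any type, except the null literal
    static int sizeAfterAddingNull() {
        List<?> list = new ArrayList<Void>();
        list.add(null);
        return list.size();
    }

    // Calling a method on a null reference compiles, but fails at run time
    static boolean getClassOnNullThrows() {
        try {
            ((Object) null).getClass();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullIsInstanceOfObject()); // false
        System.out.println(sizeAfterAddingNull());    // 1
        System.out.println(getClassOnNullThrows());   // true
    }
}
```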

Java 8’s Optional

The introduction of Optional might have changed everything. Many functional programmers love it so much because the type clearly communicates the cardinality of an attribute. In a way:

// Cardinality 1:
Type t1;

// Cardinality 0-1:
Optional<Type> t01;

// Cardinality 0..n:
Iterable<Type> tn;

Much of the interesting history of Java 8’s Optional has been dug out by Nicolai Parlog on his excellent blog.

Be sure to check it out:
http://blog.codefx.org/tag/optional

In the Java 8 expert groups, Optional wasn’t an easy decision:

[…] There has been a lot of discussion about [Optional] here and there over the years. I think they mainly amount to two technical problems, plus at least one style/usage issue:

  1. Some collections allow null elements, which means that you cannot unambiguously use null in its otherwise only reasonable sense of “there’s nothing there”.
  2. If/when some of these APIs are extended to primitives, there is no value to return in the case of nothing there. The alternative to Optional is to return boxed types, which some people would prefer not to do.
  3. Some people like the idea of using Optional to allow more fluent APIs.
    As in
    x = s.findFirst().or(valueIfEmpty)
    vs
    if ((x = s.findFirst()) == null) x = valueIfEmpty;
    Some people are happy to create an object for the sake of being able to do this. Although sometimes less happy when they realize that Optionalism then starts propagating through their designs, leading to Set<Optional<T>>’s and so on.

It’s hard to win here.

Doug Lea

Arguably, the real reason why the JDK introduced Optional is that Project Valhalla’s specialization was not available in Java 8, which meant that a performant primitive type stream (such as IntStream) needed some new type like OptionalInt to encode absent values as returned from IntStream.findAny(), for instance. For API consistency, such an OptionalInt from the IntStream type must be matched by a “similar” Optional from the Stream type.

Can Optional be introduced late in a platform?

While Doug’s concerns are certainly valid, there are some other, more significant arguments that make me wary of Optional (in Java). While Scala developers embrace their awesome Option type as they have no alternative and hardly ever see any null reference or NullPointerException – except when working with some Java libraries – this is not true for Java developers. We have our legacy collections API, which (ab-)uses null all over the place. Take java.util.Map, for instance. The Javadoc of Map.get() reads:

Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.

[…]

If this map permits null values, then a return value of null does not necessarily indicate that the map contains no mapping for the key; it’s also possible that the map explicitly maps the key to null. The containsKey operation may be used to distinguish these two cases.

This is how much of the pre-Java 8 collection API worked, and we’re still using it actively with Java 8, with new APIs such as the Streams API, which makes extensive use of Optional.

A contrived (and obviously wrong) example:

Map<Integer, List<Integer>> map =
Stream.of(1, 1, 2, 3, 5, 8)
      .collect(Collectors.groupingBy(n -> n % 5));

IntStream.range(0, 5)
         .mapToObj(map::get)
         .map(List::size)
         .forEach(System.out::println);

Boom, NullPointerException. Can you spot it?

The map contains remainders of a modulo-5 operation as keys, and the associated, collected dividends as a value.

We then go through all numbers from 0 to 4 (the only possible remainders), extract the list of associated dividends, List::size them… wait. Oh. Map.get may return null.

You’re getting used to the fluent style of Java 8’s new APIs, you’re getting used to the functional and monadic programming style where streams and Optional behave similarly, and you may quickly be surprised that anything passed to a Stream.map() method can be null.

In fact, if APIs were allowed to be retrofitted, then the Map.get method might look like this:

public interface Map<K,V> {
    Optional<V> get(Object key);
}

(it probably still wouldn’t because most maps allow for null values or even keys, which is hard to retrofit)

If we had such a retrofitting, the compiler would be complaining that we have to unwrap Optional before calling List::size. We’d fix it and write

IntStream.range(0, 5)
         .mapToObj(map::get)
         .map(l -> l.orElse(Collections.emptyList()))
         .map(List::size)
         .forEach(System.out::println);
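
Since Map.get() is not going to be retrofitted, a similar effect can be had today by wrapping the lookup ourselves with Optional.ofNullable(). The following is only a sketch of that idea. The class name SafeLookup and the method sizes() are invented for this example.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SafeLookup {

    static List<Integer> sizes() {
        Map<Integer, List<Integer>> map =
            Stream.of(1, 1, 2, 3, 5, 8)
                  .collect(Collectors.groupingBy(n -> n % 5));

        return IntStream.range(0, 5)
            // Wrap the possibly null result of Map.get() explicitly
            .mapToObj(i -> Optional.ofNullable(map.get(i)))
            .map(l -> l.orElse(Collections.emptyList()))
            .map(List::size)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints 1, 2, 1, 2, 0: there is no entry for the remainder 4
        sizes().forEach(System.out::println);
    }
}
```

The price is that every call site has to remember the wrapping, which is exactly the kind of discipline the compiler cannot enforce for us.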

Java’s Crux – Backwards compatibility

Backwards compatibility will lead to a mediocre adoption of Optional. Some parts of the JDK API make use of it, others use null to encode the absent value. You can never be sure and you always have to remember both possibilities, because you cannot trust a non-Optional type to be truly “@NotNull”.

If you prefer using Optional over null in your business logic, that’s fine. But you will have to make very sure to apply this strategy thoroughly. Take the following blog post, for instance, which has gotten lots of upvotes on reddit:
Day 4 — Let’s write Null free Java code

It inadvertently introduces a new anti-pattern:

public class User {
 
    private final String username;
    private Optional<String> fullname;
 
    public User(String username) {
        this.username = username;
        this.fullname = Optional.empty();
    }
 
    public String getUsername() {
        return username;
    }
 
    public Optional<String> getFullname() {
        return fullname;
    }

    //      good--------^^^
    // vvvv--------bad
 
    public void setFullname(String fullname) {
        this.fullname = Optional.of(fullname);
    }
}

The domain object establishes an “Optional opt-in” contract, without opting out of null entirely. While getFullname() forces API consumers to reason about the possible absence of a full name, setFullname() doesn’t accept such an Optional argument type, but a nullable one. What was meant as a clever convenience will result only in confusion at the consumer site.
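
For contrast, here is one way the contract could be made consistent in both directions. This is a sketch and not the code from the cited blog post: the class name ConsistentUser and the decision to reject null arguments eagerly are assumptions made for illustration.

```java
import java.util.Objects;
import java.util.Optional;

class ConsistentUser {

    private final String username;
    private Optional<String> fullname = Optional.empty();

    ConsistentUser(String username) {
        this.username = Objects.requireNonNull(username);
    }

    String getUsername() {
        return username;
    }

    Optional<String> getFullname() {
        return fullname;
    }

    // The setter mirrors the getter: absence is expressed as Optional.empty(),
    // and a null argument is rejected instead of being silently tolerated
    void setFullname(Optional<String> fullname) {
        this.fullname = Objects.requireNonNull(fullname);
    }

    public static void main(String[] args) {
        ConsistentUser user = new ConsistentUser("jdoe");
        System.out.println(user.getFullname()); // Optional.empty
        user.setFullname(Optional.of("John Doe"));
        System.out.println(user.getFullname()); // Optional[John Doe]
    }
}
```

Whether Optional parameters are good style is debatable, but at least the getter and the setter now speak the same language.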

The anti-pattern is repeated by Stephen Colebourne (who brought us Joda Time and JSR-310) on his blog, calling this a “pragmatic” approach:

public class Address {
    private final String addressLine;  // never null
    private final String city;         // never null
    private final String postcode;     // optional, thus may be null

    // constructor ensures non-null fields really are non-null
    // optional field can just be stored directly, as null means optional
    public Address(String addressLine, String city, String postcode) {
      this.addressLine = Preconditions.checkNotNull(addressLine);
      this.city = Preconditions.checkNotNull(city);
      this.postcode = postcode;
    }

    // normal getters
    public String getAddressLine() { return addressLine; }
    public String getCity() { return city; }

    // special getter for optional field
    public Optional<String> getPostcode() {
      return Optional.ofNullable(postcode);
    }

    // return optional instead of null for business logic methods that may not find a result
    public static Optional<Address> findAddress(String userInput) {
      return ... // find the address, returning Optional.empty() if not found
    }
}

See the full article here:
http://blog.joda.org/2015/08/java-se-8-optional-pragmatic-approach.html

Choose your poison

We cannot change the JDK. The JDK APIs are a mix of nullable and Optional. But we can change our own business logic. Think carefully before introducing Optional, as this new optional type – unlike what its name suggests – is an all-or-nothing type. Remember that by introducing Optional into your code-base, you implicitly assume the following:

// Cardinality 1:
Type t1;

// Cardinality 0-1:
Optional<Type> t01;

// Cardinality 0..n:
Iterable<Type> tn;

From there on, your code-base should no longer use the simple non-Optional Type type for 0-1 cardinalities. Ever.

NULL is Not The Billion Dollar Mistake. A Counter-Rant

A short while ago, I gave this answer on Quora. The question was “What is the significance of NULL in SQL?” and most of the existing answers went on about citing C.J. Date or Tony Hoare and unanimously declared NULL as “evil”.

So, everyone rants about NULL all the time. Let me counter-rant.

Academics

Of course, academics like C.J. Date will rant about NULL (see Greg Kemnitz’s interesting answer on Quora). Let me remind you that C.J. Date also ranted about UNION ALL, as pure relational theory operates only on sets, not on bags (like SQL does). While in theory, sets are probably much purer than bags, in practice, bags are just very useful.

These people probably also still mourn over the fact that SQL (useful) won over QUEL (pure), and I don’t blame them. Theory is always more beautiful than the real world, which is exposed to real world requirements.

Purists

There are also other kinds of purists who will run about and educate everyone about their black/white opinions that leave no room for pragmatic “it depends…” approaches. I like to display this witty comic strip for such occasions: New intern knows best: GOTO. Purists like extreme abstraction when they describe their world, and such abstraction asks for very simple models, no complexity. NULL adds tremendous complexity to the SQL “model”, and thus does not fit their view.

Fact is: It depends

The only factual opinion ever is one where there’s no clear opinion. NULL is an incredibly useful value, and some representation of NULL is inevitable in all languages / models that want to model cardinalities of the form:

  • 0 or 1 (here’s where NULL is useful)
  • exactly 1 (here, you don’t need NULL)
  • 0 .. many (here, you don’t need NULL)

Functional programming languages like to make use of the Optional “monad” (see Mario Fusco’s excellent explanation of what a monad is) to model the 0 or 1 cardinality, but that’s just another way of modelling NULL. The (possibly) absent value. Perhaps, if you like to discuss style (then you should read this), NULL vs. Optional may matter to you, but they’re really exactly the same thing. We’ve just been shifting whitespace and curly braces.

The only way to really do without the absent value would be to disallow the optional cardinality and use 0 .. many instead, which would be much less descriptive.

So, regardless of what purists or academics say about a perfect world, we engineers need potent tools that help us get our work done, and NULL (or “Optional”) is one of these potent tools that allow us to do so.

Caveat: SQL NULL is not an absent value

Now, the caveat with SQL’s NULL is that it doesn’t behave like an absent value. It is the UNKNOWN value as others have also explained. This subtle difference has severe impact on a variety of operations and predicates, which do not behave very intuitively if you’re not aware of this distinction. Some examples (and there are many many more):
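
To make the difference concrete, the following Java sketch emulates SQL’s three-valued logic for two well-known cases: comparing NULL with NULL, and a NOT IN predicate whose list contains a NULL. The enum Tri and the helpers eq() and notIn() are made up for this illustration and are not part of any SQL or JDBC API. A Java null stands in for SQL NULL.

```java
import java.util.Arrays;
import java.util.List;

class ThreeValuedLogic {

    enum Tri {
        TRUE, FALSE, UNKNOWN;

        Tri not() {
            return this == TRUE ? FALSE : this == FALSE ? TRUE : UNKNOWN;
        }

        Tri and(Tri other) {
            if (this == FALSE || other == FALSE) return FALSE;
            if (this == UNKNOWN || other == UNKNOWN) return UNKNOWN;
            return TRUE;
        }
    }

    // a = b, where a Java null plays the role of SQL NULL
    static Tri eq(Integer a, Integer b) {
        if (a == null || b == null) return Tri.UNKNOWN;
        return a.equals(b) ? Tri.TRUE : Tri.FALSE;
    }

    // x NOT IN (v1, v2, ...) is defined as x <> v1 AND x <> v2 AND ...
    static Tri notIn(Integer x, List<Integer> values) {
        Tri result = Tri.TRUE;
        for (Integer v : values)
            result = result.and(eq(x, v).not());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(eq(null, null));                   // UNKNOWN
        System.out.println(notIn(1, Arrays.asList(2, 3)));    // TRUE
        System.out.println(notIn(1, Arrays.asList(2, null))); // UNKNOWN
    }
}
```

Since a WHERE clause only keeps rows for which the predicate is TRUE, the last case silently filters out a row that an “absent value” reading of NULL would have kept.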

Even with this specification of SQL NULL being UNKNOWN, most people abuse SQL NULL to model the absent value instead, which works just nicely in most cases until you run into a caveat. It turns out that the UNKNOWN value is even more useful than the absent value, as it allows for modelling things with even more descriptiveness. One might think that having two “special” values would solve problems, like JavaScript, which distinguishes between null (UNKNOWN) and undefined (absent).

JavaScript itself is a beacon of usefulness that is inversely proportional to its purity or beauty, so long story short:

Pick your favourite spot on the useful <-> pure scale

Programming, languages, data models are always a tradeoff between purity and usefulness. Pick your favourite spot on that scale, but stop ranting about NULL being evil. Or as Simon Peyton Jones said:

Haskell is useless

Java 8 Friday: Optional Will Remain an Option in Java

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

Optional: A new Option in Java

So far, we’ve been pretty thrilled with all the additions to Java 8. All in all, this is a revolution more than anything before. But there are also one or two sore spots. One of them is how Java will never really get rid of

Null: The billion dollar mistake

In a previous blog post, we have explained the merits of NULL handling in the Ceylon language, which has found one of the best solutions to tackle this issue – at least on the JVM which is doomed to support the null pointer forever. In Ceylon, nullability is a flag that can be added to every type by appending a question mark to the type name. An example:

void hello() {
    String? name = process.arguments.first;
    String greeting;
    if (exists name) {
        greeting = "Hello, ``name``!";
    }
    else {
        greeting = "Hello, World!";
    }
    print(greeting);
}

That’s pretty slick. Combined with flow-sensitive typing, you will never run into the dreaded NullPointerException again:

Recently in the Operating Room. By Geek and Poke

Other languages have introduced the Option type. Most prominently: Scala. Java 8 now also introduced the Optional type (as well as the OptionalInt, OptionalLong, OptionalDouble types – more about those later on).

How does Optional work?

The main point behind Optional is to wrap an Object and to provide convenience API to handle nullability in a fluent manner. This goes well with Java 8 lambda expressions, which allow for lazy execution of operations. An example:

Optional<String> stringOrNot = Optional.of("123");

// This String reference will never be null
String alwaysAString =
    stringOrNot.orElse("");

// This Integer reference will be wrapped again
Optional<Integer> integerOrNot = 
    stringOrNot.map(Integer::parseInt);

// This int reference will never be null
int alwaysAnInt = stringOrNot
        .map(s -> Integer.parseInt(s))
        .orElse(0);

There are certain merits to the above in fluent APIs, specifically in the new Java 8 Streams API, which makes extensive use of Optional. For example:

Arrays.asList(1, 2, 3)
      .stream()
      .findAny()
      .ifPresent(System.out::println);

The above piece of code will print any number from the Stream onto the console, but only if such a number exists.
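
The examples above only exercise the case where a value is present. Here is a small sketch of the absent case, and of what lazy execution means in this context: orElseGet() takes a Supplier that is only invoked when the Optional is empty. The class name OptionalLaziness and the call counter are illustrative assumptions.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class OptionalLaziness {

    // Counts how often the fallback supplier is actually invoked
    static final AtomicInteger fallbackCalls = new AtomicInteger();

    static int parseOrFallback(Optional<String> stringOrNot) {
        return stringOrNot
            .map(Integer::parseInt)
            .orElseGet(() -> {
                fallbackCalls.incrementAndGet();
                return 0;
            });
    }

    public static void main(String[] args) {
        System.out.println(parseOrFallback(Optional.of("123"))); // 123
        System.out.println(fallbackCalls.get());                 // 0
        System.out.println(parseOrFallback(Optional.empty()));   // 0
        System.out.println(fallbackCalls.get());                 // 1
    }
}
```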

Old API is not retrofitted

For obvious backwards-compatibility reasons, the “old API” is not retrofitted. In other words, unlike Scala, Java 8 doesn’t use Optional all over the JDK. In fact, the only place where Optional is used is in the Streams API. As you can see in the Javadoc, usage is very scarce:

http://docs.oracle.com/javase/8/docs/api/java/util/class-use/Optional.html

This makes Optional a bit difficult to use. We’ve already blogged about this topic before. Concretely, the absence of an Optional type in the API is no guarantee of non-nullability. This is particularly nasty if you convert Streams into collections and collections into streams.
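
A short sketch of how this can bite: a collection may legally contain null, and a stream over it will carry that null along until a terminal operation tries to wrap it. Stream.findFirst() is documented to throw a NullPointerException if the selected element is null. The class and method names below are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

class NullInStream {

    // Returns true if findFirst() fails because the first element is null
    static boolean findFirstThrowsOnNull(List<String> list) {
        try {
            list.stream().findFirst();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Filtering nulls out first makes the Optional result trustworthy
    static Optional<String> firstNonNull(List<String> list) {
        return list.stream().filter(Objects::nonNull).findFirst();
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(null, "a");
        System.out.println(findFirstThrowsOnNull(list)); // true
        System.out.println(firstNonNull(list));          // Optional[a]
    }
}
```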

The Java 8 Optional type is treacherous

Parametric polymorphism

The worst implication of Optional on its “infected” API is parametric polymorphism, or simply: generics. When you reason about types, you will quickly understand that:

// This is a reference to a simple type:
Number s;

// This is a reference to a collection of
// the above simple type:
Collection<Number> c;

Generics are often used for what is generally accepted as composition. We have a Collection of Number. With Optional, this compositional semantics is slightly abused (both in Scala and Java) to “wrap” a potentially nullable value. We now have:

// This is a reference to a nullable simple type:
Optional<Number> s;

// This is a reference to a collection of 
// possibly nullable simple types
Collection<Optional<Number>> c;

So far so good. We can substitute types to get the following:

// This is a reference to a simple type:
T s;

// This is a reference to a collection of
// the above simple type:
Collection<T> c;

But now enter wildcards and use-site variance. We can write

// No variance can be applied to simple types:
T s;

// Variance can be applied to collections of
// simple types:
Collection<? extends T> source;
Collection<? super T> target;

What do the above types mean in the context of Optional? Intuitively, we would like this to be about things like Optional<? extends Number> or Optional<? super Number>. In the above example we can write:

// Read a T-value from the source
T s = source.iterator().next();

// ... and put it into the target
target.add(s);

But this doesn’t work any longer with Optional:

Collection<Optional<? extends T>> source;
Collection<Optional<? super T>> target;

// Read a value from the source
Optional<? extends T> s = source.iterator().next();

// ... cannot put it into the target
target.add(s); // Nope

… and there is no other way to reason about use-site variance when we have Optional and subtly more complex API.
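
The best we can do, as far as we can tell, is to unwrap and re-wrap each value so that the Optional itself gets a fresh, exact type. The following sketch shows that detour. The class name OptionalVariance and the helper copy() are hypothetical, and the explicit type witness on map() is there to spell out which type we re-wrap into.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Optional;

class OptionalVariance {

    // Hypothetical helper: copies every element from source to target
    static <T> void copy(
            Collection<Optional<? extends T>> source,
            Collection<Optional<? super T>> target) {
        for (Optional<? extends T> s : source) {
            // target.add(s) would not compile, so we re-wrap as Optional<T>,
            // which is a subtype of Optional<? super T>
            Optional<T> rewrapped = s.<T>map(v -> v);
            target.add(rewrapped);
        }
    }

    public static void main(String[] args) {
        Collection<Optional<? extends Number>> source = new ArrayList<>();
        source.add(Optional.of(1));
        source.add(Optional.empty());

        Collection<Optional<? super Number>> target = new ArrayList<>();
        copy(source, target);
        System.out.println(target); // [Optional[1], Optional.empty]
    }
}
```

An identity mapping just to please the type checker is hardly what anyone would call fluent.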

If you add generic type erasure to the discussion, things get even worse. We no longer erase only the component type of the above Collection, we also erase the type of virtually any reference. From a runtime / reflection perspective, this is almost like using Object all over the place!

Generic type systems are incredibly complex even for simple use-cases. Optional makes things only worse. It is quite hard to blend Optional with traditional collections API or other APIs. Compared to the ease of use of Ceylon’s flow-sensitive typing, or even Groovy’s elvis operator, Optional is like a sledge-hammer in your face.

Be careful when you apply it to your API!

Primitive types

One of the main reasons why Optional is still a very useful addition is the fact that the “object-stream” and the “primitive streams” have a “unified API” by the fact that we also have OptionalInt, OptionalLong, OptionalDouble types.

In other words, if you’re operating on primitive types, you can just switch the stream construction and reuse the rest of your stream API usage source code, in almost the same way. Compare these two chains:

// Stream and Optional
Optional<Integer> anyInteger = 
Arrays.asList(1, 2, 3)
      .stream()
      .filter(i -> i % 2 == 0)
      .findAny();
anyInteger.ifPresent(System.out::println);

// IntStream and OptionalInt
OptionalInt anyInt =
Arrays.stream(new int[] {1, 2, 3})
      .filter(i -> i % 2 == 0)
      .findAny();
anyInt.ifPresent(System.out::println);

In other words, given the scarce usage of these new types in JDK API, the dubious usefulness of such a type in general (if retrofitted into a very backwards-compatible environment) and the implications generics erasure has on Optional, we dare say that

The only reason why this type was really added is to provide a more unified Streams API for both reference and primitive types

That’s tough. And it makes us wonder whether we should finally get rid of primitive types altogether.

Oh, and…

Optional isn’t Serializable.

Nope. Not Serializable. Unlike ArrayList, for instance. For the usual reason:

Making something in the JDK serializable makes a dramatic increase in our maintenance costs, because it means that the representation is frozen for all time. This constrains our ability to evolve implementations in the future, and the number of cases where we are unable to easily fix a bug or provide an enhancement, which would otherwise be simple, is enormous. So, while it may look like a simple matter of “implements Serializable” to you, it is more than that. The amount of effort consumed by working around an earlier choice to make something serializable is staggering.

Citing Brian Goetz, from:

http://mail.openjdk.java.net/pipermail/jdk8-dev/2013-September/003276.html
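
The difference to ArrayList is easy to observe. The following sketch tries to write both to an in-memory ObjectOutputStream. The class name OptionalSerialization and the helper canSerialize() are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Optional;

class OptionalSerialization {

    // Returns true if the object can be written to an ObjectOutputStream.
    // The stream is purely in-memory, so it is not closed explicitly.
    static boolean canSerialize(Object o) throws IOException {
        ObjectOutputStream out =
            new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new ArrayList<>(Arrays.asList("x")))); // true
        System.out.println(canSerialize(Optional.of("x")));                    // false
    }
}
```

In practice this means that an Optional field in a Serializable class will make serialization of that class fail at run time.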

Stay tuned for more exciting Java 8 stuff published in this blog series.

More on Java 8

In the meantime, have a look at Eugen Paraschiv’s awesome Java 8 resources page.

What if every object was an array? No more NullPointerExceptions!

To NULL or not to NULL? Programming language designers inevitably have to decide whether they support NULLs or not. And they’ve proven to have a hard time getting this right. NULL is not intuitive in any language, because NULL is an axiom of that language, not a rule that can be derived from lower-level axioms. Take Java for instance, where

// This yields true:
null == null

// These throw an exception (or cannot be compiled)
null.toString();
int value = (Integer) null;

It’s not like there weren’t any alternatives. SQL, for instance, implements a more expressive but probably less intuitive three-value logic, which most developers get wrong in subtle ways once in a while.

At the same time, SQL doesn’t know “NULL” results, only “NULL” column values. From a set theory perspective, there are only empty sets, not NULL sets.

Other languages allow for dereferencing null through special operators, letting the compiler generate tedious null checks for you, behind the scenes. An example for this is Groovy with its null-safe dereferencing operator. This solution is far from being generally accepted, as can be seen in this discussion about a Scala equivalent. Scala uses Option, which Java 8 will imitate using Optional (or @Nullable).
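
For comparison, here is roughly what null-safe navigation looks like with Java 8’s Optional. Optional.map() yields an empty Optional when the mapping function returns null, so a chain of map() calls plays the role of a chain of null-safe dereferences. The Customer and Address classes and the cityOf() helper are hypothetical and exist only for this sketch.

```java
import java.util.Optional;

class NullSafeNavigation {

    // Hypothetical domain classes with nullable getters
    static class Address {
        final String city;
        Address(String city) { this.city = city; }
        String getCity() { return city; }
    }

    static class Customer {
        final Address address;
        Customer(Address address) { this.address = address; }
        Address getAddress() { return address; }
    }

    // Rough counterpart of a null-safe customer -> address -> city chain
    // with a default value at the end
    static String cityOf(Customer customer) {
        return Optional.ofNullable(customer)
                       .map(Customer::getAddress)
                       .map(Address::getCity)
                       .orElse("unknown");
    }

    public static void main(String[] args) {
        System.out.println(cityOf(null));                                // unknown
        System.out.println(cityOf(new Customer(null)));                  // unknown
        System.out.println(cityOf(new Customer(new Address("Zurich")))); // Zurich
    }
}
```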

Let’s think about a much broader solution

To me, nullability isn’t a first-class citizen. I personally dislike the fact that Scala’s Option[T] type pollutes my type system by introducing a generic wrapper type (even if it seems to implement similar array-features through the traversable trait). I don’t want to distinguish the types of Option[T] and T. This is specifically true when reasoning about types from a reflection API perspective, where Scala’s (and Java’s) legacy will forever keep me from accessing the type of T at runtime.

But much worse, most of the time, in my application I don’t really want to distinguish between “option” references and “some” references. Heck, I don’t even want to distinguish between having 1 reference and having dozens. jQuery got this quite right. One of the main reasons why jQuery is so popular is because everything you do, you do on a set of wrapped DOM elements. The API never distinguishes between matching 1 or 100 divs. Check out the following code:

// This clearly operates on a single object or none
$('div#unique-id').html('new content')
                  .click(function() { ... });

// This possibly operates on several objects or none
$('div.any-class').html('new content')
                  .click(function() { ... });

This is possible because JavaScript allows you to override the prototype of the JavaScript Array type, modifying arrays in general, at least for the scope of the jQuery library. How much more awesome can it get? .html() and .click() are actions performed on the array as a whole, no matter if you have zero, one, or 100 elements in your match. What would a more typesafe language look like, where everything behaves like an array (or an ArrayList)? Think about the following model:

class Customer {
  String firstNames;  // Read as String[] firstNames
  String lastName;    // Read as String[] lastName
  Order orders;       // Read as Order[] orders
}

class Order {
  int value;          // Read as int[] value
  boolean shipped() { // Read as boolean[] shipped
  }
}

Don’t rant (just yet). Let’s assume this wouldn’t lead to memory or computation overhead. Let’s continue thinking about the advantages of this. So, I want to see if a Customer’s orders have been shipped. Easy:

Customer customer = // ...
boolean shipped = customer.orders.shipped();

This doesn’t look spectacular (yet). But beware of the fact that a customer can have several orders, and the above check is really to see if all orders have been shipped. I really don’t want to write the loop, I find it quite obvious that I want to perform the shipped() check on every order. Consider:

// The length pseudo-field would still be
// present on orders
customer.orders.length;

// In fact, the length pseudo-field is also
// present on customer, in case there are several
customer.length;

// Let's add an order to the customer:
customer.orders.add(new Order());

// Let's reset order
customer.orders.clear();

// Let's calculate the sum of all values
// OO-style:
customer.orders.value.sum();
// Functional style:
sum(customer.orders.value);

Of course there would be a couple of caveats and the above choice of method names might not be the best one. But being able to deal with single references (nullable or non-nullable) or array references (empty, single-valued, multi-valued) in the same syntactic way is just pure syntax awesomeness. Null-checks would be replaced by length checks, but mostly you don’t even have to do those, because each method would always be called on every element in the array. The current single-reference vs. multi-reference semantics would be documented by naming conventions. Clearly, naming something “orders” indicates that multi-references are possible, whereas naming something “customer” indicates that multi-references are improbable.
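
Java has no such language feature, but a List that is never null, combined with Java 8 streams, comes reasonably close to the idea. The following is a sketch under that assumption. The Customer and Order classes are modelled after the pseudo-code above, and the helper names allShipped() and totalValue() are made up.

```java
import java.util.ArrayList;
import java.util.List;

class EverythingIsAList {

    static class Order {
        final int value;
        final boolean shipped;

        Order(int value, boolean shipped) {
            this.value = value;
            this.shipped = shipped;
        }
    }

    static class Customer {
        // Zero, one, or many orders: never null, so no null check is needed
        final List<Order> orders = new ArrayList<>();

        // Counterpart of customer.orders.shipped(): have all orders shipped?
        boolean allShipped() {
            return orders.stream().allMatch(o -> o.shipped);
        }

        // Counterpart of customer.orders.value.sum()
        int totalValue() {
            return orders.stream().mapToInt(o -> o.value).sum();
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer();
        System.out.println(customer.allShipped()); // true (no orders at all)
        System.out.println(customer.totalValue()); // 0

        customer.orders.add(new Order(10, true));
        customer.orders.add(new Order(5, false));
        System.out.println(customer.allShipped()); // false
        System.out.println(customer.totalValue()); // 15
    }
}
```

Note that the empty case needs no special treatment: the check on zero orders is vacuously true, and the sum is simply 0.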

As users have commented, this technique is commonly referred to as array programming, which is implemented in Matlab or R.

Convinced?

I’m curious to hear your thoughts!

On Java 8’s introduction of Optional

I recently discovered JDK 8’s addition of the Optional type. The Optional type is a way to avoid NullPointerException, as API consumers that get Optional return values from methods are “forced” to perform “presence” checks in order to consume their actual return value. More details can be seen in the Javadoc.

A very interesting further read can be seen here in this blog post, which compares the general notion of null and how null is handled in Java, SML, and Ceylon:
Java 8 Optional Objects

“Blank” and “initial” states were already known to Turing. One could also argue that the “neutral” or “zero” state was required in the Babbage Engine, which dates back to Ada Lovelace in the 1800s.

On the other hand, mathematicians also prefer to distinguish “nothing” from “the empty set”, which is “a set with nothing inside”. This compares well with “NONE” and “SOME”, as illustrated by the aforementioned Informatech blog post, and as implemented by Scala, for instance.

Anyway, I’ve given Java’s Optional some thought. I’m really not sure if I’m going to like it, even if Java 9 would eventually add some syntactic sugar to the JLS, which would resemble that of Ceylon to leverage Optional on a language level. Since Java is so incredibly backwards-compatible, none of the existing APIs will be retrofitted to return Optional, e.g., the following isn’t going to surface in JDK 8:

public interface List<E> {
    Optional<E> get(int index);
    [...]
}

Not only can we assign null to an Optional variable, but the absence of “Optional” doesn’t guarantee the semantics of “SOME”, as lists will still return “naked” null values. When we mix the two ways of thinking, we will wind up with two checks, instead of one:

Optional<T> optional = // [...]
T nonOptional = list.get(index);

// If we're paranoid, we'll double-check!
if (optional != null && optional.isPresent()) {
    // do stuff
}

// Here we probably can't trust the value
if (nonOptional != null) {
    // do stuff
}

Hence…

-1 from me to Java’s solution

Further reading

Of course, this has been discussed millions of times before. So here are a couple of links: