NULL is Not The Billion Dollar Mistake. A Counter-Rant

A short while ago, I gave this answer on Quora. The question was “What is the significance of NULL in SQL?”, and most of the existing answers cited C.J. Date or Tony Hoare and unanimously declared NULL “evil”.

So, everyone rants about NULL all the time. Let me counter-rant.

Academics

Of course, academics like C.J. Date will rant about NULL (see Greg Kemnitz’s interesting answer on Quora). Let me remind you that C.J. Date also ranted about UNION ALL, as pure relational theory operates only on sets, not on bags (like SQL does). While in theory, sets are probably much purer than bags, in practice, bags are just very useful.

These people probably also still mourn the fact that SQL (useful) won over QUEL (pure), and I don’t blame them. Theory is always more beautiful than practice, which is exposed to real-world requirements.

Purists

There are also other kinds of purists who will run around and educate everyone about their black/white opinions that leave no room for pragmatic “it depends…” approaches. I like to display this witty comic strip on such occasions: New intern knows best: GOTO. Purists like extreme abstraction when they describe their world, and such abstraction calls for very simple models with no complexity. NULL adds tremendous complexity to the SQL “model”, and thus does not fit their view.

Fact is: It depends

The only factual opinion ever is one where there’s no clear opinion. NULL is an incredibly useful value, and some representation of NULL is inevitable in all languages / models that want to model cardinalities of the form:

  • 0 or 1 (here’s where NULL is useful)
  • exactly 1 (here, you don’t need NULL)
  • 0 .. many (here, you don’t need NULL)

Functional programming languages like to make use of the Optional “monad” (see Mario Fusco’s excellent explanation of what a monad is) to model the 0-or-1 cardinality, but that’s just another way of modelling NULL: the (possibly) absent value. Perhaps, if you like to discuss style (then you should read this), NULL vs. Optional may matter to you, but they’re really exactly the same thing. We’ve just been shifting whitespace and curly braces.
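To make this concrete, here is a minimal Java sketch (with java.util.Optional standing in for the Optional “monad”, and maybeValue() as a hypothetical source of a possibly absent value):

  import java.util.Optional;

  public class AbsentValue {
      // hypothetical source of a 0-or-1 value
      static String maybeValue() {
          return Math.random() < 0.5 ? "hello" : null;
      }

      public static void main(String[] args) {
          String nullable = maybeValue();                            // null models absence
          Optional<String> optional = Optional.ofNullable(nullable); // empty() models absence

          // the same 0-or-1 check, with shifted whitespace and curly braces
          if (nullable != null)
              System.out.println(nullable.length());
          optional.ifPresent(s -> System.out.println(s.length()));
      }
  }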

The only way to really do without the absent value would be to disallow the optional cardinality and use 0 .. many instead, which would be much less descriptive.

So, regardless of what purists or academics say about a perfect world, we engineers need potent tools that help us get our work done, and NULL (or “Optional”) is one of these potent tools that allow us to do so.

Caveat: SQL NULL is not an absent value

Now, the caveat with SQL’s NULL is that it doesn’t behave like an absent value. It is the UNKNOWN value, as others have also explained. This subtle difference has severe impact on a variety of operations and predicates, which do not behave very intuitively if you’re not aware of this distinction. Some examples (and there are many, many more):

  • NULL = NULL does not evaluate to TRUE, but to NULL – and so does NULL <> NULL
  • x NOT IN (a, b, NULL) never evaluates to TRUE, no matter what x is
  • A predicate that evaluates to NULL in a WHERE clause behaves like FALSE: the row is filtered out
  • Yet DISTINCT and GROUP BY treat all NULLs as “not distinct” from one another, grouping them together

Even with this specification of SQL NULL being UNKNOWN, most people abuse SQL NULL to model the absent value instead, which works just nicely in most cases, until you run into a caveat. It turns out that the UNKNOWN value is even more useful than the absent value, as it allows for modelling things even more descriptively. One might think that having two “special” values would solve these problems, as JavaScript does by distinguishing between null (UNKNOWN) and undefined (absent).

JavaScript itself is a beacon of usefulness that is inversely proportional to its purity or beauty, so long story short:

Pick your favourite spot on the useful <-> pure scale

Programming languages and data models are always a tradeoff between purity and usefulness. Pick your favourite spot on that scale, but stop ranting about NULL being evil. Or, as Simon Peyton Jones said:

Haskell is useless

35 thoughts on “NULL is Not The Billion Dollar Mistake. A Counter-Rant”

  1. “but that’s just another way of modelling NULL.” except not: you can have nested Options but not nested nulls, and you can have Option.some(null) (even if it is not a recommended design).

    1. nested Option

      Curious, what’s the (real world) use-case of that?

      and you can have Option.some(null)

      Of course you can do that. Because the “Option” solution to implementing 0..1 cardinalities works on top of the “null pointer” solution – at least on the JVM. Yet, the problem they’re solving is the same in both cases: 0..1 cardinalities.
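      To illustrate that layering with java.util.Optional (a minimal sketch; on the JVM, even the Optional reference itself can be null):

      import java.util.Optional;

      class OnTopOfNull {
          public static void main(String[] args) {
              Optional<String> absent = Optional.ofNullable(null); // models the 0 in 0..1
              Optional<String> oops = null;                        // the wrapper itself is a reference, too

              System.out.println(absent.isPresent()); // false
              System.out.println(oops == null);       // true: null underneath, after all
          }
      }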

      1. Real-world use case: I have a cache (a map) of username -> real name.

        Except some users have no real name (technical users).

        So when I try to get an element from my map by username, the key may or may not be there, and if the key is there, the value may be a real name or may be nothing.
        Here you have a nested Option. If I were using null instead of Option.none, I could not distinguish usernames that are not in my map from usernames that have an empty real name, without an additional, error-prone call to contains().
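        A minimal Java sketch of such a cache, assuming java.util.Optional as the Option type (all names hypothetical):

        import java.util.HashMap;
        import java.util.Map;
        import java.util.Optional;

        class RealNameCache {
            // username -> (real name, or empty for technical users)
            static final Map<String, Optional<String>> CACHE = new HashMap<>();

            public static void main(String[] args) {
                CACHE.put("jdoe", Optional.of("John Doe")); // user with a real name
                CACHE.put("batch", Optional.empty());       // technical user: cached, but no real name

                // outer level: is the username cached at all?
                // inner level: does the cached user have a real name?
                Optional<Optional<String>> lookup = Optional.ofNullable(CACHE.get("alice"));
                System.out.println(lookup.isPresent()); // false: not cached, so do the remote call
            }
        }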

        1. Why is the user who has no real name even in that map? As a SQL person favouring normalisation, I don’t see the value of that. :)

          1. Because we need to track the fact that this user has no real name; otherwise we would do a remote call each time for usernames that have no real name. We could put them in a separate set, but this adds complexity imo.

            1. So, you make a compromise and give up on purity? Isn’t that what the pragmatic null (or None, or “absent”, etc.) is all about? In the end, your solution isn’t that much different from Java’s. Apart from some compile-time hint that helps you recognise the possibility of absent values from types only (which you can do in Java as well, as every reference can be “absent”).

          2. You’re contradicting yourself now – normalization is on the “purity” end of your scale. Real world usually requires data to be in different stages of denormalization.

  2. I don’t really see a problem here. The various forms of NULL make a lot of sense in SQL. Not so much in, say, Java, where null may have been the billion dollar mistake. Many people have imagined lots of alternatives, although none of them are without sharp edges.

    In many cases, especially where there are a lot of subclasses, I’ll have a “Zen” class that gets used like null, but has all the desired methods… all of which return false, empty strings, empty lists, whatever. It’s a true object that represents non-existence (which is a pretty Zen concept when you get down to it).
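    A minimal Java sketch of such a “Zen” class (all names hypothetical):

    import java.util.Collections;
    import java.util.List;

    interface Customer {
        String name();
        List<String> orders();
        boolean active();
    }

    // a true object representing non-existence: every method returns "nothing"
    final class NoCustomer implements Customer {
        public String name()         { return ""; }
        public List<String> orders() { return Collections.emptyList(); }
        public boolean active()      { return false; }
    }

    Callers can then receive a NoCustomer instance wherever null would otherwise have been returned, and no downstream null checks are needed.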

    1. Your Zen class is one more way of handling the idea of a possibly absent value. You’re abusing subtype polymorphism for the same effect. Might be clever in some situations, disadvantageous in others. Fact is, you’re still emulating an absent value with a concrete, “present” value – just like Java does with its quirky null pointer.

      1. Agree with the 1st part. But if you read thru the history of mathematics, you’ll find that many people made the same argument about the notion of the number zero! :-) I mean, really, how can you possibly represent “nothing” with an actual number?! (humor mode=on)

        Or think of physics. “Empty” vacuum, right? Well, no. ‘absence’ does have properties!

        In the end, it’s ALL symbols. The map is not the territory. NULL can be a concrete thing. Heck, I’d like to see a Java “null” object that has ALL methods of ALL objects, and returns nothing (0, false, empty list, etc.) for all of them. That would be an … interesting… approach.

        1. you’ll find that many people made the same argument about the notion of the number zero

          Holy smokes, that was on the TODO list of my blog! You guessed it? :) I was going to compare NULL with 0 and with {} or ∅ in the hope that they would finally stop arguing about concrete representations…

          And me, too, I’d like to see that all-knowing “null” object in Java. The ubiquitous void. We wouldn’t need silly monads and flatmapping or elvis operators to call one method after the other…

          1. Chuckle. Occasionally a mathematics background comes in handy. Now we could argue about whether NULL was “invented” or “discovered”…

  3. I guess, in programming languages, null is a problem because of the NPE. In SQL, it’s a problem because of people being unaware of all the related quirks. It may be practical to silently convert NULL to FALSE, it may be practical to evaluate ('A', NULL) IS NULL to TRUE, it may be practical to omit NULL from the index by default. This all gets done and may have its advantages, but it lacks a good logic and it’s very error-prone. So I’m not surprised when people get surprised by this surprising behavior.

  4. Lukas, I think no one can argue that having a notion for the concept of ‘absence of any value’ is harmful. At least not with a straight face.

    The debate about null being bad is not about the existence of it, but its omni-presence. The ‘billion dollar mistake’ is that every variable of type T is *forced* to also admit null, whether or not that makes sense. It’s a backdoor into the type-system that is installed *everywhere*, even where it is not needed, and that complicates things.

    This is also where Optional is different. No value x that can be assigned to a variable of type Optional<T> can ever legally be assigned to a variable of type T. *The type system of any decent strongly-typed language will prevent it*. This is a killer-feature, but it’s precisely what you give up when introducing null as a legal assignment for every variable of type T.
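    A minimal Java sketch of that guarantee, using java.util.Optional (the commented-out line is rejected by the compiler):

    import java.util.Optional;

    class KillerFeature {
        public static void main(String[] args) {
            Optional<String> name = Optional.empty();

            // String s = name;         // does not compile: incompatible types
            String s = name.orElse(""); // the compiler forces explicit unwrapping
            System.out.println(s);
        }
    }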

    In conclusion, “it depends…” is exactly what the whole argument is all about: *It depends* on the context, whether null should be allowed for a type T, but Java does not give me that Option.


    1. So, essentially, Optional is a more formal, yet more heavy-weight approach than using some naming convention, right? You’re trading one workaround for the other… I know that Optional is currently treated as the FP holy grail for possibly absent values, but it is “wrong” (all monads are) in a new way. We will face utter over-engineering of our applications by monadising every possible state representation into extremely unreadable types nesting several levels of generics. Compared to this, null may be quirky, but it’s “simple” and useful.

      Obviously, I don’t disagree with you, but I don’t see null as the billion dollar mistake. It’s more of an “oh well” situation, which could be remedied, for instance, by not throwing the exception but by ignoring the method call instead.

      Btw: I’m curious about the implications of Java’s null on Project Valhalla and on value types!

      1. So, essentially, Optional is a more formal, yet more heavy-weight approach than using some naming convention, right?

        Exactly, and the additional formalism enables type-system checks. I don’t consider compile-time guarantees by the type system a workaround.

        I agree that ‘Optional<T>’ is ugly and bulky, especially in Java (at least up to 7), where you even need to specify the type again when passing an Optional.<Long>absent() literal into a method that takes an Optional<Long>. (Did I mention it’s a Long? ;-) However, I have seen type systems where T is non-nullable and T? is nullable. It’s amazing how much more expressive the code looks with those types and how much more compiler inference is possible with a simple additional question mark here and there.

        Having said that, I’ll now return to my code, which is waay on the ‘useful’ end of the useful <-> pure scale ;-)

        1. I’m intrigued myself about the T vs. T? distinction (iff they also add support for T*). I really don’t think that generics are a good means to model cardinalities. That’s why I think it’s a workaround. From a language (not type system!) perspective, I want to be able to assign:

          T t = new T();
          T? t1 = t;
          

          Enjoy your useful code! :)

          1. Yay! I’m late to the party! I was going to write Simon’s exact post, but Simon already wrote it for me, so that’s good. I’ll continue here in the thread, then.

            @lukaseder, while calling null a billion dollar mistake may be overstating things, I feel that you are drastically understating the damage it has caused. Frankly, it is downright offensive that the language refuses to extend me the option to choose what valid values for my type are. Null is not a minor annoyance; it is the difference between programs working and not working. A stacktrace may tell you where the value is null, but it doesn’t tell you how it became null and what other code is responsible. I would ask why the primitives (int, float, etc.) were so privileged with the power of avoiding null, but I already know the answer: this is considered a _mistake_ by people, and they feel that all types should allow null!

            Thankfully, value types are coming to Java to bring a little sanity to things, but it is really too little too late. As you’ve pointed out, null is already an integral part of the language and standard APIs.

            Why would you require support for T*? I assume that refers to an unsafe reference (one you can use without checking null first). How does that make sense in a language with checked exceptions? The programmer absolutely should be forced to check for a valid value before using it.

            1. A stacktrace may tell you where the value is null, but it doesn’t tell you how it became null and what other code is responsible

              To be fair, this argument also applies to Optional. You don’t know how an Optional became empty() and what other code is responsible.

              I like Maaartinus’s comment on a different blog post than this one here:
              https://blog.jooq.org/2015/08/20/divided-we-stand-optional/#comment-133780

              Specifically, I like the link he pointed to:
              https://groups.google.com/d/msg/project-lombok/ROx9rGRb6lI/EF0lk8F0N10J

              The main reason people run into NPEs is that some developer doesn’t care (enough) about the possibility of a reference being null. This can happen with Optional just the same, but instead of having a fail-early strategy (the dreaded NPE), we’ll just not do anything at all.

              So it might be reasonable to say that from this perspective, perhaps, not much will change.
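              A tiny Java sketch contrasting the two behaviours (hypothetical snippet):

              import java.util.Optional;

              class FailEarlyOrNotAtAll {
                  public static void main(String[] args) {
                      String s = null;

                      // fail-early: the dreaded NPE, pointing at this very line
                      // System.out.println(s.length());

                      // Optional: the empty value silently skips both steps
                      Optional.ofNullable(s)
                              .map(String::length)
                              .ifPresent(System.out::println); // prints nothing at all
                  }
              }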

              it is downright offensive that the language refuses to extend me the option to choose what valid values for my type are

              That, of course, is a different story. It’s true that adding one more value to each type in a type system is weird. E.g. the “three-valued” java.lang.Boolean (I must admit, I have appreciated this third value numerous times – when others would have probably recommended to use an enum with three literals).
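              For instance (a minimal sketch):

              Boolean consent = null;               // the third value: question not yet answered
              if (Boolean.TRUE.equals(consent)) {}  // null-safe test for TRUE
              // if (consent) {}                    // would auto-unbox and throw an NPE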

              Why would you require support for T*?

              To be consistent in the cardinality schema, you should not only distinguish between “1” and “0-1” cardinalities, but also offer “0-n”. In other words (as in Ceylon):

              // "1" cardinality
              T value1 = new T();
              
              // "0-1" cardinality
              T? value2 = null;
              
              // "0-n" cardinality
              T* values = { new T(), new T() };
              

              Not having this convenient way of specifying an array makes the T? type a bit incomplete to me.

  5. The best part of generalizing the use of Option (and more generally sum types) in your project is never checking for null ever again: readability improves, and lots of questions are resolved by the compiler: is this parameter mandatory? Can this method return null?

    1. I may be spoiled by the fact that I maintain an API whose usage hardly ever makes me ask this question either. I do admit missing sum types, though! Would save me lots of trouble when designing overloaded API.

  6. A value of type Maybe String in Haskell is not the same thing as a nullable string, because it is distinguished from a proper String (it has a distinct type). This means you cannot pass a Maybe String to a function expecting a String (type error). More importantly, nullability is opt-in rather than available everywhere, so you can be certain that null cases cannot occur in proper strings.

    Regarding SPJ’s interview: you do realize that it was hyperbole, right? Since version 1.4 (around 20 years ago), Haskell has had monadic I/O. If anything, Haskell shows that you can have your purity and “eat the cake” of effects as well!

    1. Strictly speaking, a “null String” is not a String either, but a distinct type. Proof? (null instanceof String) == false. ;)

      I understand hyperboles, I write them myself. Moreover, Tony Hoare made one, when he claimed a billion dollar mistake for what is merely a minor annoyance.

      Also, I do understand all the advantages of wrapping absent values in an option monad. But I also understand the disadvantages in terms of type-verbosity. It’s a trade-off, which sometimes makes sense to make, sometimes doesn’t.

          1. Class != Type:

            Types and type systems are compile-time concepts. At runtime, you may merely reflect on some implementation details of the type system that still persist (and not all things persist – type erasure, for instance).

            And type erasure is a good thing btw, it should have been named “not bringing unnecessary shit at runtime”

            1. OK OK. I’m gonna go ahead and say: You’re right and I’m very glad we had this conversation!

              And type erasure is a good thing btw, it should have been named “not bringing unnecessary shit at runtime”

              You mean like Valhalla’s specialised templates? :)

  7. You misunderstood the original quote. Null pointers were the only things being referred to in that quote. Null as a value for databases and/or scalar data types was not the point.

    1. That’s true, yet the quote is used by people to argue against any implementation of null. After all, null pointers are also used as null values / default values / uninitialised values / absent values, etc.

  8. I am inclined to call it a mistake. If so many people are not comfortable with the design, then we can at least say it is not well designed.

    Here is an arguable example. In SQL syntax, the statement “columnA = null …” is illegal, but it is acceptable to developers all over the world in other programming languages.

    Another interesting story: if columnA (of type integer) is nullable and holds NULL, you get these results:
    • if columnA > 3 — No
    • if columnA < 3 — No
    • if columnA = 3 — No
    I have mentored many juniors; most of them made that mistake even after being told that NULL is not comparable.

      In SQL syntax, the statement “columnA = null …” is illegal

      It’s not illegal, au contraire. It just always evaluates to NULL (because of three-valued logic).

      I think the problem on the SQL side is the fact that the biased term “NULL” was recycled. If we were using the term “UNKNOWN” instead (which is really what NULL means in SQL, and how it is defined in the SQL standard), then it would be less confusing. Translating this to your ifs:

      • if UNKNOWN > 3 — UNKNOWN
      • if UNKNOWN < 3 — UNKNOWN
      • if UNKNOWN = 3 — UNKNOWN

      And, of course

      • if UNKNOWN IS UNKNOWN — TRUE
      • if UNKNOWN IS NOT UNKNOWN — FALSE
