Do You Really Have to Name Everything in Software?

This is one of software engineering’s oldest battles. No, I’m not talking about where to put curly braces, or whether to use tabs or spaces. I mean the eternal battle between nominal typing and structural typing. This article is inspired by a very vocal blogger who eloquently reminds us to …
[…] Please Avoid Functional Vomit
Read the full article here: https://dzone.com/articles/using-java-8-please-avoid-functional-vomit

What’s the post really about?

It is about naming things. As we all know:
There are only two hard things in Computer Science: cache invalidation and naming things. — Phil Karlton
Now, for some reason, there is a group of people who want constant pain and suffering by explicitly naming everything, including rather abstract concepts and algorithmic components, such as compound predicates. Those people like nominal typing and all the features that are derived from it. What is nominal typing (as opposed to structural typing)?

Structural typing

SQL is a good example to study the two worlds. When you write SQL statements, you’re creating structural row types all the time. For instance, when you write:

SELECT first_name, last_name
FROM customer

… what you’re really doing is you’re creating a new rowtype of the structure (in pseudo-SQL):
TYPE (
  first_name VARCHAR,
  last_name VARCHAR
)
The type has the following properties:
  • It is a tuple or record (as always in SQL)
  • It contains two attributes or columns
  • Those two attributes / columns are called first_name and last_name
  • Their type is VARCHAR
This is a structural type, because the SQL statement that produces the type only declares the type’s structure implicitly, by producing a set of column expressions. In Java, we know lambda expressions, which are (incomplete) structural types, such as:

// A type that can check for i to be even
i -> i % 2 == 0

Nominal typing

Nominal typing takes things one step further. In SQL, nominal typing is perfectly possible as well. For instance, in the above statement, we selected from a well-known table by its name, customer. Nominal typing assigns a name to a structural type (and possibly stores the type somewhere, for reuse). If we want to name our (first_name, last_name) type, we could do things like:

-- By using a derived table:
SELECT *
FROM (
  SELECT first_name, last_name
  FROM customer
) AS people

-- By using a common table expression:
WITH people AS (
  SELECT first_name, last_name
  FROM customer
)
SELECT *
FROM people

-- By using a view
CREATE VIEW people AS
SELECT first_name, last_name
FROM customer

In all cases, we’ve assigned the name people to the structural type (first_name, last_name). The only difference is the scope for which the name (and the corresponding content) is defined.

In Java, we can only use lambda expressions once we assign them to a typed name, either by using an assignment or by passing the expression to a method that takes a named type argument:

// Naming the lambda expression itself
Predicate<Integer> p = i -> i % 2 == 0;

// Passing the lambda expression to a method
Stream.of(1, 2, 3)
      .filter(i -> i % 2 == 0);
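
To make the role of the target type concrete, here is a minimal sketch. The class name TargetTyping and the constant names are assumptions chosen for illustration; only the java.util.function interfaces are real JDK types. It gives the same lambda text three different named types, and shows that two of those types are not interchangeable even though they have the same shape:

```java
import java.util.function.Function;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

// Hypothetical demo class, not taken from the original article.
public class TargetTyping {

    // The same lambda text, given three different nominal target types.
    static final Predicate<Integer> EVEN_PREDICATE = i -> i % 2 == 0;
    static final IntPredicate EVEN_INT_PREDICATE = i -> i % 2 == 0;
    static final Function<Integer, Boolean> EVEN_FUNCTION = i -> i % 2 == 0;

    // Function<Integer, Boolean> and Predicate<Integer> both map an Integer
    // to a truth value, yet a direct assignment is rejected by the compiler:
    //   Predicate<Integer> p = EVEN_FUNCTION; // does not compile
    // A method reference re-targets the function to the other named type.
    static final Predicate<Integer> CONVERTED = EVEN_FUNCTION::apply;

    public static void main(String[] args) {
        System.out.println(EVEN_PREDICATE.test(4));
        System.out.println(EVEN_INT_PREDICATE.test(4));
        System.out.println(EVEN_FUNCTION.apply(4));
        System.out.println(CONVERTED.test(3));
    }
}
```

The lambda text never changes. Only the name of the type it is assigned to does, which is why the expression on its own cannot be said to have a type.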

Back to the article

The article claims that giving a name to things is always better. For instance, the author proposes giving a name to what we would commonly refer to as a “predicate”:

//original, less clear code
if(barrier.value() > LIMIT && barrier.value() > 0){
//extracted out to helper function. More code, more clear
if(barrierHasPositiveLimitBreach()){

So, the author thinks that extracting a rather trivial predicate into an external function is better because a future reader of such code will better understand what’s going on. At least in the article’s opinion. Let’s refute this claim for the sake of the argument:
  • The proposed name is verbose and requires quite some thinking.
  • What does breach mean?
  • Is breach the same as >= or the same as >?
  • Is LIMIT a constant? From where?
  • Where is barrier? Who owns it?
  • What does the verb “has” mean, here? Does it depend on something outside of barrier? E.g. some shared state?
  • What happens if there’s a negative limit?
By naming the predicate (remember, naming things is hard), the OP has added several layers of cognitive complexity for the reader, while quite possibly introducing subtle bugs, because both LIMIT and barrier should probably be function arguments rather than global (im)mutable state that the function assumes to be there. The name introduced several concepts (“to have a breach”, “positive limit”, “breach”) that are not well defined and need some deciphering. How do we decipher it? Probably by looking inside the function and reading the actual code. So what do we gain? Better reuse, perhaps? But is this really reusable? Finally, there is a (very slight) risk of introducing a performance penalty by the additional indirection.

If we translate this to SQL, we could have written a stored function and then queried:

SELECT *
FROM orders -- Just an assumption here
WHERE barrier_has_positive_limit_breach(orders.barrier)

If this were some really complicated business logic depending on a huge number of things, perhaps extracting the function might’ve been worthwhile. But in this particular case, is it really better than:

SELECT *
FROM orders
WHERE barrier > :limit AND barrier > 0

or even

SELECT *
FROM orders
WHERE barrier > GREATEST(:limit, 0)
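
The same two forms can be sketched in Java, with both inputs passed explicitly instead of being read from surrounding state. The class name BarrierCheck, the method names, and the use of int are assumptions for illustration, since the original article does not show the types of barrier or LIMIT:

```java
// Hypothetical demo class, not taken from the original article.
public class BarrierCheck {

    // The article's original condition, with both inputs as arguments.
    static boolean inlineCheck(int barrier, int limit) {
        return barrier > limit && barrier > 0;
    }

    // The compact form, mirroring GREATEST(:limit, 0) in the SQL version.
    static boolean compactCheck(int barrier, int limit) {
        return barrier > Math.max(limit, 0);
    }

    public static void main(String[] args) {
        // Compare both forms over a small grid, including negative limits.
        for (int barrier = -3; barrier <= 3; barrier++) {
            for (int limit = -3; limit <= 3; limit++) {
                if (inlineCheck(barrier, limit) != compactCheck(barrier, limit)) {
                    throw new AssertionError(barrier + " vs " + limit);
                }
            }
        }
        System.out.println("Both forms agree on the sampled inputs");
    }
}
```

Either form answers the questions raised above by reading the expression itself: the comparison is strict, and a negative limit is simply treated like a limit of 0.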

Conclusion

There are some people in our industry who constantly want to see the world in black and white. As soon as they’ve had one small success story (e.g. reusing a very common predicate 4-5 times by extracting it into a function), they conclude that this approach is always superior. They struggle with the notion of “it depends”.

Nominal typing and structural typing are both very interesting concepts. Structural typing is extremely powerful, whereas nominal typing helps us humans keep track of complexity. In SQL, we’ve always liked to structure our huge SQL statements, e.g. in nameable views. Likewise, Java programmers structure their code in nameable classes and methods.

But it should be immediately clear to anyone reading the linked article that the author seems to like hyperbole and probably wasn’t really serious, given the silly example he came up with. The message he’s conveying is wrong, because it claims that naming things is always better. It’s not true. Be pragmatic. Name things where it really helps. Don’t name things where it doesn’t. Or as Leon Bambrick amended Phil Karlton’s quote:
There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors
Here’s my advice to you, dear nominal-typing-loving blogger. There are only two ways of typing: nominal typing and structural typing. And it depends typing.

10 thoughts on “Do You Really Have to Name Everything in Software?”

  1. Couldn’t agree more. Moderation in all things. Mathematics has come a long way without putting mind-deadening CamelCaseVerbosity everywhere. Use names for cross-reference purposes or the application of the “don’t repeat yourself” principle, otherwise, contain yourself. Unfortunately, the programming language may be grody enough to not let you (that’s you, Java!)

    The desire to go all out on verbose names seems to be related to the idea of “self-documenting” code, whereby we have code that supposedly does not need to be documented because the reviewer can immediately find out how it works by looking at the “meaningful names”. It’s a bad and impractical idea that must have come from people just leaving university.

    Anyway, what could be clearer than this quicksort in Haskell (from http://rosettacode.org/wiki/Quicksort#Haskell)

    qsort [] = []
    qsort (x:xs) = qsort [y | y <- xs, y < x] ++ [x] ++ qsort [y | y <- xs, y >= x]

    One quibble:

    “In Java, we know lambda expressions, which are (incomplete) structural types, such as:”

    Actually, I would say they are “arrow types” or functions (i.e. a way to describe how an input value should be transformed into an output value) instead of structural types (which are cartesian products or similar of other types).

  2. Quite so. However, the real problem with the original – admittedly silly – example is, in my view, that the name for the predicate is an extremely poor choice. The choice is bad because it just tries to repeat implementation detail. That leads to most of the drawbacks that you mention. And more: what happens to your function name when the logic changes? But perhaps (it depends, of course) a name could have been chosen that would really have made the intention behind the code clearer. Something like, say, “if( orderWeightExceedsTransportCapacity() ) {…}” which for all its verbosity would at least tell a story, and indicate a possible locus of future change. Such a name also introduces new concepts and “layers of cognitive complexity”. But that is not a valid objection per se, when the concepts have explanatory power.

    I also suggest to Yatima that the Haskell code is so obvious in part because everyone knows that “qsort” can only be a Quicksort implementation. When you already know what those xs and ys ought to be doing, of course you don’t need a name to help you figure it out.

    Finally, I think it misleading to say that lambdas are structural types, incomplete or not. Lambda expressions are instances of functional interfaces, which makes them nominal function types pure and simple.

    1. All you’re essentially saying is you’re confirming that naming things is hard. And because time is money, we can postpone the decision to name things, and just let er go and do something else. Right? RIGHT? :)

      Lambda expressions are instances of functional interfaces

      Nope. The expression is nothing real / tangible (the expression). Only when you have a target type, it becomes that type’s implementation. I like Yatima’s correction saying that a lambda expression is … well … a function.

      1. Right. I completely agree with doing more useful things. I only wanted to point out that the example you were arguing against was something of a straw man, because it would be considered bad even from the viewpoint of a person (like me) who tends towards naming many things that others find unspeakable…

          1. That depends, of course. It’s really mostly a matter of taste. By and large, anything that’s an understandable abstraction (makes sense on its own) should have a name. That simplifies thinking about its functional role, and its relation to other entities.

  3. Both structural and nominal typing have pros and cons. As far as lambdas Java now has some loose structural typing due to lambda support but is mainly nominal. It is akin to the Go language’s interfaces. I’m actually not sure what you call that kind of polymorphism (i.e. nominal declarations where what is passed in doesn’t need to be associated). Some call it duck typing or protocol typing. Regardless, I wouldn’t call Java’s lambda support pure nominal typing either.

    1. No, I still take exception to speaking of “structural typing” with regard to lambda expressions. Of course I know what you mean, and it is certainly helpful to think of lambdas as functions. But as Lukas said, without a target type they cannot be type-checked. Therefore they essentially have no type in Java. Structural types are quite another thing, and you will only muddy the waters by saying that the typing of lambdas in Java has anything to do with that, because it hasn’t. The typing of lambda expressions in Java is purely nominal.

      As authority for this I cite Brian Goetz, who stated the following on Sep. 17 2013 on lambda-dev: “While you may reasonably regret the lack of structural function types, the next best thing is to really stick to the decision to embrace nominal function types”.

      As an example, I think if lambdas were structurally typed, you wouldn’t have to do silly things like Predicate<T> p = f::apply, where f is of type Function<T, Boolean>, and more.

      1. What happens underneath and what is apparent to the developer are two separate things. If it were truly nominal typing, the developer would be required to implement the functional interface. I understand Java will coerce the expression to be that interface and that the expression doesn’t really exist, but from the developer’s perspective they did not have to reference something named. This discussion is in large part based on what your definition of nominal typing is, which in some ways is just like “what is strongly typed”… it can be fairly nebulous.

        As I said earlier: “As far as lambdas Java now has some loose structural typing due to lambda support but is mainly nominal”. All I’m saying is IMO it isn’t complete nominal typing from the perspective of the developer. If anything it is compiler duck typing (albeit I really hate using that phrase) or super duper syntactic sugar.
