This is one of software engineering’s oldest battles. No, I’m not talking about
where to put curly braces, or whether to use tabs or spaces. I mean the eternal battle between nominal typing and structural typing.
This article is inspired by a very vocal blogger who eloquently reminds us to …
[…] Please Avoid Functional Vomit
Read the full article here:
https://dzone.com/articles/using-java-8-please-avoid-functional-vomit
What’s the post really about?
It is about naming things.
As we all know:
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
Now, for some reason, there is a group of people who want constant pain and suffering by explicitly naming everything, including rather abstract concepts and algorithmic components, such as compound predicates. Those people like nominal typing and all the features that are derived from it. What is nominal typing (as opposed to structural typing)?
Structural typing
SQL is a good example to study the two worlds. When you write SQL statements, you’re creating structural
row types all the time. For instance, when you write:
SELECT first_name, last_name
FROM customer
… what you’re really doing is creating a new row type of the structure (in pseudo-SQL):
TYPE (
first_name VARCHAR,
last_name VARCHAR
)
The type has the following properties:
- It is a tuple or record (as always in SQL)
- It contains two attributes or columns
- Those two attributes / columns are called first_name and last_name
- Their type is VARCHAR
This is a structural type, because the SQL statement that produces the type only declares the type’s structure implicitly, by producing a set of column expressions.
In Java, we know lambda expressions, which are (incomplete) structural types, such as:
// A type that can check for i to be even
i -> i % 2 == 0
Nominal typing
Nominal typing takes things one step further. In SQL, nominal typing is perfectly possible as well. For instance, in the above statement, we selected from a well-known table by its name, customer. Nominal typing assigns a name to a structural type (and possibly stores the type somewhere, for reuse).
If we want to name our (first_name, last_name) type, we could do things like:
-- By using a derived table:
SELECT *
FROM (
SELECT first_name, last_name
FROM customer
) AS people
-- By using a common table expression:
WITH people AS (
SELECT first_name, last_name
FROM customer
)
SELECT *
FROM people
-- By using a view
CREATE VIEW people AS
SELECT first_name, last_name
FROM customer
In all cases, we’ve assigned the name people to the structural type (first_name, last_name). The only difference is the scope for which the name (and the corresponding content) is defined.
In Java, we can only use lambda expressions once we assign them to a typed name, either by using an assignment or by passing the expression to a method that takes a named type argument:
// Naming the lambda expression itself
Predicate<Integer> p = i -> i % 2 == 0;
// Passing the lambda expression to a method
Stream.of(1, 2, 3)
.filter(i -> i % 2 == 0);
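To make the role of the target type more tangible, here is a minimal sketch. The class and field names are made up for illustration, and the commented-out lines assume a Java version that supports var (10 or later). It shows the same expression text being given three different named types, and that those types are not interchangeable even though they have the same shape:

```java
import java.util.function.Function;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

// Hypothetical demo class, not taken from the linked article.
final class LambdaTargetTypes {

    // The same expression text, three different nominal target types.
    static final Predicate<Integer> BOXED = i -> i % 2 == 0;
    static final IntPredicate PRIMITIVE = i -> i % 2 == 0;
    static final Function<Integer, Boolean> AS_FUNCTION = i -> i % 2 == 0;

    // Does not compile: without a target type, the lambda has no type at all.
    // var untyped = i -> i % 2 == 0;

    // Does not compile either: same shape, but a different name.
    // Predicate<Integer> p = AS_FUNCTION;

    // A method reference re-targets the function to the other named type.
    static final Predicate<Integer> RETARGETED = AS_FUNCTION::apply;

    public static void main(String[] args) {
        System.out.println(BOXED.test(4));        // true
        System.out.println(PRIMITIVE.test(3));    // false
        System.out.println(AS_FUNCTION.apply(4)); // true
        System.out.println(RETARGETED.test(3));   // false
    }
}
```

The lambda only describes a shape. Which type it ends up having is decided entirely by the name on the left-hand side, or by the parameter type of the method it is passed to.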
Back to the article
The article claims that giving a name to things is always better. For instance, the author proposes giving a name to what we would commonly refer to as a “predicate”:
//original, less clear code
if(barrier.value() > LIMIT && barrier.value() > 0){
//extracted out to helper function. More code, more clear
if(barrierHasPositiveLimitBreach()){
So, the author thinks that extracting a rather trivial predicate into an external function is better because a future reader of such code will better understand what’s going on. At least in the article’s opinion. Let’s refute this claim for the sake of the argument:
- The proposed name is verbose and requires quite some thinking.
- What does breach mean?
- Is breach the same as >= or the same as >?
- Is LIMIT a constant? From where?
- Where is barrier? Who owns it?
- What does the verb “has” mean, here? Does it depend on something outside of barrier? E.g. some shared state?
- What happens if there’s a negative limit?
By naming the predicate (remember, naming things is hard), the OP has added several layers of cognitive complexity to the reader, while quite possibly introducing subtle bugs, because probably both LIMIT and barrier should be function arguments, rather than global (im)mutable state that is assumed to be there, by the function.
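If a name really is wanted, one possible shape is sketched here. This is an assumption on my part, not the linked article’s code: the class name, the method names and the int types are all hypothetical. Both inputs are passed explicitly, so the signature answers the “where does it come from” questions from the list above:

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: names and int types are assumptions for illustration.
final class BarrierCheck {

    // Both inputs are explicit parameters, so nothing is read from
    // surrounding state that the caller cannot see.
    static boolean exceedsPositiveLimit(int barrierValue, int limit) {
        return barrierValue > limit && barrierValue > 0;
    }

    // Alternative: fix the limit once and hand back an unnamed predicate.
    static IntPredicate above(int limit) {
        return value -> value > limit && value > 0;
    }

    public static void main(String[] args) {
        System.out.println(exceedsPositiveLimit(15, 10)); // true
        System.out.println(exceedsPositiveLimit(10, 10)); // false: strictly greater
        System.out.println(exceedsPositiveLimit(-1, -5)); // false: above the limit, but not positive
        System.out.println(above(10).test(11));           // true
    }
}
```

Even so, the reader still has to open the method to learn that the comparison is strict, which is exactly the deciphering problem described next.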
The name introduced several concepts (“to have a breach”, “positive limit”, “breach”) that are not well defined and need some deciphering. How do we decipher it? Probably by looking inside the function and reading the actual code. So what do we gain? Better reuse, perhaps? But is this really reusable?
Finally, there is a (very slight) risk of introducing a performance penalty by the additional indirection. If we translate this to SQL, we could have written a stored function and then queried:
SELECT *
FROM orders -- Just an assumption here
WHERE barrier_has_positive_limit_breach(orders.barrier)
If this was some really complicated business logic depending on a huge number of things, perhaps extracting the function might’ve been worthwhile. But in this particular case, is it really better than:
SELECT *
FROM orders
WHERE barrier > :limit AND barrier > 0
or even
SELECT *
FROM orders
WHERE barrier > GREATEST(:limit, 0)
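The two WHERE clauses above are meant to be equivalent. As a quick sanity check, here is a small Java sketch (hypothetical class and method names, with Math.max standing in for GREATEST) that compares both forms over a range of int values. It says nothing about NULL handling, which depends on how a given database implements GREATEST:

```java
// Hypothetical check: Math.max is used as a stand-in for SQL's GREATEST.
final class GreatestEquivalence {

    // Mirrors: barrier > :limit AND barrier > 0
    static boolean twoComparisons(int barrier, int limit) {
        return barrier > limit && barrier > 0;
    }

    // Mirrors: barrier > GREATEST(:limit, 0)
    static boolean singleComparison(int barrier, int limit) {
        return barrier > Math.max(limit, 0);
    }

    // Compares both forms for every (barrier, limit) pair in [from, to].
    static boolean agreeOnRange(int from, int to) {
        for (int barrier = from; barrier <= to; barrier++) {
            for (int limit = from; limit <= to; limit++) {
                if (twoComparisons(barrier, limit) != singleComparison(barrier, limit)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnRange(-50, 50)); // true
    }
}
```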
Conclusion
There are some people in our industry who constantly want to see the world in black and white. As soon as they’ve had one small success story (e.g. reusing a very common predicate 4-5 times by extracting it into a function), they conclude with a general rule of this approach being always superior.
They struggle with the notion of “it depends”. Nominal typing and structural typing are both very interesting concepts. Structural typing is extremely powerful, whereas nominal typing helps us humans keep track of complexity. In SQL, we’ve always liked to structure our huge SQL statements, e.g. in nameable views. Likewise, Java programmers structure their code in nameable classes and methods.
But it should be immediately clear to anyone reading the linked article that the author seems to like hyperbole and probably wasn’t really serious, given the silly example he came up with. The message he’s conveying is wrong, because it claims that naming things is always better. It’s not true.
Be pragmatic. Name things where it really helps. Don’t name things where it doesn’t. Or as Leon Bambrick amended Phil Karlton’s quote:
There are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors
Here’s my advice to you, dear nominal typing loving blogger. There are only two ways of typing: nominal typing and structural typing. And it depends typing.
Couldn’t agree more. Moderation in all things. Mathematics has come a long way without putting mind-deadening CamelCaseVerbosity everywhere. Use names for cross-reference purposes or the application of the “don’t repeat yourself” principle, otherwise, contain yourself. Unfortunately, the programming language may be grody enough to not let you (that’s you, Java!)
The desire to go all out on verbose names seems to be related to the idea of “self-documenting” code, where we have code that supposedly does not need to be documented because the reviewer can immediately find out how it works by looking at the “meaningful names”. It’s a bad and impractical idea that must have come from people just leaving university.
Anyway, what could be clearer than this quicksort in Haskell (from http://rosettacode.org/wiki/Quicksort#Haskell)
qsort [] = []
qsort (x:xs) = qsort [y | y <- xs, y < x] ++ [x] ++ qsort [y | y <- xs, y >= x]
One quibble:
“In Java, we know lambda expressions, which are (incomplete) structural types, such as:”
Actually, I would say they are “arrow types” or functions (i.e. a way to describe how an input value should be transformed into an output value) instead of structural types (which are cartesian products or similar of other types).
Quite so. However, the real problem with the original – admittedly silly – example is, in my view, that the name for the predicate is an extremely poor choice. The choice is bad because it just tries to repeat implementation detail. That leads to most of the drawbacks that you mention. And more: what happens to your function name when the logic changes? But perhaps (it depends, of course) a name could have been chosen that would really have made the intention behind the code clearer. Something like, say, “if( orderWeightExceedsTransportCapacity() ) {…}” which for all its verbosity would at least tell a story, and indicate a possible locus of future change. Such a name also introduces new concepts and “layers of cognitive complexity”. But that is not a valid objection per se, when the concepts have explanatory power.
I also suggest to Yatima that the Haskell code is so obvious in part because everyone knows that “qsort” can only be a Quicksort implementation. When you already know what those xs and ys ought to be doing, of course you don’t need a name to help you figure it out.
Finally, I think it misleading to say that lambdas are structural types, incomplete or not. Lambda expressions are instances of functional interfaces, which makes them nominal function types pure and simple.
All you’re essentially saying is you’re confirming that naming things is hard. And because time is money, we can postpone the decision to name things, and just let er go and do something else. Right? RIGHT? :)
Nope. The expression by itself is nothing real / tangible. Only when you have a target type does it become that type’s implementation. I like Yatima’s correction saying that a lambda expression is … well … a function.
Right. I completely agree with doing more useful things. I only wanted to point out that the example you were arguing against was something of a straw man, because it would be considered bad even from the viewpoint of a person (like me) who tends towards naming many things that others find unspeakable…
Where do you draw the line?
That depends, of course. It’s really mostly a matter of taste. By and large, anything that’s an understandable abstraction (makes sense on its own) should have a name. That simplifies thinking about its functional role, and its relation to other entities.
So, in essence, you can’t draw the line :)
Both structural and nominal typing have pros and cons. As far as lambdas Java now has some loose structural typing due to lambda support but is mainly nominal. It is akin to the Go language’s interfaces. I’m actually not sure what you call that kind of polymorphism (i.e. nominal declarations where what is passed in doesn’t need to be associated). Some call it duck typing or protocol typing. Regardless, I wouldn’t call Java lambda support pure nominal typing either.
No, I still take exception to speaking of “structural typing” with regard to lambda expressions. Of course I know what you mean, and it is certainly helpful to think of lambdas as functions. But as Lukas said, without a target type they cannot be type-checked. Therefore they essentially have no type in Java. Structural types are quite another thing, and you will only muddy the waters by saying that the typing of lambdas in Java has anything to do with that, because it hasn’t. The typing of lambda expressions in Java is purely nominal.
As authority for this I cite Brian Goetz, who stated the following on Sep. 17 2013 on lambda-dev: “While you may reasonably regret the lack of structural function types, the next best thing is to really stick to the decision to embrace nominal function types”.
As an example, I think if lambdas were structurally typed, you wouldn’t have to do silly things like Predicate p = f::apply, where f is of type Function and more.
What happens underneath and what is apparent to the developer are two separate things. If it were truly nominal typing, the developer would be required to implement the functional interface. I understand Java will coerce the expression to be that interface and that the expression doesn’t really exist, but the developer did not have to reference something named. This discussion is in large part based on what your definition of nominal typing is, which in some ways is just like “what is strongly typed”… it can be fairly nebulous.
As I said earlier: “As far as lambdas Java now has some loose structural typing due to lambda support but is mainly nominal”. All I’m saying is IMO it isn’t complete nominal typing from the perspective of the developer. If anything it is compiler duck typing (albeit I really hate using that phrase) or super duper syntactic sugar.