Could We Have a Language That Hides Collections From Us?

I just fixed a bug. The fix required me to initialise an Object[] array with the init values for each type, instead of just null, i.e. false for boolean, 0 for int, 0.0 for double, etc. So, instead of just doing:

Object[] converted = new Object[parameterTypes.length];

I needed:

Object[] converted = new Object[parameterTypes.length];

for (int i = 0; i < converted.length; i++)
    converted[i] = Reflect.initValue(parameterTypes[i]);

For the subjective 8E17th time, I wrote a loop. A loop that did nothing interesting other than call a method for each of the looped structure’s elements. And I felt the pain of our friend Murtaugh.

Why do we distinguish between T and T[]?

I have a method Reflect.initValue():

public static <T> T initValue(Class<T> type) { ... }
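For illustration, here is roughly what such a method could look like. This is only a sketch under my own assumptions, not the actual implementation: the class name InitValues is made up, and the method simply returns the value an uninitialised field of the given type would have.

```java
import java.util.Arrays;

class InitValues {

    // Hypothetical stand-in for Reflect.initValue(): returns the default
    // value of a primitive type, or null for every reference type.
    @SuppressWarnings("unchecked")
    static <T> T initValue(Class<T> type) {
        if (type == boolean.class) return (T) Boolean.FALSE;
        if (type == byte.class)    return (T) Byte.valueOf((byte) 0);
        if (type == short.class)   return (T) Short.valueOf((short) 0);
        if (type == char.class)    return (T) Character.valueOf((char) 0);
        if (type == int.class)     return (T) Integer.valueOf(0);
        if (type == long.class)    return (T) Long.valueOf(0L);
        if (type == float.class)   return (T) Float.valueOf(0.0f);
        if (type == double.class)  return (T) Double.valueOf(0.0);
        return null;
    }

    public static void main(String[] args) {
        Class<?>[] parameterTypes = { boolean.class, int.class, double.class, String.class };

        // The loop from above, still spelled out by hand
        Object[] converted = new Object[parameterTypes.length];
        for (int i = 0; i < converted.length; i++)
            converted[i] = initValue(parameterTypes[i]);

        System.out.println(Arrays.toString(converted)); // [false, 0, 0.0, null]
    }
}
```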

What I really want to do is this, in one way or another:

converted = initValue(parameterTypes);

(Yes, there are subtleties that need to be thought about, such as should this init an array or assign values to an array. Forget about them for now. Big picture first).

The point is, no one enjoys writing loops. No one enjoys writing map/flatMap either:

converted = Stream.of(parameterTypes)
                  .map(Reflect::initValue)
                  .toArray();

It’s so much useless, repetitive, infrastructural ceremony that I enjoy neither writing nor reading. My “business logic” here is simply

converted = initValue(parameterTypes);

I have 3 elements:

  • A source data structure parameterTypes
  • A target data structure converted
  • A mapping function initValue

That’s all I should be seeing in my code. All the infrastructure of how to iterate is completely meaningless and boring.
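The closest I can get in today's Java is to hide the iteration in a helper once and then pass it only those three elements. The following is a sketch of that idea; the class ArrayMapping and its map() method are names I made up for this example, not an existing API.

```java
import java.util.Arrays;
import java.util.function.Function;
import java.util.function.IntFunction;

class ArrayMapping {

    // Hypothetical helper: applies fn to every element of source and
    // stores the results in a new array created by newArray.
    static <T, R> R[] map(T[] source,
                          Function<? super T, ? extends R> fn,
                          IntFunction<R[]> newArray) {
        R[] target = newArray.apply(source.length);
        for (int i = 0; i < source.length; i++)
            target[i] = fn.apply(source[i]);
        return target;
    }

    public static void main(String[] args) {
        String[] words = { "a", "bb", "ccc" };

        // Source, mapping function, and (sadly) an array constructor
        Integer[] lengths = map(words, String::length, Integer[]::new);

        System.out.println(Arrays.toString(lengths)); // [1, 2, 3]
    }
}
```

Even here, the array constructor argument is pure infrastructure. It only exists because the language cannot derive the target type on its own.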

SQL joins

In fact, SQL joins are often the same. We use primary key / foreign key relationships, so the path between parent and child tables is very obvious in most cases. Joins are cool, relational algebra is cool, but in most cases, they just get in the way of writing understandable business logic. In my opinion, this is one of Hibernate’s biggest innovations (probably others did this too, perhaps even before Hibernate): implicit joins, which jOOQ copied.

There’s much ceremony in writing this:

SELECT
  cu.first_name,
  cu.last_name,
  co.country
FROM customer AS cu
JOIN address USING (address_id)
JOIN city USING (city_id)
JOIN country AS co USING (country_id)

When this alternative, intuitive syntax would be much more convenient:

SELECT
  cu.first_name,
  cu.last_name,
  cu.address.city.country.country
FROM customer AS cu

It is immediately clear what is meant by the implicit join syntax. The syntactic ceremony of writing the explicit joins is not necessary.

Again, joins are really cool, and power users will be able to use them when needed. E.g. the occasional NATURAL FULL OUTER JOIN can still be done! But let’s admit it, 80% of all joins are boring, and could be replaced with the above syntax sugar.

Suggestion for Java

Of course, this suggestion will not be perfect, because it doesn’t deal with the gazillion edge cases of introducing such a significant feature to an old language. But again, if we allow ourselves to focus on the big picture, wouldn’t it be nice if we could:

class Author {
  String firstName;
  String lastName;
  Book[] books; // Or use any collection type here
}

class Book {
  String title;
}

And then:

Author[] authors = ...

// This...
String[] firstNames = authors.firstName;

// ...is sugar for this (oh how it hurts to type this):
String[] firstNames = new String[authors.length];
for (int i = 0; i < firstNames.length; i++)
    firstNames[i] = authors[i].firstName;

// And this...
int[] firstNameLengths = authors.firstName.length();

// ... is sugar for this:
int[] firstNameLengths = new int[authors.length];
for (int i = 0; i < firstNameLengths.length; i++)
    firstNameLengths[i] = authors[i].firstName.length();

// ... or even this, who cares (hurts to type even more):
int[] firstNameLengths = Stream
  .of(authors)
  .map(a -> a.firstName)
  .mapToInt(String::length)
  .toArray();

Ignore the usage of arrays. It could just as well be a List, Stream, Iterable, or whatever data structure or syntax lets us get from arity 1 to arity N.

Or, to get all of the authors’ books:

Author[] authors = ...
Book[] books = authors.books;

Could it mean anything other than this?

Stream.of(authors)
      .flatMap(a -> Stream.of(a.books))
      .toArray(Book[]::new);

Why do we have to keep spelling these things out? They’re not business logic, they’re meaningless, boring infrastructure. While yes, there are surely many edge cases (and we could live with the occasional compiler error, if the compiler can’t figure out how to get from A to B), there are also many “very obvious” cases, where the ceremonial mapping logic (imperative or functional, doesn’t matter) is just completely obvious and boring.

But it gets in the way of writing and reading, and despite the fact that it seems obvious in many cases, it is still error prone!
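For reference, here is a complete, runnable version of what the three shortcuts have to be spelled out as in today's Java. It is a sketch: the wrapper class AuthorMapping, the constructors, and the sample data are my own additions for the sake of a self-contained example.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class AuthorMapping {

    static class Book {
        final String title;
        Book(String title) { this.title = title; }
    }

    static class Author {
        final String firstName;
        final Book[] books;
        Author(String firstName, Book... books) {
            this.firstName = firstName;
            this.books = books;
        }
    }

    // What "authors.firstName" would desugar to
    static String[] firstNames(Author[] authors) {
        return Stream.of(authors).map(a -> a.firstName).toArray(String[]::new);
    }

    // What "authors.firstName.length()" would desugar to
    static int[] firstNameLengths(Author[] authors) {
        return Stream.of(authors).mapToInt(a -> a.firstName.length()).toArray();
    }

    // What "authors.books" would desugar to: a flattened array of all books
    static Book[] books(Author[] authors) {
        return Stream.of(authors).flatMap(a -> Stream.of(a.books)).toArray(Book[]::new);
    }

    public static void main(String[] args) {
        Author[] authors = {
            new Author("Ann", new Book("A1"), new Book("A2")),
            new Author("Bob", new Book("B1"))
        };

        System.out.println(Arrays.toString(firstNames(authors)));       // [Ann, Bob]
        System.out.println(Arrays.toString(firstNameLengths(authors))); // [3, 3]
        System.out.println(books(authors).length);                      // 3
    }
}
```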

I think it’s time to revisit the ideas behind APL, where everything is an array, and by consequence, operations on arity 1 types can be applied to arity N types just the same, because the distinction is often not very useful.

Bonus: Null

While it is difficult to imagine retrofitting a language like Java with this, a new language could do away with nulls forever, because the arity 0-1 is just a special case of the arity N: an empty array.
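We can approximate the idea in Java today by modelling "maybe a value" as a collection of zero or one elements. The sketch below does that with plain lists; the class ZeroOrOne and its lengths() method are hypothetical names chosen for this example.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class ZeroOrOne {

    // The same mapping logic works for 0, 1, or N elements.
    // No null check is needed anywhere.
    static List<Integer> lengths(List<String> values) {
        return values.stream()
                     .map(String::length)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> present = Collections.singletonList("abc");
        List<String> absent  = Collections.emptyList();

        System.out.println(lengths(present)); // [3]
        System.out.println(lengths(absent));  // []
    }
}
```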

Looking forward to your thoughts.

Why I Completely Forgot That Programming Languages Have While Loops

I’ve recently made an embarrassing discovery: in all of my professional work with PL/SQL (and that has been quite a bit, in the banking industry), I have never really used a WHILE loop, at least not that I recall.

The idea of a WHILE loop is simple, and it is available in most languages, like PL/SQL:

WHILE condition
LOOP
   {...statements...}
END LOOP;

Or Java:

while (condition)
    statement

So, why have I simply never used it?

Most loops iterate on collections

In hindsight, it’s crazy to think that it took Java until version 5 to introduce the foreach loop:

String[] strings = { "a", "b", "c" };
for (String s : strings)
    System.out.println(s);

It is some of Java’s most useful syntax sugar for the equivalent loop that uses a local loop variable that we simply don’t care about:

String[] strings = { "a", "b", "c" };
for (int i = 0; i < strings.length; i++)
    System.out.println(strings[i]);

Let’s be honest. When do we really want this loop variable? Hardly ever. Sometimes, we need to access the next string along with the current one, and we want to stick to the imperative paradigm for some reason (when we could do it more easily with functional programming APIs). But that’s it. Most loops simply iterate the entire collection in a very dumb and straightforward way.
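Here is a sketch of that one case where the index earns its keep: combining each string with its successor. The class AdjacentPairs and both method names are made up for this example, and the functional variant is just one possible way to write it (it still needs an index, it merely comes from a range).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class AdjacentPairs {

    // Imperative: the loop variable gives access to element i and i + 1
    static List<String> imperative(String[] strings) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < strings.length - 1; i++)
            result.add(strings[i] + strings[i + 1]);
        return result;
    }

    // Functional: the same pairing, with the indexes produced by a range
    static List<String> functional(String[] strings) {
        return IntStream.range(0, strings.length - 1)
                        .mapToObj(i -> strings[i] + strings[i + 1])
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] strings = { "a", "b", "c" };

        System.out.println(imperative(strings)); // [ab, bc]
        System.out.println(functional(strings)); // [ab, bc]
    }
}
```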

SQL is all about collections

In SQL, everything is a table (see SQL trick #1 in this article), just like in relational algebra, everything is a set.

Now, PL/SQL is a useful procedural language that “builds around” the SQL language in the Oracle database. Some of the main reasons to do things in PL/SQL (rather than e.g. in Java) are:

  • Performance (the most important reason), e.g. when doing ETL or reporting
  • Logic needs to be “hidden” in the database (e.g. for security reasons)
  • Logic needs to be reused among different systems that all access the database

Much like Java’s foreach loop, PL/SQL has the ability to define implicit cursors (as opposed to explicit ones).

As a PL/SQL developer, when I want to loop over a cursor, I have at least these options:

Explicit cursors

DECLARE
  -- The loop variable
  row all_objects%ROWTYPE;

  -- The cursor specification
  CURSOR c IS SELECT * FROM all_objects;
BEGIN
  OPEN  c;
  LOOP
    FETCH c INTO row;
    EXIT WHEN c%NOTFOUND;
    dbms_output.put_line(row.object_name);
  END LOOP;
  CLOSE c;
END;

The above would correspond to the following boring Java code that we wrote time and again prior to Java 5 (in fact, without the generics):

Iterator<Row> c = ... // SELECT * FROM all_objects
while (c.hasNext()) {
    Row row = c.next();
    System.out.println(row.objectName);
}

The while loop is absolutely boring to write. Just like with the loop variable, we really don’t care about the current state of the iterator. We want to iterate over the whole collection, and at each iteration, we don’t care where we’re currently at.

Note that in PL/SQL, it is common practice to use an infinite loop syntax and break out of the loop when the cursor is exhausted (see above). In Java, this would be the corresponding logic, which is even worse to write:

Iterator<Row> c = ... // SELECT * FROM all_objects
for (;;) {
    if (!c.hasNext())
        break;
    Row row = c.next();
    System.out.println(row.objectName);
}

Implicit cursors

Here’s how many PL/SQL developers do things most of the time:

BEGIN
  FOR row IN (SELECT * FROM all_objects)
  LOOP
    dbms_output.put_line(row.object_name);
  END LOOP;
END;

The cursor is really an Iterable in terms of Java collections. An Iterable is a specification of what collection (Iterator) will be produced when the control flow reaches the loop, i.e. a lazy collection.

It’s very natural to implement external iteration in the above way.
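To make the "lazy collection" point concrete, here is a small sketch in plain Java. The class LazyRows and its query() method are hypothetical stand-ins for a real query. Nothing is "executed" when the Iterable is created, only when the foreach loop asks it for an Iterator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LazyRows {

    // Hypothetical stand-in for a query. Creating it does no work; the
    // "execution" is recorded in the log only when iterator() is called.
    static Iterable<String> query(List<String> log, String... rows) {
        return () -> {
            log.add("executed");
            return Arrays.asList(rows).iterator();
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();

        Iterable<String> rows = query(log, "A", "B");
        System.out.println(log); // []

        for (String row : rows)
            System.out.println(row); // A, then B

        System.out.println(log); // [executed]
    }
}
```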

If you’re using jOOQ to write SQL in Java (and you should), you can apply the same pattern in Java as well, as jOOQ’s ResultQuery type extends Iterable, which means it can be used as an Iterator source in Java’s foreach loop:

for (AllObjectsRecord row : ctx.selectFrom(ALL_OBJECTS))
    System.out.println(row.getObjectName());

Yes, that’s it! Focus on the business logic only, which is the collection specification (the query) and what you do with each row (the println statement). None of that cursor noise!

OK, but why no WHILE?

If you love SQL as much as me, you probably do so because you like the idea of a declarative programming language for declaring sets of data. If you write client code in PL/SQL or Java, you will thus like to continue working on the entire data set and continue thinking in terms of sets. The imperative programming paradigm that operates on the intermediate object, the cursor, is not your paradigm. You don’t care about the cursor. You don’t want to manage it, you don’t want to open / close it. You don’t want to keep track of its state.

Thus, you will choose the implicit cursor loop, or the foreach loop in Java (or some Java 8 Stream functionality).
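As for the Stream option, any Iterable can be turned into a Stream with the JDK's StreamSupport. The following sketch shows that; the class RowsAsStream, the method upperCased(), and the sample rows are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

class RowsAsStream {

    // Treats any Iterable as a sequential Stream and transforms each row.
    // No cursor, no iterator state, no loop variable.
    static List<String> upperCased(Iterable<String> rows) {
        return StreamSupport.stream(rows.spliterator(), false)
                            .map(String::toUpperCase)
                            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Iterable<String> rows = Arrays.asList("dual", "all_objects");

        System.out.println(upperCased(rows)); // [DUAL, ALL_OBJECTS]
    }
}
```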

As you do that more often, you will run into fewer and fewer situations where the WHILE loop is useful. Until you forget about its very existence.

WHILE LOOP, you won’t be missed.