The Inconvenient Truth About Dynamic vs. Static Typing

Sometimes there are these moments of truth. They happen completely unexpectedly, such as when I read this tweet by David Pearce:

David is the author of the lesser-known but by no means less interesting Whiley programming language, a language that has a lot of static type checking built into it. One of the most interesting features of Whiley is flow-sensitive typing (sometimes also simply called flow typing), which is mostly useful when combined with union types. An example from the getting started guide:

function indexOf(string str, char c) => null|int:

function split(string str, char c) => [string]:
  var idx = indexOf(str,c)

  // idx has type null|int
  if idx is int:

    // idx now has type int
    string below = str[0..idx]
    string above = str[idx..]
    return [below,above]

  else:
    // idx now has type null
    return [str] // no occurrence

Remember: other languages like Ceylon also support flow-sensitive typing, and even Java does to a certain extent, because Java has union types, too!

try {
    ...
}
catch (SQLException | IOException e) {
    if (e instanceof SQLException)
        doSomething((SQLException) e);
    else
        doSomethingElse((IOException) e);
}

Granted, Java’s flow-sensitive typing is explicit and verbose. We could expect the Java compiler to infer all the types. The following should type-check and compile just as well:

try {
    ...
}
catch (SQLException | IOException e) {
    if (e instanceof SQLException)
        // e is guaranteed to be of type SQLException
        doSomething(e);
    else
        // e is guaranteed to be of type IOException
        doSomethingElse(e);
}

Flow typing, or flow-sensitive typing, means that the compiler can infer the only possible type from the control flow of the surrounding program. It is a relatively new concept in modern languages like Ceylon, and it makes static typing extremely powerful, especially if the language also supports sophisticated type inference via var or val keywords!
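
For illustration, here’s a minimal sketch of what such a flow-sensitive check could look like in Java. Incidentally, this is roughly the shape that Java later shipped as pattern matching for instanceof (Java 16):

static int length(Object o) {
    // The instanceof check narrows o: s is bound as a String
    // inside the branch, so no explicit cast is needed
    if (o instanceof String s) {
        return s.length();
    }

    // Here, o is known *not* to be a String
    return 0;
}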

JavaScript static typing with Flow

Let’s get back to David’s Tweet and have a look at what the article said about Flow:

http://sitr.us/2014/11/21/flow-is-the-javascript-type-checker-i-have-been-waiting-for.html

The presence of a use of length with a null argument informs Flow that there should be a null check in that function. This version does type-check:

function length(x) {
  if (x) {
    return x.length;
  } else {
    return 0;
  }
}

var total = length('Hello') + length(null);

Flow is able to infer that x cannot be null inside the if body.

That’s quite cunning. A similar upcoming feature can be observed in Microsoft’s TypeScript. But Flow is different (or claims to be different) from TypeScript. The essence of Facebook Flow can be seen in this paragraph from the official Flow announcement:

Flow’s type checking is opt-in — you do not need to type check all your code at once. However, underlying the design of Flow is the assumption that most JavaScript code is implicitly statically typed; even though types may not appear anywhere in the code, they are in the developer’s mind as a way to reason about the correctness of the code. Flow infers those types automatically wherever possible, which means that it can find type errors without needing any changes to the code at all. On the other hand, some JavaScript code, especially frameworks, make heavy use of reflection that is often hard to reason about statically. For such inherently dynamic code, type checking would be too imprecise, so Flow provides a simple way to explicitly trust such code and move on. This design is validated by our huge JavaScript codebase at Facebook: Most of our code falls in the implicitly statically typed category, where developers can check their code for type errors without having to explicitly annotate that code with types.

Let this sink in

most JavaScript code is implicitly statically typed

again

JavaScript code is implicitly statically typed

Yes!

Programmers love type systems. Programmers love to reason formally about their data types and put them in narrow constraints to be sure the program is correct. That’s the whole essence of static typing: to make fewer mistakes thanks to well-designed data structures.

People also love to put their data structures into well-designed forms in databases, which is why SQL is so popular and “schema-less” databases will not gain more market share. Because in fact, it’s the same story: you still have a schema in a “schema-less” database, it’s just not type checked, and it thus leaves you with all the burden of guaranteeing correctness.

On a side note: Obviously, some NoSQL vendors keep writing these ridiculous blog posts to desperately position their products, claiming that you really don’t need any schema at all, but it’s easy to see through that marketing gimmick. A true need for schemalessness is as rare as a true need for dynamic typing. In other words, when was the last time you wrote a Java program and called every method via reflection? Exactly…

But there’s one thing that statically typed languages didn’t have in the past and that dynamically typed languages did have: Means to circumvent verbosity. Because while programmers love type systems and type checking, programmers do not love typing (as in typing on the keyboard).

Verbosity is the killer. Not static typing

Consider the evolution of Java:

Java 4

List list = new ArrayList();
list.add("abc");
list.add("xyz");

// Eek. Why do I even need this Iterator?
Iterator iterator = list.iterator();
while (iterator.hasNext()) {
    // Gee, I *know* I only have strings. Why cast?
    String value = (String) iterator.next();

    // [...]
}

Java 5

// Agh, I have to declare the generic type twice!
List<String> list = new ArrayList<String>();
list.add("abc");
list.add("xyz");

// Much better, but I have to write String again?
for (String value : list) {
    // [...]
}

Java 7

// Better, but I still need to write down the
// "same" List type twice
List<String> list = new ArrayList<>();
list.add("abc");
list.add("xyz");

for (String value : list) {
    // [...]
}

Java 8

// We're now getting there, slowly
Stream.of("abc", "xyz").forEach(value -> {
    // [...]
});

On a side-note, yes, you could’ve used Arrays.asList() all along.

Java 8 is still far from perfect, but things are getting better and better. The fact that I finally do not have to declare a type anymore in a lambda argument list because it can be inferred by the compiler is something really important for productivity and adoption.

Consider the pre-Java 8 equivalent of a lambda (had Streams existed back then):

// Yes, it's a Consumer, fine. And yes it takes Strings
Stream.of("abc", "xyz").forEach(new Consumer<String>(){
    // And yes, the method is called accept (who cares)
    // And yes, it takes Strings (I already said so!?)
    @Override
    public void accept(String value) {
        // [...]
    }
});

Now, if we’re comparing the Java 8 version with a JavaScript version:

["abc", "xyz"].forEach(function(value) {
    // [...]
});

We have almost reached as little verbosity as the functional, dynamically typed language that is JavaScript (I really wouldn’t mind those missing list and map literals in Java), with the only difference that we (and the compiler) know that value is of type String. And we know that the forEach() method exists. And we know that forEach() takes a function with one argument.

At the end of the day, things seem to boil down to this:

Dynamically typed languages like JavaScript and PHP have become popular mainly because they “just ran”. You didn’t have to learn all the “heavy” syntax that classic statically typed languages required (just think of Ada and PL/SQL!). You could just start writing your program. Programmers “knew” that the variables would contain strings, there’s no need to write it down. And that’s true, there’s no need to write everything down!

Consider Scala (or C#, Ceylon, pretty much any modern language):

val value = "abc"

What else can it be, other than a String?

val list = List("abc", "xyz")

What else can it be, other than a List[String]?

Note that you can still explicitly type your variables if you must – there are always those edge cases:

val list : List[String] = List[String]("abc", "xyz")

But most of the syntax is “opt-in” and can be inferred by the compiler.
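
Incidentally, Java did eventually follow suit: Java 10 shipped exactly this kind of local-variable type inference with the var keyword. A minimal sketch:

// Inside a method body (var applies to local variables only)
var value = "abc";                          // inferred: String
var list = java.util.List.of("abc", "xyz"); // inferred: List<String>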

Dynamically typed languages are dead

The conclusion of all this is that once syntactic verbosity and friction are removed from statically typed languages, there is absolutely no advantage in using a dynamically typed language. Compilers are very fast, deployment can be fast too if you use the right tools, and the benefit of static type checking is huge. (Don’t believe it? Read this article.)

As an example, SQL is also a statically typed language where much of the friction is still created by syntax. Yet, many people believe that it is a dynamically typed language, because they access SQL through JDBC, i.e. through type-less concatenated Strings of SQL statements. If you were writing PL/SQL, Transact-SQL, or embedded SQL in Java with jOOQ, you wouldn’t think of SQL this way and you’d immediately appreciate the fact that your PL/SQL, Transact-SQL, or your Java compiler would type-check all of your SQL statements.
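
For instance, here’s a sketch of what a compiler-checked query looks like in Java with jOOQ, using the generated INFORMATION_SCHEMA classes that also appear in the Nashorn examples further down:

import java.sql.Connection;

import org.jooq.Result;
import org.jooq.impl.DSL;

// Generated by jOOQ's source code generator:
import org.jooq.db.h2.information_schema.Tables;

public class TypeCheckedSQL {
    static void printTables(Connection conn) {
        Result<?> result = DSL.using(conn)
            .select(Tables.TABLES.TABLE_SCHEMA,
                    Tables.TABLES.TABLE_NAME)
            .from(Tables.TABLES)
            .fetch();

        // A typo in a column name, or a type mismatch in a predicate,
        // would be a compilation error, not a runtime SQLException
        System.out.println(result);
    }
}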

So, let’s abandon this mess that we’ve created because we’re too lazy to type all the types (pun intended). Happy typing!

And if you’re reading this, Java language expert group members: please do add var and val, as well as flow-sensitive typing, to the Java language. We’ll love you forever for this, promise!

Stop Claiming that you’re Using a Schemaless Database

One of MongoDB’s arguments when evangelising their product is the fact that MongoDB is a “schemaless” database:

Why Schemaless?

MongoDB is a JSON-style data store. The documents stored in the database can have varying sets of fields, with different types for each field.

And that’s true. But it doesn’t mean that there is no schema. There are in fact various schemas:

  • The one in your head when you designed the data structures
  • The one that your database really implemented to store your data structures
  • The one you should have implemented to fulfill your requirements

Every time you realise that you made a mistake (see point three above), or when your requirements change, you will need to migrate your data. Let’s review MongoDB’s point of view here again:

With a schemaless database, 90% of the time adjustments to the database become transparent and automatic. For example, if we wish to add GPA to the student objects, we add the attribute, resave, and all is well — if we look up an existing student and reference GPA, we just get back null. Further, if we roll back our code, the new GPA fields in the existing objects are unlikely to cause problems if our code was well written.

Everything above is true as well.

“Schema-less” vs. “Schema-ful”

But let’s translate this to SQL (or use any other “schema-ful” database instead):

ALTER TABLE student ADD gpa VARCHAR(10);

And we’re done! Gee, we’ve added a column, and we’ve added it to ALL rows. It was transparent. It was automatic. We “just get back null” on existing students. And we can even “roll back our code”:

ALTER TABLE student DROP gpa;

Not only are the existing objects unlikely to cause problems, we have actually rolled back our code AND database.

Let’s summarise:

  • We can do exactly the same things in a “schema-ful” database as in a “schema-less” one
  • We guarantee that a migration takes place (and it’s instant, too)
  • We guarantee data integrity when we roll back the change

What about more real-world DDL?

Of course, at the beginning of projects, when they still resemble the typical cat/dog/pet-shop or book/author/library sample application, we’ll just be adding columns. But what happens if we need to change the student-teacher 1:N relationship into a student-teacher M:N relationship? Suddenly, everything changes. Not only will the relational data model prove far superior to a hierarchical one that just yields tons of data duplication; it will also be moderately easy to migrate, and the outcome is guaranteed to be correct and tidy!

CREATE TABLE student_to_teacher 
AS
SELECT id AS student_id, teacher_id
FROM student;

ALTER TABLE student DROP teacher_id;

… and we’re done! (of course, we’d be adding constraints and indexes)

Think about the tedious task of transforming your old JSON into the new JSON. You don’t even have XSLT or XQuery for the task, only JavaScript!

Let’s face the truth

Schemalessness is as misleading a term as NoSQL is.

And again, MongoDB’s blog post is telling the truth (and an interesting one, too):

Generally, there is a direct analogy between this “schemaless” style and dynamically typed languages. Constructs such as those above are easy to represent in PHP, Python and Ruby. What we are trying to do here is make this mapping to the database natural.

When you say “schemaless”, you actually say “dynamically typed schema”, as opposed to the statically typed schemas available in SQL databases. JSON is still a completely schema-free data structure standard, as opposed to XML, which lets you specify an XSD if you need one, or operate on document-oriented, “schema-less” (i.e. dynamically typed) documents.

(And don’t say there’s json-schema. That’s a ridiculous attempt to mimic XSD.)

This is important to understand! You always have a schema, even if you don’t statically type it. If you’re writing JavaScript, you still have types, which you have to be fully aware of in your mental model of the code. Except that there’s no compiler (or IDE) that can help you infer the types with 100% certainty.


So, there’s absolutely nothing that is really easier with “schemaless” databases than with “schemaful” ones. You just defer the inevitable work of sanitising your schema to some other later time, a time when you might care more than today, or a time when you’re lucky enough to have a new job and someone else does the work for you. You might have believed MongoDB, when they said that “objects are unlikely to cause problems”.

But let me tell you the ugly truth:

Anything that can possibly go wrong, does.

(Murphy’s law)

We wish you good luck with your dynamically typed languages and your dynamically typed database schemas – while we’ll stick with type safe SQL.

jOOQ: The best way to write SQL in Java

How Nashorn Impacts API Evolution on a New Level

Following our previous article about how to use jOOQ with Java 8 and Nashorn, one of our users discovered a flaw in using the jOOQ API as discussed here on the user group. In essence, the flaw can be summarised like so:

Java code

package org.jooq.nashorn.test;

public class API {
    public static void test(String string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer... args) {
        System.out.println("OK");
    }
}

JavaScript code

var API = Java.type("org.jooq.nashorn.test.API");
API.test(1); // This will fail with RuntimeException

After some investigation, and with the kind help of Attila Szegedi as well as Jim Laskey (both Nashorn developers from Oracle), it became clear that Nashorn disambiguates overloaded methods and varargs differently from what a seasoned Java developer might expect. Quoting Attila:

Nashorn’s overload method resolution mimics Java Language Specification (JLS) as much as possible, but allows for JavaScript-specific conversions too. JLS says that when selecting a method to invoke for an overloaded name, variable arity methods can be considered for invocation only when there is no applicable fixed arity method.

I agree that variable arity methods can be considered only when there is no applicable fixed arity method. But the whole notion of “applicable” itself is completely changed, as type promotion (or coercion / conversion) using ToString, ToNumber, ToBoolean is preferred over what intuitively appear to be “exact” matches with varargs methods! In the example above, the Integer argument 1 is happily converted to the String "1", which makes the fixed arity test(String) method “applicable”, so it wins over the seemingly exact test(Integer...) match!

Let this sink in!

Given that we now know how Nashorn resolves overloading, we can see that any of the following are valid workarounds:

Explicitly calling the test(Integer[]) method using an array argument:

This is the simplest approach, where you ignore the fact that varargs exist and simply create an explicit array

var API = Java.type("org.jooq.nashorn.test.API");
API.test([1]);

Explicitly calling the test(Integer[]) method by saying so:

This is certainly the safest approach, as you’re removing all ambiguity from the method call

var API = Java.type("org.jooq.nashorn.test.API");
API["test(Integer[])"](1);

Removing the overload:

public class AlternativeAPI1 {
    public static void test(Integer... args) {
        System.out.println("OK");
    }
}

Removing the varargs:

public class AlternativeAPI3 {
    public static void test(String string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer args) {
        System.out.println("OK");
    }
}

Providing an exact option:

public class AlternativeAPI4 {
    public static void test(String string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer args) {
        test(new Integer[] { args });
    }

    public static void test(Integer... args) {
        System.out.println("OK");
    }
}

Replacing String by CharSequence (or any other “similar type”):

Now, this is interesting:

public class AlternativeAPI5 {
    public static void test(CharSequence string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer args) {
        System.out.println("OK");
    }
}

Specifically, the distinction between the CharSequence and String types appears to be rather arbitrary from a Java perspective, in my opinion.

Agreed, implementing overloaded method resolution in a dynamically typed language is very hard, if it is possible at all. Any solution is a compromise that will introduce flaws somewhere. Or, as Attila put it:

As you can see, no matter what we do, something else would suffer; overloaded method selection is in a tight spot between Java and JS type systems and very sensitive to even small changes in the logic.

True! But not only is overloaded method selection very sensitive to even small changes; using Nashorn with Java interoperability is, too! As an API designer, over the years, I have grown used to semantic versioning and the many subtle rules to follow when keeping an API source compatible, behavior compatible, and, if ever possible, to a large degree also binary compatible.
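
To see what this means for API evolution, let’s recast the example from the top of this post as a version history (a sketch; the version split is hypothetical):

// Version 1.0 of the API: the Nashorn call site API.test(1) works fine
public class ApiV1 {
    public static void test(Integer... args) {
        System.out.println("OK");
    }
}

// Version 1.1 merely adds an overload. For Java callers, that is a
// source and binary compatible change. For the Nashorn call site
// API.test(1), however, 1 is now coerced via ToString and dispatched
// to the new fixed arity method: the client breaks without changing
// a single line of code.
public class ApiV2 {
    public static void test(String string) {
        throw new RuntimeException("Don't call this");
    }

    public static void test(Integer... args) {
        System.out.println("OK");
    }
}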

Forget about all of that when your clients are using Nashorn. They’re on their own. As the sketch above shows, a newly introduced overload in your Java API can break your Nashorn clients quite badly. But then again, that’s JavaScript, the language that tells you at runtime that:

['10','10','10','10'].map(parseInt)

… yields

[10, NaN, 2, 3]

(That’s because map() passes each element’s index to parseInt() as its second argument, the radix.)

… and where

++[[]][+[]]+[+[]] === "10"

yields true! (sources here)

For more information about JavaScript, please visit this introductory tutorial

Java 8 Friday: JavaScript goes SQL with Nashorn and jOOQ

At Data Geekery, we love Java. And as we’re really into jOOQ’s fluent API and query DSL, we’re absolutely thrilled about what Java 8 will bring to our ecosystem.

Java 8 Friday

Every Friday, we’re showing you a couple of nice new tutorial-style Java 8 features, which take advantage of lambda expressions, extension methods, and other great stuff. You’ll find the source code on GitHub.

JavaScript goes SQL with Nashorn and jOOQ

This week, we’ll look into some awesome server-side SQL scripting with Nashorn and Java 8. Only a few things can be found on the web regarding the use of JDBC in Nashorn. But why use JDBC and take care of painful resource management and SQL string composition, when you can use jOOQ? Everything works out of the box!

Let’s set up a little sample JavaScript file as such:

var someDatabaseFun = function() {
    var Properties = Java.type("java.util.Properties");
    var Driver = Java.type("org.h2.Driver");

    var driver = new Driver();
    var properties = new Properties();

    properties.setProperty("user", "sa");
    properties.setProperty("password", "");

    try {
        var conn = driver.connect(
            "jdbc:h2:~/test", properties);

        // Database code here
    }
    finally {
        try { 
            if (conn) conn.close();
        } catch (e) {}
    }
}

someDatabaseFun();

This is pretty much all you need to interoperate with JDBC and an H2 database. So we could be running SQL statements with JDBC like so:

try {
    var stmt = conn.prepareStatement(
        "select table_schema, table_name " + 
        "from information_schema.tables");
    var rs = stmt.executeQuery();

    while (rs.next()) {
        print(rs.getString("TABLE_SCHEMA") + "."
            + rs.getString("TABLE_NAME"))
    }
}
finally {
    if (rs)
        try {
            rs.close();
        }
        catch(e) {}

    if (stmt)
        try {
            stmt.close();
        }
        catch(e) {}
}

Most of the bloat is JDBC resource handling as we unfortunately don’t have a try-with-resources statement in JavaScript. The above generates the following output:

INFORMATION_SCHEMA.FUNCTION_COLUMNS
INFORMATION_SCHEMA.CONSTANTS
INFORMATION_SCHEMA.SEQUENCES
INFORMATION_SCHEMA.RIGHTS
INFORMATION_SCHEMA.TRIGGERS
INFORMATION_SCHEMA.CATALOGS
INFORMATION_SCHEMA.CROSS_REFERENCES
INFORMATION_SCHEMA.SETTINGS
INFORMATION_SCHEMA.FUNCTION_ALIASES
INFORMATION_SCHEMA.VIEWS
INFORMATION_SCHEMA.TYPE_INFO
INFORMATION_SCHEMA.CONSTRAINTS
...
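
As an aside, here’s a sketch of the same JDBC logic in plain Java, where try-with-resources removes most of that bloat (assuming the same H2 connection as above):

// import java.sql.Connection;
// import java.sql.PreparedStatement;
// import java.sql.ResultSet;
// import java.sql.SQLException;

static void printTables(Connection conn) throws SQLException {
    try (PreparedStatement stmt = conn.prepareStatement(
             "select table_schema, table_name " +
             "from information_schema.tables");
         ResultSet rs = stmt.executeQuery()) {

        while (rs.next()) {
            System.out.println(rs.getString("TABLE_SCHEMA") + "."
                + rs.getString("TABLE_NAME"));
        }
    }
}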

Let’s see if we can run the same query using jOOQ:

var DSL = Java.type("org.jooq.impl.DSL");

print(
    DSL.using(conn)
       .fetch("select table_schema, table_name " +
              "from information_schema.tables")
);

This is how you can execute plain SQL statements in jOOQ, with much less bloat than with JDBC. The output is roughly the same:

+------------------+--------------------+
|TABLE_SCHEMA      |TABLE_NAME          |
+------------------+--------------------+
|INFORMATION_SCHEMA|FUNCTION_COLUMNS    |
|INFORMATION_SCHEMA|CONSTANTS           |
|INFORMATION_SCHEMA|SEQUENCES           |
|INFORMATION_SCHEMA|RIGHTS              |
|INFORMATION_SCHEMA|TRIGGERS            |
|INFORMATION_SCHEMA|CATALOGS            |
|INFORMATION_SCHEMA|CROSS_REFERENCES    |
|INFORMATION_SCHEMA|SETTINGS            |
|INFORMATION_SCHEMA|FUNCTION_ALIASES    |
 ...

But jOOQ’s strength is not in its plain SQL capabilities; it lies in the DSL API, which abstracts away all the vendor-specific SQL subtleties and allows you to compose queries (and also DML) fluently. Consider the following SQL statement:

// Let's assume these objects were generated
// by the jOOQ source code generator
var Tables = Java.type(
    "org.jooq.db.h2.information_schema.Tables");
var t = Tables.TABLES;
var c = Tables.COLUMNS;

// This is the equivalent of Java's static imports
var count = DSL.count;
var row = DSL.row;

// We can now execute the following query:
print(
    DSL.using(conn)
       .select(
           t.TABLE_SCHEMA, 
           t.TABLE_NAME, 
           c.COLUMN_NAME)
       .from(t)
       .join(c)
       .on(row(t.TABLE_SCHEMA, t.TABLE_NAME)
           .eq(c.TABLE_SCHEMA, c.TABLE_NAME))
       .orderBy(
           t.TABLE_SCHEMA.asc(),
           t.TABLE_NAME.asc(),
           c.ORDINAL_POSITION.asc())
       .fetch()
);

Note that there is obviously no typesafety in the above query, as this is JavaScript. But I would imagine that the IntelliJ, Eclipse, or NetBeans creators will eventually detect Nashorn scripts’ dependencies on Java programs and provide syntax auto-completion and highlighting, as some things can be statically analysed.

Things get even better if you’re using the Java 8 Streams API from Nashorn. Let’s consider the following query:

DSL.using(conn)
   .select(
       t.TABLE_SCHEMA,
       t.TABLE_NAME,
       count().as("CNT"))
   .from(t)
   .join(c)
   .on(row(t.TABLE_SCHEMA, t.TABLE_NAME)
       .eq(c.TABLE_SCHEMA, c.TABLE_NAME))
   .groupBy(t.TABLE_SCHEMA, t.TABLE_NAME)
   .orderBy(
       t.TABLE_SCHEMA.asc(),
       t.TABLE_NAME.asc())

// This fetches a List<Map<String, Object>> as
// your ResultSet representation
   .fetchMaps()

// This is Java 8's standard Collection.stream()
   .stream()

// And now, r is like any other JavaScript object
// or record!
   .forEach(function (r) {
       print(r.TABLE_SCHEMA + '.' 
           + r.TABLE_NAME + ' has ' 
           + r.CNT + ' columns.');
   });

The above generates this output:

INFORMATION_SCHEMA.CATALOGS has 1 columns.
INFORMATION_SCHEMA.COLLATIONS has 2 columns.
INFORMATION_SCHEMA.COLUMNS has 23 columns.
INFORMATION_SCHEMA.COLUMN_PRIVILEGES has 8 columns.
INFORMATION_SCHEMA.CONSTANTS has 7 columns.
INFORMATION_SCHEMA.CONSTRAINTS has 13 columns.
INFORMATION_SCHEMA.CROSS_REFERENCES has 14 columns.
INFORMATION_SCHEMA.DOMAINS has 14 columns.
...

If your database supports arrays, you can even access such array columns by index, e.g.

r.COLUMN_NAME[3]

So, if you’re a server-side JavaScript aficionado, download jOOQ today, and start writing awesome SQL in JavaScript, now! For more Nashorn awesomeness, consider reading this article here.

Stay tuned for more awesome Java 8 content on this blog.

Squel – A SQL Query Builder for JavaScript

… Yes! You’ve read that correctly. For JavaScript. OK, there has been quite a bit of traction around server-side JavaScript through node.js. So, for those among you brave enough to actually write JavaScript, writing SQL in JavaScript might seem like a good idea. I have thus discovered a library called squel.js, which has a nice-looking GitHub-style website and a big fat disclaimer almost at the top:

NOTE: It is recommended that you do NOT create queries browser-side to run on the server as this massively increases your exposure to SQL Injection attacks.

Again: if such a disclaimer needs to be added at the top of your website, is it really a good idea to proceed? But it may be, for the node.js folks. So let’s have a look at Squel’s syntax.

squel.select().from("students")

Does this look familiar? So far, it could also be jOOQ code. With this SQL builder API, you can also select from derived tables:

alert(
    squel.select()
        .from(squel.select().from('students'), 's')
        .field('s.id')
);
/* SELECT s.id FROM (SELECT * FROM students) `s` */

Or perform JOINs:

alert(
    squel.select()
        .field("students.id")
        .from("students")
        .left_join("teachers", null, 
             "students.id = teachers.student_id")
        .right_join("jailed", "j", 
             "j.student_id = students.id")
);
/*  SELECT students.id FROM students
        LEFT JOIN teachers 
        ON (students.id = teachers.student_id)
        RIGHT JOIN jailed `j` 
        ON (j.student_id = students.id)
*/

Obviously, unlike Java SQL builders, this API is not typesafe, but it’s still interesting to see fluent APIs in other languages as well.
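
For comparison, the same left join in a typesafe Java builder could look like this with jOOQ (a sketch; STUDENTS and TEACHERS are hypothetical output of jOOQ’s source code generator):

// Hypothetical generated table references:
// import static com.example.db.Tables.STUDENTS;
// import static com.example.db.Tables.TEACHERS;

DSL.using(conn)
   .select(STUDENTS.ID)
   .from(STUDENTS)
   .leftOuterJoin(TEACHERS)
   .on(STUDENTS.ID.eq(TEACHERS.STUDENT_ID))
   .fetch();

Here, a misspelled column or a join predicate comparing incompatible types simply wouldn’t compile.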

10 Reasons not to Choose a Particular Open Source Software

We’re all software engineers of one type or another. Most of us have one thing in common, though: we’re lazy. And we know that someone else was less lazy and has already solved the tedious problem that we’re working on. And because we’re not only lazy but also stingy, we search for Free Open Source software.

But the problem with Open Source software is: there are millions of options for just about every problem domain. Just look at web development with “modern” JavaScript. Which tool to choose? Which one will still be there tomorrow? Will it work? Will I get maintenance? New features? Plugins from the community?

While it is not so easy to find the right tool among the good ones (commons or guava? mockito or jmock? Hibernate or jOOQ or MyBatis?), it is certainly easier to rule out the bad ones.

Here are some things to look out for when evaluating Open Source software (in no particular order):

1. NullPointerExceptions, ClassCastExceptions

This is one of my favourites, and it is very easy to google. No one is completely safe from these annoyances. But when you find stack traces and bug reports, investigate them closely.

  • Do they appear often?
  • Do they appear in similar contexts?
  • Do they appear in places where they could’ve been avoided?

It’s a matter of good design to be able to avoid NullPointerExceptions and ClassCastExceptions. They happen to everyone. But no NullPointerException should be thrown from a place that can be statically discovered by the Java compiler or by FindBugs.
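
To illustrate the kind of statically discoverable NullPointerException meant here, consider this hypothetical library method (names invented):

import java.util.Objects;

public class NamingService {

    // Bad: callers passing null get an unhelpful NPE from deep inside
    // the library; with nullability annotations, the compiler or
    // FindBugs can flag this dereference statically
    static String normalise(String name) {
        return name.trim().toLowerCase(); // NPE if name is null
    }

    // Better: fail fast, with a message that actually helps the caller
    static String normaliseChecked(String name) {
        Objects.requireNonNull(name, "name must not be null");
        return name.trim().toLowerCase();
    }
}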

Needless to say, the list of no-go exceptions thrown from a database library, for instance, can be extended with SQLExceptions caused by syntax errors that the library itself produced.

2. Community Discussing Bugs instead of Features, Strategies, Visions

Every Open Source product has users, and with Google Groups and GitHub, it has become fairly easy to interact with an OSS community.

For larger projects, the community also extends to Stack Overflow, Reddit, Twitter, etc. Such channels are a sign of an Open Source product’s popularity, but not necessarily a sign that you should use it. Also, don’t be blinded by users saying “hey, this is so cool”, “it just made my day”, “best software ever”. They say that to everyone who relieves them of their misery (or laziness, see the intro of this post).

What you should be looking out for is whether the community is discussing visions, strategies, features, truly awesome ideas that could be implemented next year, in the next major release. It’s a true sign that not only will the software probably stick around, it will also keep getting better.

The converse to this is a community that mainly discusses bugs (see NullPointerException, ClassCastException). Unlike a “visionary” community, a “buggy” community will just create work for the vendor, not inspiration. But which one’s the chicken, which one’s the egg?

Another bad sign is a community that is disappointed by the false promises of the vendor’s visions. I often have the feeling that Scala’s SLICK might qualify for that, as it introduces an insurmountable language-mapping impedance mismatch between its own LINQ-inspired querying DSL and SQL.

3. Poor Manual, Poor Javadoc

That’s easy to discover. Do you really want that? The best and most authoritative information should come from the software vendor, not some weirdo forum on the web that you’ve googled.

A good example is PostgreSQL’s manual.

A rant about bad examples can be seen here:
http://www.cforcoding.com/2009/08/its-time-we-stopped-rewarding-projects.html

Don’t be deceived by the idea that it might get better eventually. Poorly documented software will be poor in many other aspects, too. And it’s such an easy thing to discover!

Of course, the “right” amount of documentation is an entirely other story…

4. No Semantic Versioning

Search for release notes and see if you find something that roughly corresponds to semver.org. You will want patch releases when the Open Source software that you’re using in mission-critical systems fails. And when you get a patch release, you don’t want 50 new features (with new NullPointerExceptions and ClassCastExceptions) along with it.

5. Unorganised Appearance

Again, we live in the GitHub era. The good old CVS days, when HTML was still used to share cooking recipes, are over. Check whether your Open Source software uses such tools, and whether the maintainers show that they’re using them. It will help you ascertain that the software will still be good in a couple of years, and that the vendor isn’t crushed by the mess they’ve gotten themselves into.

6. Vendor Side-Project evolving into an Offspring Product

Now, that is a sign not everyone may agree upon, I guess. But after the experiences I’ve had in previous jobs, I strongly believe that software which evolved out of necessity before being turned into a product really suffers from its legacy. It wasn’t a product from the beginning, and it has strong ties to the vendor’s original requirements, which doesn’t bother the vendor, but will bother you. And because the vendor still has very strong ties to their offspring, they won’t be ready to make fundamental changes in either code or vision!

Specifically, in the database field, there are a couple of these products, e.g.

Note: I don’t know any of the above tools in detail, so they may well be awesome. But be warned: they weren’t designed as products. They were designed for a very narrow purpose originating from a pre-Apache context.

7. Generics are Poorly (or Overly) Adopted

Generics were introduced to Java in 2004 with Java 5. Now that the heated debates about generic type erasure are over, generics are well adopted. Or are they? The latest stable release, 3.2.1, of Apache Commons Collections is still not generified! That must’ve been the number one reason why people started shifting to Google Guava (or its predecessors) instead. Not much makes for a lousier day than having raw types (or eels) slapped around your face.
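
Here’s a minimal illustration of why raw types (as in a non-generified collections API) ruin your day:

import java.util.ArrayList;
import java.util.List;

public class RawTypes {
    public static void main(String[] args) {
        List list = new ArrayList();     // raw type, as in pre-generics APIs
        list.add(42);                    // compiles without complaint...

        // ... and blows up only at runtime, far away from the actual bug:
        String s = (String) list.get(0); // ClassCastException
        System.out.println(s);
    }
}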

The other thing to look out for, though, is over-generification. Generics can become really hard, even for top-shot Java architects. A common blunder is to strongly correlate subtype polymorphism with generic polymorphism without being aware of the effects. Having too many generics in an API is a good sign of an architecture astronaut (or rather, a design astronaut in this case). We’ll see further down how that may correlate with the person behind the design decisions.
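
As a hypothetical illustration (names invented): both methods below do the same job, but the second buries it under type variables that no caller benefits from:

import java.util.Collection;
import java.util.List;

public class Copying {

    // Reasonable use of generics
    static <T> void copy(List<T> source, List<? super T> target) {
        target.addAll(source);
    }

    // Over-generified: three type variables where one (plus wildcards) would do
    static <E, S extends Collection<? extends E>, T extends Collection<? super E>>
    T copyInto(S source, T target) {
        target.addAll(source);
        return target;
    }
}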

8. Vendor Cannot Handle Objective Criticism or Competition

Here’s how to find out who’s behind an Open Source product. While this isn’t important for a small, geeky tool, you should be very interested in the vendor as a person when looking for a strategic OSS addition, especially if you’re dealing with a benevolent dictator. The vendor should be:

  • Aware of their competition, i.e. doing marketing, learning from competitors, and improving in order to compete. This means that they are interested in being truly better, not just “convinced that they’re better”.
  • Open minded with their competition, with you as a customer, and ready to discuss various points of view.
  • Interested in new ideas, possibly putting them on a roadmap right away (but without losing focus of their main strategic roadmap).

Even if this is Open Source, there’s no point in being arrogant or conceited. The vendor should treat you like a customer (as long as you’re not trolling). Open-mindedness will eventually lead to the better product in the long run.

9. Vendor has no Commercial or Marketing Interests at All

Now, (Free) Open Source is nice for many reasons. As a vendor, you get:

  • Feedback more quickly
  • Feedback more often
  • Community (with pull requests, feature additions, etc.)
  • The feeling that you’re doing something good

True? Yes. But all of that is true for commercial software as well. So what’s the real reason for doing Open Source? It depends. Adobe, for instance, has started opening up a lot recently, since their acquisition of Day Software. All of JCR, JackRabbit, the upcoming JackRabbit Oak, Sling, and Felix are still at Apache, with the original committers still on board. But one can certainly not say that Adobe has no commercial interests.

OSS vendors should think economically and build products. Eventually, they may start selling stuff around their core products, or separate community and commercial licenses. And unless they get too greedy (see Oracle and MySQL vs. RedHat and MariaDB), that can make commercial Open Source a very interesting business, also for the customer, who will then get the good parts of Open Source (partially free, open, with a vibrant community) along with the good parts of commercial software (premium support, warranties, etc.).

In other words: don’t choose overly geeky stuff. You will probably have recognised such tools before (poor documentation, no semantic versioning, poor tooling).

10. No Traction Anymore

To wrap this up, here’s an obvious last one. Many Open Source products don’t show any traction from their vendor anymore. That goes along with the previous point, where the vendor has no commercial interest. Without long-term commercial interest, they’ll also lose all other interest. And you’re stuck maintaining a pile of third-party code yourself (fixing its many, many ClassCastExceptions and NullPointerExceptions).

TL;DR : Conclusion

You should choose Open Source just like commercial software: economically.

  • Open Source is not an excuse for bad quality.
  • Open Source is not an excuse for lack of support.
  • Open Source is not an excuse for non-professionalism.

If Open Source fails you on any of the above, the joke will be on you, the customer. You’ll get a bad product, and you’ll pay the price with exaggerated maintenance on your side, which you thought you’d avoid by choosing something free. Nothing is free. Not even Free Open Source. Ask the Grumpy Nerd.