The DBMS of the Year 2013

We have recently blogged about the DB-Engines Ranking and how MongoDB was the only NoSQL store to make it into that ranking’s top 10. Today, this marketing platform offered by solid IT has announced MongoDB to be the DBMS of the year 2013, with PostgreSQL being a close runner-up, followed by Cassandra.

solid IT as a company is slightly biased towards NoSQL, so it’s not surprising that two NoSQL databases are in their top ranking, and the only successful ORDBMS in the market is number two. As we ourselves are “slightly” biased towards SQL, we would like to announce our own DBMS of the year 2013:

SQL Server is the DBMS of the year 2013

… because its SQL dialect Transact-SQL (which Microsoft “shares” with Sybase), is the first SQL-based programming language to make it into TIOBE’s top 10 programming languages.

Congratulations to SQL Server from the jOOQ team!

The SQL Language’s Most Missing Feature

SQL is also awesome in many ways. We can write out the most complex truths and facts and have the database tell us the answer in no time.

But the SQL language is arguably the most beautiful programming language out there. It has so many caveats that people like me get ridiculously rich selling consulting services just to merely explain its semantics. One of the best examples of how twisted the SQL language is, is Alex Bolenok’s article about SQL NULL on Tech.Pro.

Now, one of the biggest criticisms of SQL is its verbosity. Most SQL dialects have virtually no constructs to avoid repetition. It is a common use case in SQL to filter, group, select, and order by the same expression. An example:

SELECT   first_name || ' ' || last_name
FROM     customers
WHERE    first_name || ' ' || last_name LIKE 'L% E%'
GROUP BY first_name || ' ' || last_name
ORDER BY first_name || ' ' || last_name

You might write a different query for the same, sure. But even if you didn’t doesn’t it bother you that there is hardly any way to re-use the first_name || ' ' || last_name concatenation expression?

Well, you can at least reuse it in the ORDER BY clause:

SELECT   first_name || ' ' || last_name AS name
FROM     customers
WHERE    first_name || ' ' || last_name LIKE 'L% E%'
GROUP BY first_name || ' ' || last_name
ORDER BY name

And in MySQL, you can also reuse it in the GROUP BY clause, although we think that this is a bad idea:

SELECT   first_name || ' ' || last_name AS name
FROM     customers
WHERE    first_name || ' ' || last_name LIKE 'L% E%'
GROUP BY name
ORDER BY name

Standard SQL solutions for column reuse

Sure, you could create a derived table:

SELECT   name
FROM (
  SELECT first_name || ' ' || last_name
  FROM   customers
) c(name)
WHERE    name LIKE 'L% E%'
GROUP BY name
ORDER BY name

… or a common table expression, which is basically a reusable derived table, or a local view:

WITH c(name) AS (
  SELECT first_name || ' ' || last_name
  FROM   customers
)
SELECT   name
FROM     c
WHERE    name LIKE 'L% E%'
GROUP BY name
ORDER BY name

But why do I even have to create a table to reuse / rename a simple column expression? Why can’t I just simply use something like a common column expression? Like this?

-- Common column expression that can be appended
-- to any table expression.
SELECT   name
FROM     customers
WITH     name AS first_name || ' ' || last_name
WHERE    name LIKE 'L% E%'
GROUP BY name
ORDER BY name

Note that the common column expression would be scoped to the table source. In other words, it only makes sense in the context of a FROM clause and the tables specified in the FROM clause. In a way, the following two expressions would be exactly equivalent:

-- Proposed syntax
FROM     customers
WITH     name AS first_name || ' ' || last_name

-- Existing syntax
FROM (
  SELECT customers.*,
         first_name || ' ' || last_name AS name
  FROM   customers
) AS customers

The proposed syntax could also be applied to joined tables:

-- Proposed syntax
FROM customers
  AS c
WITH name AS c.first_name || ' ' || c.last_name
JOIN addresses
  AS a
WITH address AS a.street || '\n' || a.city
  ON c.address_id = a.address_id
WITH full_address AS c.name || '\n' || a.address

-- Or alternatively, if you don't need to tightly 
-- scope these common column expressions to their
-- corresponding tables:
FROM customers AS c
JOIN addresses AS a
  ON c.address_id = a.address_id
WITH name AS c.first_name || ' ' || c.last_name,
     address AS a.street || '\n' || a.city,
     full_address AS name || '\n' || address

So in plain English, the new WITH clause could be appended to any type of table expression. With parentheses and comments, the first among the above examples would read:

FROM (
  customers AS c
  -- The "name" column is appended to "c"
  WITH name AS c.first_name || ' ' || c.last_name
)
JOIN (
  addresses AS a
  -- The "address" column is appended to "a"
  WITH address AS a.street || '\n' || a.city
) ON c.address_id = a.address_id
-- The "full_address" column is appended to the
-- above joined table expression
WITH full_address AS c.name || '\n' || a.address

The above syntax would then again be exactly equivalent to this:

FROM (
  SELECT 
    c.*, a.*, 
    c.name || '\n' || a.address AS full_address
  FROM (
    SELECT c.*, 
           c.first_name || ' ' || c.last_name
    FROM customers AS c
  ) c
  JOIN (
    SELECT a.*,
           a.street || '\n' || a.city
    FROM addresses AS a
  ) a
  ON c.address_id = a.address_id
)

SQL’s most missing language feature

Since SQL:1999, we luckily have common table expressions – the WITH clause that can be prepended to any SELECT statement (and to other DML statements in some dialects). Using common table expressions, we can avoid repeating commonly used derived tables.

But such a thing is not possible for columns. The one language feature SQL is in most dire need of are thus common column expressions.

Why Did SQLJ Die?

Every now and then, SQLJ pops up somewhere, mostly in a very dusty/enterprisey or in an academic context.

If you give SQLJ some thought, though, it isn’t such a bad idea. It is:

  • An ANSI and ISO standard
  • Part of the SQL standard
  • Quite easy to understand
  • Quite a powerful extension to JDBC

So why did it die (or rather, why did it never really take off)? This question was asked on Stack Overflow, and we gave an answer.

Let’s assume that you have already decided to embed your SQL (as opposed to externalising it through a templating mechanism, hiding it with an ORM, or with stored procedures). Here are a couple of reasons why SQLJ is not an optimal solution for embedding SQL:

IDE support

While Pro*C worked well for C and C++ in the 90s, Java really took off in the early 2000’s. With Java, there were also an increasing number of powerful IDEs such as Eclipse, NetBeans, JBuilder, and others. Java preprocessors and IDEs have never become friends, though, as parsing one language is hard enough. Parsing (and providing tooling) for two languages is much harder.

In fact, SQLJ made the surrounding Java code type-unsafe as IDEs and compilers couldn’t process .sqlj files before they had been pre-processed.

SQL popularity

There was a time when people started thinking that SQL itself was going to be dead. First, they did so with the advent of ORMs, then they did so with the advent of NoSQL. People thought that the DBA is dead. Again.

Well, this has been proven to be wrong a couple of times, but certainly not because of SQLJ.

Typesafety

In the late 2000’s, there had now been typesafe alternatives to SQLJ, such as jOOQ in Java, or LINQ-to.SQL in .NET, which leverage IDE features such as syntax autocompletion. By being internal domain-specific languages / query DSLs, these APIs not only bring typesafety to embedded SQL, but they also allow for dynamic SQL building, which SQLJ doesn’t support.

Predictions

While embedding SQL into other languages is a useful thing, SQLJ never solved this problem adequately. Hence, R.I.P., SQLJ

Why Your Boring Data Will Outlast Your Sexy New Technology

So you’re playing around with all those sexy new technologies, enjoying yourself, getting inspiration from state-of-the-art closure / lambda / monads and other concepts-du-jour

Now that I have your attention provoking a little anger / smirk / indifference, let’s think about the following. I’ve recently revisited a great article by Ken Downs written in 2010. There’s an excellent observation in the middle.

[…] in fact there are two things that are truly real for a developer:

  • The users, who create the paycheck, and
  • The data, which those users seemed to think was supposed to be correct 100% of the time.

He says this in the context of a sophisticated rant against ORMs. It can also be seen as a rant against any recent abstraction or database technology with a mixture of nostalgia and cynicism. But the essence of his article lies in another observation that I’ve made with so many software systems:

  • Developers tend to get bored with “legacy”
  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it

Having said so, we can quickly understand why Java has become the new COBOL to some, who are bored with its non-sexiness (specifically in the JEE corner). Yet, according to TIOBE, Java and C are still the most widely used programming languages in the market and many systems are thus implemented in Java.

On the other hand, many inspiring new technologies have emerged in the last couple of years, also in the JVM field. A nice overview is given in ZeroTurnaround’s The Adventurous Developer’s Guide to JVM Languages. And all of that is fine to a certain extent. As Bruno Borges from Oracle recently and adequately put it:

anything not mainstream has more odds to be “sexy” [than JSF]

Now, let’s map this observation back to a subsequent section of Ken’s article:

[…] the application code suddenly becomes a go-between, the necessary appliance that gets data from the db to the user […] and takes instructions back from the user and puts them in the database […]. No matter how beautiful the code was, the user would only ever see the screen […] and you only heard about it if it was wrong. Nobody cares about my code, nobody cares about yours.

Think about an E-Banking system. None of your users really cares about the code you wrote to get it running. What they care about is their data (i.e. their money) and the fact that it is correct. This is true for many many systems where the business logic and UI layers can be easily replaced with fancy new technology, whereas the data stays around.

In other words, you are free to choose whatever sexy new UI technology you like as long as it helps your users get access to their data.

So what about sexy new database technology?

Now, that’s an entirely different story, compared to sexy new UI technology.

You might be considering some NoSQL-solution-du-jour to “persist” your data, because it’s so easy and because it costs so much less. Granted, the cost factor may seem very tempting at first. But have you considered the fact that:

  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it

Once your data goes into that NoSQL store, it may stay there much longer than you had wanted it to. Your developers and architects (who originally chose this particular NoSQL solution) may have left long ago. Parts of your system may have been replaced, too, because now you’re doing everything in HTML5. JavaScript is the new UI technology.

And all this time, you have “persisted” UI / user / domain model data in your database, from various systems, from various developers, through various programming paradigms. And then, you realise:

We’re not saying that there aren’t some use-cases where NoSQL databases really provide you with a better solution than the “legacy” alternatives. Most specifically, graph databases solve a problem that no RDBMS has really solved well, yet.

But consider one thing. You will have to migrate your data. Time and again. You will have to archive it. And maybe, migrate the archive. And maybe provide reports of the archive. And provide correct reports (which means: ACID!) And be transactional (which means: ACID!) And all the other things that people do with data.

In fact, your system will grow like any other system ever did before and it will have high demands from your database. While some NoSQL databases have started to get some decent traction, in a way that it is safe to say their vendors might still be around in 5-10 years, when the systems will have been replaced by developers who have replaced other developers.

In other words, there is one more bullet to this list:

  • Developers tend to get bored with “legacy”
  • Systems tend to outlast their developers
  • Data tends to outlast the systems operating on it
  • Data might tend to outlast the vendors providing tools for operating the data

Beware of the fact that data will probably outlast your sexy new technology. So, choose wisely when thinking about sexy new technology to operate your data.