The Premature Return to SQL

In online communities, the NoSQL topic (much like the ORM topic) is guaranteed to stir emotions. Many of those emotions are stirred up by evangelists on either side for ideological or marketing reasons. Here’s an interesting post by Alex Popescu, a passionate NoSQL and polyglot persistence evangelist, claiming that the recent trend to return to SQL is premature:

This post triggered an equally interesting reaction by Markus Winand, author of SQL Performance Explained:

It’s really interesting how often people think in terms of “trends” that introduce novel paradigms and make everything we had before obsolete. I believe that these are not trends, but experiments. I’ve blogged before that you should be wary when NoSQL vendors promise to put an end to DBAs. Very few “new” solutions or paradigms have ever completely replaced their predecessors. Or, in Isaac Newton’s words:

If I have seen further it is by standing on the shoulders of giants.

We’re not “returning to SQL”, nor is such a return “premature”. Yes, there are some innovative thinkers who are teaching an old elephant new tricks, and that’s good. It’s also good that such innovative thinkers get a piece of the cake and make money with their inventions.

It is also true that big database vendors are not very innovative. But they don’t have to be. Their assets are reliability, predictability, and stability. Oracle SQL will still support all its age-old legacy in 15 years, which makes it a safe choice for banks and insurance companies. If a NoSQL or NewSQL feature proves to be innovative and reliable, Oracle et al. will most certainly pick it up and integrate it into SQL. Clever NoSQL vendors are thus already preparing for their exits.

This happens outside the world of databases, of course:

  • Scala is innovative and contributes to Java (Generics in Java 5, Lambdas in Java 8).
  • Open Source developers (e.g. those of JAX-RS) are innovative and contribute to JEE.
  • PostgreSQL is innovative and contributes to other SQL dialects and eventually the SQL standard.
  • Instagram is innovative and contributed to Facebook (“shit happens!”).
  • jOOQ is innovative and contributes to JDBC and JPA (eventually, hopefully).

SQL is a safe bet and is here to stay.

MIT Prof. Michael Stonebraker: “The Traditional RDBMS Wisdom is All Wrong”

A very interesting talk about the future of DBMS was recently given at EPFL by MIT Professor and VoltDB Co-founder and CTO Michael Stonebraker, who also gave us Ingres and Postgres. In a bit less than one hour, he explains his views with respect to the three main pillars of database management systems:

  • OLAP / Data warehouses
  • OLTP
  • Other types of data stores

As a NewSQL vendor who is also actively involved with H-Store, he is of course heavily, yet refreshingly, biased towards the view that traditional RDBMS storage models are obsolete. (Interestingly, Oracle Labs representative Eric Sedlar also attended the talk, so one might think that the talk was a slightly FUD-dy move against a VoltDB competitor.) Unlike what has come to be known as the NoSQL movement, NewSQL relies on the same kind of relational theory / set theory as “traditional SQL”, including support for ACID and structured data.

His main claims are:

  • OLAP / data warehouses will migrate to column-based data stores within 10 years. In his view, the traditional row-based storage approach is dead for this workload, as it will never match the roughly 100x performance gain of column-based storage.
  • For OLTP, the race for the best data storage design has not yet been decided, but there is a clear indication that the classic models are “plain wrong” (according to Stonebraker), as only 4% of wall-clock time is spent on useful data processing, while the rest is taken up by buffer pools, locking, latching, and recovery.
Image from Stonebraker’s presentation depicting the amount of “useful” work performed by any RDBMS
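
To make the row-versus-column argument a bit more tangible, here is a toy Python sketch. It is not taken from the talk, and it is in no way a benchmark: the table, the column names, and the "values touched" counter are all made up for illustration. The counter is only a crude proxy for the amount of data a storage engine would have to read to answer a typical OLAP query such as a sum over one column.

```python
from array import array

# Hypothetical toy table: (order_id, customer_id, amount_cents)
ROWS = [
    (1, 10, 500),
    (2, 11, 1250),
    (3, 10, 300),
    (4, 12, 990),
]


def to_columns(rows):
    """Pivot a row-oriented table into one contiguous array per column."""
    order_ids, customer_ids, amounts = zip(*rows)
    return {
        "order_id": array("q", order_ids),
        "customer_id": array("q", customer_ids),
        "amount_cents": array("q", amounts),
    }


def sum_amount_row_store(rows):
    """Row layout: whole rows are read, although only one field is needed."""
    total = 0
    values_touched = 0
    for row in rows:
        values_touched += len(row)  # the entire row comes along for the ride
        total += row[2]
    return total, values_touched


def sum_amount_column_store(columns):
    """Column layout: only the one relevant column is read."""
    amounts = columns["amount_cents"]
    return sum(amounts), len(amounts)


COLUMNS = to_columns(ROWS)
```

Both functions return the same total, but the row-oriented version touches every value of every row, whereas the column-oriented version touches only the values of the single column being aggregated. With three columns, the difference is a factor of three. Real warehouse tables tend to be much wider, and real column stores add techniques such as compression on top, none of which this sketch attempts to model.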

I specifically recommend the OLTP part of his talk, as it shows how various new techniques could substantially increase the performance of traditional RDBMS today:

  • Most OLTP systems can afford to buy the amount of memory needed to keep data off the disk. This will remove the need for a buffer pool.
  • Single-threading would get rid of the latching overhead. H-Store and VoltDB statically divide shared memory among the cores, for instance. This is very important, as latching gets worse and worse with the increasing number of cores we have today.
  • Dynamic locking is not really implemented in any popular RDBMS, and the market has not yet settled on which workaround implements concurrency control best. In his opinion, MVCC is not going to do the trick in the long run.
  • ACIDness is something that even Jeff Dean from Google admits to missing once it’s gone, as eventual consistency does not really keep its promise.
  • In a cluster, active-active consistency management can increase log throughput by a factor of 3, compared to active-passive logging. (Active-active = the transaction is run on every node; active-passive = the transaction is run only on the master node, and the log is sent to all slave nodes.)
  • And also, very importantly, anti-caching is a good technique when the in-memory format matches the disk format, as traditional RDBMS spend a substantial amount of time converting disk data formats (blocks, sectors) into memory formats (actual data).
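
The single-threading idea from the list above can be sketched in a few lines of Python. Please take this as a rough illustration under my own assumptions and not as a description of how H-Store or VoltDB are actually implemented: the `Partition`, `Router`, and `deposit` names are hypothetical, the data is partitioned by a simple hash of the key, and only single-partition transactions are handled.

```python
import queue
import threading


class Partition:
    """One shard of the data, owned by exactly one worker thread."""

    def __init__(self):
        self.balances = {}          # touched only by this partition's thread
        self.inbox = queue.Queue()  # the only structure shared between threads
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        # Transactions run strictly one after another, so the data itself
        # needs no locks or latches.
        while True:
            txn = self.inbox.get()
            if txn is None:  # sentinel: shut down
                break
            txn(self.balances)


class Router:
    """Routes each transaction to the partition that owns its key."""

    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]
        for partition in self.partitions:
            partition.thread.start()

    def submit(self, key, txn):
        index = hash(key) % len(self.partitions)
        self.partitions[index].inbox.put(txn)

    def shutdown(self):
        for partition in self.partitions:
            partition.inbox.put(None)
        for partition in self.partitions:
            partition.thread.join()


def deposit(account, amount):
    """Build a transaction that adds `amount` to `account`."""
    def txn(balances):
        balances[account] = balances.get(account, 0) + amount
    return txn


def run_demo():
    router = Router(partition_count=4)
    for _ in range(1000):
        for account in (1, 2, 3):
            router.submit(account, deposit(account, 1))
    router.shutdown()
    merged = {}
    for partition in router.partitions:
        merged.update(partition.balances)
    return merged
```

Calling `run_demo()` applies 1000 deposits to each of three accounts without a single lock around the account data, because every account lives in exactly one partition and every partition is modified by exactly one thread. The inbox queue is still synchronized internally, and CPython's global interpreter lock prevents real parallelism here, so this only shows the shape of the design, not its performance.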

The essence of Stonebraker’s talk is that the “elephants” who currently dominate the market are too slow to react to all the NewSQL vendors’ innovations. It is a very exciting time for a database professional (some refer to them as data geeks) to enter the market and publish new findings.

Another interesting thing to note is that SQL (call it NewSQL or OldSQL) will remain a dominant language for querying DBMS, for both column stores and row stores. This is a strong statement in favour of tools like jOOQ, which embrace SQL as a first-class citizen among programming languages.

See the complete talk by Michael Stonebraker here:

See Stonebraker’s Talk here: http://slideshot.epfl.ch/play/suri_stonebraker

Further reading:

A map of all those new NoSQL, NewSQL, post-SQL, structured, unstructured database options that came out over the past year

So you want to go with the flow and implement your next application on top of some NoSQL, NotJustSQL, NewSQL, AlmostSQL, SQL++, NextGenSQL, and what not, just to be sure not to miss out on some of the latest developments in the data business? Here’s a little map to guide you through the jungle of choices:

http://gigaom.com/cloud/confused-by-the-glut-of-new-databases-heres-a-map-for-you/

… or you just stick with the relational data model and some decent RDBMS like Oracle, SQL Server, or Postgres and wait until things settle a little bit :-)