An interesting insight into SQL Server’s enhancements can be found in this blog post by Microsoft’s Nicolas Bruno, who challenges the claim that column stores cannot be implemented by “traditional RDBMS”. As Nicolas Bruno stated, an “Old Elephant” can be taught new tricks. “Traditional RDBMS” have proven able to adapt to long-term trends in the database industry. Their success isn’t based on being particularly fast, or on being particularly well-designed for niche problem domains. It is mainly based on the fact that they are designed according to Codd’s 12 Rules, and are thus extremely flexible in how they separate data interfacing (SQL) from data storage.
Further insight and links can be found in these blog posts by Daniel Lemire, who challenged similar claims by Stonebraker four years ago:
As a NewSQL vendor also actively involved with H-Store, Stonebraker is of course heavily, yet refreshingly, biased towards the view that traditional RDBMS storage models are obsolete. (An interesting fact is that Oracle Labs representative Eric Sedlar also attended the talk. One might think that the talk was a slightly FUD-dy move against a VoltDB competitor.) Unlike what has come to be known as the NoSQL movement, NewSQL relies on similar relational theory / set theory as “traditional SQL”, including support for ACID and structured data.
His main claims are:
OLAP / data warehouses will migrate to column-based data stores within 10 years. The traditional row-based data storage approach is dead, as row-based storage will never match the performance gain of column-based storage, which he puts at a factor of 100.
For OLTP, the race for the best data storage design has not yet been decided, but there is a clear indication that classic models are “plain wrong” (according to Stonebraker), as only 4% of wall-clock time is spent on useful data processing, while the rest is taken up by buffer pools, locking, latching, and recovery.
I specifically recommend the OLTP part of his talk, as it shows how various new techniques could substantially increase the performance of traditional RDBMS already today:
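To make the row-versus-column distinction concrete, here is a minimal Python sketch. The table, its column names, and the helper functions are hypothetical and exist only for illustration. It does not reproduce the performance figures quoted above; it only shows why an aggregate over a single attribute touches less data in a column layout.

```python
# Hypothetical "sales" table, used only to contrast the two layouts.

# Row-oriented layout: all attributes of one record are stored together.
rows = [
    {"id": 1, "region": "EU", "amount": 10.0},
    {"id": 2, "region": "US", "amount": 20.0},
    {"id": 3, "region": "EU", "amount": 5.5},
]


def to_columns(records):
    """Pivot a list of records into one contiguous list per attribute."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one attribute are stored together.
columns = to_columns(rows)


def total_amount_rows(records):
    # Visits every full record although only "amount" is needed.
    return sum(record["amount"] for record in records)


def total_amount_columns(cols):
    # Reads only the single column that the aggregate needs.
    return sum(cols["amount"])


row_total = total_amount_rows(rows)
column_total = total_amount_columns(columns)
```

Both functions return the same total. The difference lies in how much unrelated data has to be read to get there, which is what matters for OLAP-style scans over wide tables.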
Most OLTP systems can afford to buy the amount of memory needed to keep data off the disk. This will remove the need for a buffer pool.
Single-threading would get rid of the latching overhead. H-Store and VoltDB statically divide shared memory among the cores, for instance. This is very important, as latching gets worse and worse with the increasing number of cores available today.
Dynamic locking is not really implemented in any popular RDBMS, but the market has not yet settled on which alternative best implements concurrency control. In his opinion, MVCC is not going to do the trick in the long run.
In a cluster, active-active consistency management can increase log throughput by a factor of 3 compared to active-passive logging (active-active: the transaction is run on every node; active-passive: the transaction is run only on the master node, and the log is sent to all slave nodes).
Also, and very importantly, anti-caching is a good technique when the in-memory format matches the disk format, as traditional RDBMS spend a substantial amount of time converting disk data formats (blocks, sectors) into memory formats (actual data).
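The single-threading idea from the list above can be sketched in a few lines of Python. This is not how H-Store or VoltDB are implemented. The `Partition` and `PartitionedStore` classes and the `deposit` helper are hypothetical names, and the sketch only assumes that data is split into partitions, each owned by exactly one worker thread, so the data itself needs no latches.

```python
import queue
import threading


class Partition:
    """One partition: private data and a single worker thread that owns it."""

    def __init__(self):
        self.data = {}              # only ever touched by this partition's worker
        self.inbox = queue.Queue()  # transactions waiting to run, in order
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def _run(self):
        while True:
            txn = self.inbox.get()
            if txn is None:         # sentinel: stop the worker
                return
            txn(self.data)          # runs serially, so no latch on self.data


class PartitionedStore:
    """Routes each transaction to the partition that owns its key."""

    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def submit(self, key, txn):
        owner = self.partitions[hash(key) % len(self.partitions)]
        owner.inbox.put(txn)

    def shutdown(self):
        for partition in self.partitions:
            partition.inbox.put(None)
        for partition in self.partitions:
            partition.worker.join()


def deposit(account, amount):
    """Build a single-partition transaction that credits one account."""
    def txn(data):
        data[account] = data.get(account, 0) + amount
    return txn


store = PartitionedStore(4)
for i in range(1000):
    account = i % 10
    store.submit(account, deposit(account, 1))
store.shutdown()

# Safe to read without synchronization: all workers have terminated.
balances = {}
for partition in store.partitions:
    balances.update(partition.data)
```

The only synchronized structure is the inbox queue. Transactions that span several partitions would need extra coordination, which this sketch deliberately leaves out.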
The essence of Stonebraker’s talk is that the “elephants” who currently dominate the market are too slow to react to all the NewSQL vendors’ innovations. It is a very exciting time for a database professional (some refer to them as data geeks) to enter the market and publish new findings.
Another interesting thing to note is that SQL (call it NewSQL or OldSQL) will remain a dominant language for querying DBMS, for column stores as well as for row stores. This is a strong statement in favour of tools like jOOQ, which embrace SQL as a first-class citizen among programming languages.
See the complete talk by Michael Stonebraker here: