The 10 Most Popular DB Engines (SQL and NoSQL)

How can the popularity of a DB engine be measured objectively? Good question! And there’s an Austrian company (Solid IT) that claims to have the answer. The company focuses on “Big Data und NoSQL”, but this focus does not seem to have biased the result of the measurement. Among the top 10 database engines, MongoDB is the only one that is not an RDBMS. And it’s astonishing just how popular MongoDB seems to be (although, they must be doing something right)!

Reproduced with permission of DB-Engines.com

Now, I’m not surprised by the top 3. I am definitely surprised by the fact that PostgreSQL and SQLite are not more popular. I am also surprised that there aren’t more “wide-column stores” among the top 10. Maybe Michael Stonebraker has to review his claims about the traditional RDBMS wisdom being all wrong?

And what about the other databases supported by jOOQ? Where are the Java databases? Here’s a condensed view of the ranking, consisting only of the 15 databases currently supported by jOOQ 3.1:

Reproduced with permission of DB-Engines.com

It turns out that Java databases (Derby, H2, HyperSQL) are not so popular compared to all the others. It also turns out that MariaDB still has a lot of ground to gain, compared to MySQL.

The ranking considers a lot of data from various somewhat authoritative sources as is explained here. These include:

  • Number of mentions of the system on websites. Measured through search engine results.
  • General interest in the system. Measured through Google Trends.
  • Frequency of technical discussions about the system. Measured through Stack Overflow and similar.
  • Number of job offers in which the system is mentioned. Measured through Indeed and similar.
  • Number of profiles in professional networks in which the system is mentioned. Measured through LinkedIn.

This ranking is certainly something to keep an eye on!

Why do We Need RDBMS?

There was this recent Quora question about why we need RDBMS:

Why not just use text files? What can RDBMS do that a simple text file cannot? Or, why not use several different text files to represent different tables?

Heh. Let’s challenge that through a witty comparison (also given as an answer to the above question)…

Short story (not to be taken too seriously):

Some people just put their keys, wallets, make-up, letters, pencils, more make-up, change, and all the other stuff in a huge purse, spending hours to find stuff when they need to catch the train. Stuff which they might have actually put in that other purse. Let’s call this purse the text file.

I like to structure my stuff. My index says: Wallet in the back pocket, key in the front right pocket, mobile phone in the front left pocket, glasses on my nose. Let’s call this structure the RDBMS.

;-)

Long story:

This Quora question is really interesting in this context: Why did relational database technologies gain traction? What were the historical competing technologies?

Essentially, there was one driving force at the time that mattered more than any other, pushing RDBMS way ahead of all alternative storage models: relational algebra itself, designed mostly by Edgar F. Codd, a brilliant computer scientist of his time.

Not only did popular relational database management systems take care of actually managing data, data structures, physical models, transactions, query models, a powerful query language (implemented as SQL, jOOQ, JPQL, LINQ to SQL, and various other dialects / APIs), referential integrity, constraint management, etc., they were also based on a very powerful conceptual model and implementation rules (Codd’s 12 rules). The relational data model can easily model almost all business rules.

So, of course you can write your own data management system. Or you use a proven one that does millions of things for you according to very proven rules conceived by very bright people that got very rich with their systems.
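To make "does millions of things for you" a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `author` and `book` tables and the `add_books` helper are hypothetical names chosen for illustration. The sketch shows two of the features listed above, referential integrity and transactions, which a plain text file would leave you to implement yourself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE book ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES author(id),"
    " title TEXT NOT NULL)"
)
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Author A')")
conn.commit()


def add_books(conn, rows):
    """Insert all rows atomically; return True on success, False if rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO book (author_id, title) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A valid insert succeeds.
ok = add_books(conn, [(1, "Book 1")])

# The second row references a non-existent author, so the whole batch
# is rejected, including the first (valid) row.
rejected = add_books(conn, [(1, "Book 2"), (99, "Orphan book")])

book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
```

After running this, only "Book 1" is stored: the database refused the orphan row and undid the partial batch without any hand-written bookkeeping.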

jOOQ Newsletter September 2013

Subscribe to this newsletter here

SQL Popularity and Controversy

Together, the above articles have reached more than 200’000 readers on the jOOQ Blog and through our syndication partners in only one month.

This has been topped by two more, very interesting articles about Prof. Michael Stonebraker’s recent claims. In “The Traditional RDBMS Wisdom is All Wrong”, Stonebraker, who has given us nothing less than Ingres, Postgres, Vertica, Streambase, Illustra, VoltDB, and SciDB, shows that he is also well-known for stirring up controversy.

Read also the follow-up article Teaching an Old Elephant New Tricks, where Stonebraker’s claims are addressed by Oracle and SQL Server.

With Stonebraker’s (and others’) efforts to bring SQL and “NewSQL” back to the market, SQL’s popularity is bound to be on the rise again. Few other languages stir so many emotions when it comes to commenting on tutorials like 10 Easy Steps to a Complete Understanding of SQL, which triggered lots of discussion both on reddit, and on hackernews.

jOOQ Public Talks and Trainings

jOOQ responds to SQL’s popularity by embracing SQL as a first-class citizen into your stack. Curious about how jOOQ works, or how it could work in your organisation? Then join any of the upcoming events:

  • The jOOQ training sessions at the /ch/open workshop days in Zurich, Switzerland (in German)
  • The jOOQ introductory session at the JUGH in Kassel, Germany (in German)

Feel free to contact us at contact@datageekery.com if you’re interested in a Training session close to you.

Column Stores: Teaching an Old Elephant New Tricks

Prof. Michael Stonebraker is a controversial visionary, who is known for nothing less than Ingres, Postgres, Vertica, Streambase, Illustra, VoltDB, SciDB, besides being a renowned MIT professor. My recent blog post about Stonebraker’s talk at the EPFL (host university to Prof. Martin Odersky, creator of the Scala Language and Co-Founder of Typesafe) has triggered a very interesting discussion on reddit.

While Stonebraker is very sure about his obviously biased claims that “The Traditional RDBMS Wisdom is All Wrong”, the bottom line of the reddit discussion included:

Interesting insight on SQL Server’s enhancements can be seen in this blog post by Microsoft’s Nicolas Bruno, who challenges the claim that column stores cannot be implemented by “traditional RDBMS”. As Nicolas Bruno stated, an “Old Elephant” can be taught new tricks. “Traditional RDBMS” have proven to adapt to long-term trends in the database industry. Their success isn’t based on the fact that they are mainly fast, or particularly well-designed to respond to niche problem domains. Their success is mainly based on the fact that they are designed according to Codd’s 12 Rules, and thus to be extremely flexible in how they separate data interfacing (SQL) from data storage.

A lot of additional insight and ongoing links can be found in these blog posts by Daniel Lemire, where he had challenged Stonebraker’s similar claims already four years ago:

MIT Prof. Michael Stonebraker: “The Traditional RDBMS Wisdom is All Wrong”

A very interesting talk about the future of DBMS was recently given at EPFL by MIT Professor and VoltDB Co-founder and CTO Michael Stonebraker, who also gave us Ingres and Postgres. In a bit less than one hour, he explains his views with respect to the three main pillars of database management systems:

  • OLAP / Data warehouses
  • OLTP
  • Other types of data stores

As a NewSQL vendor also actively involved with H-Store, he is of course heavily yet refreshingly biased towards traditional RDBMS storage models being obsolete. (An interesting fact is that Oracle Labs representative Eric Sedlar also attended the talk. One might think that the talk was a slightly FUD-dy move against a VoltDB competitor.) Unlike what has come to be known as the NoSQL movement, NewSQL relies on similar relational theory / set theory as “traditional SQL”, including support for ACID and structured data.

His claims mainly include that:

  • OLAP / data warehouses will migrate to column-based data stores within 10 years. The traditional row-based data storage approach is dead, as row-based storage will never match the performance of column-based storage, which he claims is faster by a factor of 100.
  • For OLTP, the race for the best data storage designs has not yet been decided, but there is a clear indication of classic models being “plain wrong” (according to Stonebraker), as only 4% of wall-clock time is spent on useful data processing, while the rest is occupied with buffer pools, locking, latching, recovery.
Image from Stonebraker’s presentation depicting the amount of “useful” work performed by any RDBMS
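The row-based versus column-based distinction in the first claim can be illustrated with a small sketch. This is not taken from the talk: the sales table and the function names are made up for illustration, and the sketch says nothing about the claimed performance factor. It only shows why an aggregate over one attribute reads less data in a column layout.

```python
from array import array

# Hypothetical sales table: (order_id, customer_id, amount_cents)
rows = [(1, 10, 500), (2, 11, 1250), (3, 10, 700)]

# Row-oriented layout: all fields of a record are stored together.
row_store = list(rows)

# Column-oriented layout: all values of one attribute are stored together.
column_store = {
    "order_id": array("q", (r[0] for r in rows)),
    "customer_id": array("q", (r[1] for r in rows)),
    "amount_cents": array("q", (r[2] for r in rows)),
}


def total_row_store(store):
    # Walks every full record even though only one field is needed.
    return sum(record[2] for record in store)


def total_column_store(store):
    # Reads only the one column the query asks for.
    return sum(store["amount_cents"])


row_total = total_row_store(row_store)
column_total = total_column_store(column_store)
```

Both functions return the same total. The difference is that an analytical query such as `SUM(amount_cents)` touches one contiguous array in the column layout, while the row layout has to step over the unrelated fields of every record.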

I specifically recommend the OLTP part of his talk, as it shows how various new techniques could heavily increase performance of traditional RDBMS already today:

  • Most OLTP systems can afford to buy the amount of memory needed to keep data off the disk. This will remove the need for a buffer pool.
  • Single-threading would get rid of the latching overhead. H-Store and VoltDB statically divide shared memory among the cores, for instance. This is very important, as latching gets worse and worse with the increasing number of cores we have today.
  • Dynamic locking is not really implemented in any popular RDBMS, but the market is uncertain about which workaround best implements concurrency control. In his opinion, MVCC is not going to do the trick in the long run.
  • ACIDness is something that even Jeff Dean from Google admits to missing once it’s gone, as eventual consistency does not really keep its promise.
  • In a cluster, active-active consistency management can increase log throughput by a factor of 3, compared to active-passive logging. (active-active = transaction is run on every node, active-passive = transaction is run only on the master node, the log is sent to all slave nodes)
  • And also, very importantly, anti-caching is a good technique when the in-memory format matches the disk format, as traditional RDBMS spend a substantial amount of time converting disk data formats (blocks, sectors) into memory formats (actual data).
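The single-threading idea from the list above can be sketched as follows. This is a toy illustration only, not how H-Store or VoltDB are actually implemented: `Partition` and `PartitionedStore` are hypothetical names, and the account balances are made-up data. Each partition owns a slice of the data and is modified by exactly one worker thread, so the data itself needs no latches. Note that the hand-off queue still synchronises internally.

```python
import queue
import threading


class Partition:
    """Owns a slice of the data; only its own worker thread ever touches it."""

    def __init__(self):
        self.balances = {}
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def _run(self):
        while True:
            task = self.inbox.get()
            if task is None:  # shutdown signal
                break
            account, delta = task
            # No lock needed: this dict is private to this thread.
            self.balances[account] = self.balances.get(account, 0) + delta


class PartitionedStore:
    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def submit(self, account, delta):
        # Static routing: an account always lands on the same partition.
        target = self.partitions[account % len(self.partitions)]
        target.inbox.put((account, delta))

    def close(self):
        """Stop all workers and return the merged balances."""
        for p in self.partitions:
            p.inbox.put(None)
        for p in self.partitions:
            p.worker.join()
        merged = {}
        for p in self.partitions:
            merged.update(p.balances)
        return merged


store = PartitionedStore(4)
for i in range(1000):
    store.submit(i % 8, 1)  # 8 accounts, 125 increments each
balances = store.close()
```

Because every account is routed to one fixed partition, concurrent updates to the same account are serialised by that partition's queue rather than by a latch on shared data.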

The essence of Stonebraker’s talk is that the “elephants” who currently dominate the market are too slow to react to all the NewSQL vendors’ innovations. It is a very exciting time for a database professional (some refer to them as data geeks) to enter the market and publish new findings.

Another interesting thing to note is that SQL (call it NewSQL, OldSQL) will remain a dominant language for querying DBMS, both for column stores and for row stores. This is a strong statement for tools like jOOQ, which embrace SQL as a first-class citizen among programming languages.

See the complete talk by Michael Stonebraker here:

See Stonebraker’s Talk here: http://slideshot.epfl.ch/play/suri_stonebraker

Further reading: