The Crystal Ball. Or, Oops, Michael Stonebraker did it Again

Michael Stonebraker’s opinions and claims are always refreshing to read. He’s done a lot for our industry and for how we do data processing. Some of his claims are certainly right as well. Here’s an interview with him, telling us about his 5 predictions on the future of databases.

Of course, him being a software vendor, many of his claims should be read with caution. Today, the most popular DBMS (relational or not) are still Oracle, MySQL, and SQL Server. Even his “popular” PostgreSQL is still a niche player, let alone the almost forgotten Ingres and the never really popular Vertica columnar “NewSQL” database. Obviously, we’re not saying they’re bad databases, but they’re certainly not very popular. The same goes with SAP. Their Sybase databases have been surpassed by SQL Server both in quality and in popularity 10 years ago, when Microsoft forked the Sybase code into T-SQL. We hardly believe that Oracle and Sybase will have the “final fight” for RDBMS supremacy.

But again. That’s Mike Stonebraker, the salesman as in “The Traditional RDBMS Wisdom is All Wrong”

Counting Distinct Records in SQL

The SQL language and its depths… Some of you readers might be aware of MySQL’s capability of counting distinct records through the COUNT() aggregate function. The MySQL documentation reads:

COUNT(DISTINCT expr,[expr...])

Returns a count of the number of rows with different non-NULL expr values.

In other words, you can count distinct first and last names very easily:

SELECT COUNT(DISTINCT FIRST_NAME, LAST_NAME)
FROM CUSTOMERS

That’s quite useful, but MySQL-specific (although HSQLDB also supports this particular MySQL syntax). Most other databases, however, do not offer such a feature, even if the SQL-99 standard has specified it long ago:

6.16  <set function specification>

<set function specification> ::=
    COUNT <left paren> <asterisk> <right paren>
  | <general set function>

<general set function> ::=
    <set function type>
        <left paren> [ <set quantifier> ] 
            <value expression> <right paren>

<set function type> ::=
    <computational operation>

<computational operation> ::=
    AVG | MAX | MIN | SUM
  | EVERY | ANY | SOME
  | COUNT

<set quantifier> ::= DISTINCT | ALL

The above was later on merged into 10.9 <aggregate function>, in SQL:2011, with lots of other aggregate function types and features. Now, let’s have a look at 6.23 <value expression>:

6.23 <value expression>

<value expression> ::=
<numeric value expression>
| <string value expression>
| <datetime value expression>
| <interval value expression>
| <boolean value expression>
| <user-defined type value expression>
| <row value expression>                <-- RVE!
| <reference value expression>
| <collection value expression>

Interestingly, you can put a row value expression in some of your aggregate functions. There are additional restrictions, as you cannot SUM() or AVG() records, for instance. But with COUNT() and COUNT(DISTINCT ...) this makes perfect sense. So, according to the SQL standard (and according to HSQLDB’s alternative, and PostgreSQL’s main implementation, which unfortunately isn’t really documented), the following would be the correct way to count distinct first and last names in SQL:

SELECT COUNT(DISTINCT (FIRST_NAME, LAST_NAME))
FROM CUSTOMERS

… which makes perfect sense. Watch out when reading this. This isn’t a DISTINCT() function! This is the DISTINCT keyword applied to a row value expression. See some previous blog posts about row value expressions here:

jOOQ standardises various SQL dialects and hides the above behind the DSL.countDistinct() method.

When you’re not using HSQLDB, MySQL, or PostgreSQL

In other databases, you might resort to some manual SQL transformation. Our example is easy to transform:

SELECT COUNT(*)
FROM (
  SELECT DISTINCT FIRST_NAME, LAST_NAME
  FROM CUSTOMERS
) t

But such query transformation can prove to be tricky, once you have GROUP BY clauses or other aggregate functions involved. Future versions of jOOQ might do that sort of transformation for you.

CUBRID: A Lesser-Known Korean OSS Database Gem

While RedHat and Google have been dumping MySQL for MariaDB, there’s actually a third, much lesser-known option for MySQL-oriented database folks in the RDBMS market: CUBRID.

One of CUBRID’s main goals is also to lure MySQL users away from Oracle by offering many equivalent syntax elements that are available in either the MySQL or Oracle databases. But this gem from the far east, which is backed by the Korean Naver Corp. offers a lot more. It is one of the few Open Source RDBMS that

  • is also an ORDBMS (only PostgreSQL can compete with this)
  • implements the SQL standard MERGE statement (HSQLDB and Firebird do as well, Derby is currently implementing it)
  • implements a wide variety of window functions (only PostgreSQL competes, again)
  • implements a wide variety of MySQL’s proprietary SQL extensions
  • implements Oracle’s awesome CONNECT BY syntax (no other database does that)

Popularity-wise, there’s surely much to catch up with, comparing with MySQL and MariaDB:

Reproduced with permission by DB-Engines.com
Reproduced with permission by DB-Engines.com

But the next time you’re looking at a replacement for MySQL, why not also consider CUBRID?

jOOQ Newsletter September 17, 2013

Subscribe to this newsletter here.

SQL for calculations

SQL can be used for heavy calculations. This doesn’t mean that it has to, of course. Many Java-oriented software architects are reluctant to allow for business logic entering their database. DBA tend to disagree and promote complex logic in database views or stored procedures.

The pros and cons of both sides can be seen in this popular discussion on reddit. Our take on this discussion is a rather pragmatic one, saying: Neither approach is “better”. But not knowing about awesome, often vendor-specific SQL features is a pity. For example, here are two approaches to calculating a running total with Oracle SQL:

What is your experience with SQL for calculations? Reply to this e-mail and tell us your story.

SQL Performance Explained

SQL Performance ExplainedWhen running calculations in the database using SQL, it is always good to know your way around indexing and performance tuning. As jOOQ is a very SQL-centric environment, we would like to promote a very good book about SQL Performance.

SQL databases have become incredibly fast in parsing and executing even very complex SQL statements. But to many SQL developers, writing performing SQL is still a mystery. SQL Performance Explained by Markus Winandcovers 90% of what any SQL developer should know in very simple terms. A must-read for all SQL beginners and pros! Available in English, German, and French.

PostgreSQL 9.3

PostgreSQL 9.3 has been released! This is great news as a wonderful database has just gotten better. Apart from support for materialised views and updatable views, PostgreSQL now also supports the SQL standard LATERAL JOIN, which will soon be supported in jOOQ as well.

More about what’s new in PostgreSQL 9.3 can be seen here.

Google and MariaDB

After RedHat’s alleged switching from MySQL to MariaDB in RHEL, Google now also announces that they will start to migrate away from Oracle’s popular Open Source database. These migrations will further strengthen MariaDB’s position and communities, as large players in the data market will stop contributing to MySQL and contribute to MariaDB instead.

Read more about this migration here.

The Myth About Slow SQL JOIN Operations

In my recent SQL work for a large Swiss bank, I have maintained nested database view monsters whose unnested SQL code amounted up to 5k lines of code, joining the same table over and over again in separate subselects combined via UNION operations. This monster performed in way under 50ms, no matter how we queried it (see “10 more common mistakes” about the speed of queries). Of course, this performance was only achieved after lots of fine-tuning, load-testing and benchmarking. But it worked. Our Oracle database never let us down on these things.

Nonetheless, many SQL users think that JOIN operations are slow. Why? Perhaps because they are / used to be, in MySQL? I’m currently reading this interesting book by Markus Winand. The book is called SQL Performance Explained. He’s also the author of Use-The-Index-Luke.com where you can get free insight into his book. I still recommend reading the whole book, though. Even SQL old-timers and SQL nerds like me will find 1-2 novel, very interesting approaches, some of which will be incorporated into jOOQ very soon!

In particular, consider this page which explains very well how Hash JOIN operations work:
http://use-the-index-luke.com/sql/join/hash-join-partial-objects

MySQL Bad Idea #573

This is MySQL’s Bad Idea #573 (after #384, which I’ve blogged about before) I’ve just had a terrible experience with a bug report from the jOOQ User Group, related to escaping of backslashes in string literals in MySQL. First, I thought to myself, whatever. SQL doesn’t escape backslashes. The only escape character within a string literal according to the early SQL standards is the quote as in quote quote. Citing from SQL-1992 (slightly simplified):

<character string literal> ::=
    <quote> [ <character representation>... ] <quote>

<character representation> ::=
    <nonquote character>
  | <quote symbol><nonquote character> ::= !! See the Syntax Rules.
<quote symbol> ::= <quote><quote>

Alright? Crystal clear. There’s no escaping other than <quote><quote>

The only time when you will need to be able to escape something is with the LIKE predicate, in case you want to escape % and _ symbols. You can then use the ESCAPE clause:

<like predicate> ::=
    <match value> [ NOT ] LIKE <pattern>
    [ ESCAPE <escape character> ]

I have now learned, that MySQL (and of course MariaDB) unlike any other database also support quoting with backslashes, similar to Java and other languages. That’s not a problem per se, although from a cross-vendor compatibility perspective, it’s quite nasty. But then, I’ve discovered there is a flag called NO_BACKSLASH_ESCAPES:

This just reminds me of PHP’s horrible magic quotes. In fact, combine arbitrary configurations of PHP and MySQL on your server and good luck to you of ever getting string literals right. Sigh.

SAP’s Hilarious SQL Whitepaper(s)

While looking for some authoritative information about Sybase SQL Anywhere 12’s TOP .. START AT clause, I stumbled upon this hilarious white paper here, which I do not want to keep from you:
http://www.sybase.com/files/White_Papers/Sybase_Top_10_Features_In_SQL_Anywhere_12.pdf

I will take advantage of “fair use policy” and cite parts from section 7:

Feature number 7: improved support for DaffySQL syntax

If I told you that RowGenerator.row_num contains the values 1 through 255, what would you say this query returned?

[Query example]

Give up? OK, how about this one?

[Query example]

Still stumped? If I told you they both returned exactly the same result set as the following query, what would you say?

[Query example]

Yes, the LIMIT clause is new to SQL Anywhere 12, exactly the same as TOP START AT except it uses zero as the starting point for numbering rows instead of 1.

An “offset”, get it?

As in “Here’s ten dollars, let me count it for you: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.”

Why implement LIMIT? And why include it in a list of cool features?

Because there are a lot of MySQL users out there who don’t have TOP START AT, and they’ve written zillions of queries using LIMIT, and they’d like to migrate their apps to SQL Anywhere without rewriting everything. And PostgreSQL users too… welcome aboard!

Migrating to SQL Anywhere is definitely cool.

So be cool and migrate to SQL Anywhere already! :-) I’m now going through the rest of this fun document.