10 SQL Articles Everyone Must Read

We’ve been blogging about Java and SQL for a while now, on the jOOQ blog. Over the years, while researching interesting blog topics, we’ve discovered a lot of SQL gems in the blogosphere that have inspired our work and our passion for SQL.

Today, we’re presenting to you a list of 10 articles that we think you should absolutely read. At the end of the list, you will agree that either:

  • SQL is awesome
  • SQL is crazy

… or probably both. Here goes, in no particular order:

1. Joe Celko: “Divided We Stand: The SQL of Relational Division”

Relational division is a very powerful concept in relational algebra. It answers questions like:

Give me all the students that have completed a given set of courses

Unfortunately, division doesn’t have any direct equivalent in SQL. We wish there would be a clause like

TABLE_A
  DIVIDE BY TABLE_B 
  ON [ some predicate ]

Nonetheless, you can express a division in SQL in various ways.

Read Joe’s: “Divided We Stand: The SQL of Relational Division

2. Alex Bolenok: “Happy New Year!”

Alek Bolenok (a.k.a. Quassnoi) blogs about various interesting SQL-related things, but one of his top contributions every year are his “happy new year” series. Alek paints “beautiful” (beauty is in the eye of the beholder), and certainly impressive pictures into your SQL console. For instance:

Read Alek’s, “Christmas tree in SQL

3. Markus Winand: “Clustering Data: The Second Power of Indexing”

Markus Winand is the author of the popular book SQL Performance Explained, parts of which you can also read on his blog “Use The Index Luke“. There is an incredible amount of very useful knowledge both in the book and on this page, but one of the most revealing and neat SQL tricks is to know about “covering indexes”, “clustering indexes”, or “index only scans”

Read Markus’s: “Clustering Data: The Second Power of Indexing

4. Dimitri Fontaine: “Understanding Window Functions”

There was SQL before window functions and SQL after window functions

Window functions are some of the most powerful and underused features of SQL. They’re available in all commercial databases, in PostgreSQL, and soon also in Firebird 3.0. We’ve blogged about window functions a couple of times ourselves, but one of the best summaries and explanations about what they really are and how they work has been written by Dimitri Fontaine.

Read Dimitri’s: “Understanding Window Functions

5. Lukas Eder: “10 Common Mistakes Java Developers Make when Writing SQL”

A bit of advertising for our own writing. We’ve collected 10 of the most common mistakes that Java developers make when writing SQL. These mistakes are actually not even specific to Java developers, they could happen to any developer. This article has generated so much traction on our blog, there must be some great truth in it.

Read Lukas’s: “10 Common Mistakes Java Developers Make when Writing SQL

6. András Gábor’s “Techniques for Pagination in SQL”

Up until recently, offset pagination has been rather difficult to implement in commercial databases like Oracle, SQL Server, DB2, Sybase, which didn’t feature the equivalent of MySQL’s / PostgreSQL’s LIMIT .. OFFSET clause. Pagination could be emulated, however, and there are a lot of techniques for doing that. Picking the right technique is essential for performance. If you’re using Oracle 11g or less, you should filter on ROWNUM:

Read András’s “Techniques for Pagination in SQL

On a side-note, you should probably consider not using OFFSET at all. For details about the NO OFFSET movement, read…

7. Markus Windand: “We need tool support for keyset pagination”

If you think about OFFSET pagination, it’s actually a rather dumb thing from a technical perspective, and a useless thing from a business perspective. Here’s why.

From a technical perspective…

… you need to apply filtering, grouping, and ordering on a vast amount of data, skipping and throwing away all the data that appears before the offset until you reach the first row of interest. That is a lot of waste of resources given that…

From a business perspective…

… perhaps, pages 1-3 are interesting, but there is absolutely no meaning in offering users to navigate to page 1337. After a certain offset, the meaning of the offset from a business perspective has vanished. You might as well display random, unordered data samples. The user wouldn’t notice. Proably, when you reach a higher page on Google search results, this is exactly what happens. Random stuff.

Or on reddit. There, you get random stuff already on the first page – such as this popular display of a Stabilized head on green vine snake.

Much better than offset pagination is keyset pagination (which we’ve blogged about as well).

Read Markus’s “We need tool support for keyset pagination

no-offset-banner-468x60.white

8. Josh Berkus “Tag All The Things”

Implementing tagging in a relational database can be a beast from a performance perspective. Should you normalise (one-to-many)? Should you normalise heavily (many-to-many)? Should you use nested collections / arrays / or even JSON data structures?

Josh has written a very interesting write-up on the performance of heavy tagging in PostgreSQL, showing that normalisation isn’t always the best choice.

Read Josh’s “Tag All The Things

9. Alek Bolenok’s “10 things in SQL Server (which don’t work as expected)”

This is again Alek’s (Quassnoi’s) work. A very interesting set of things that happen inside of SQL Server, which you might not have expected when you’re used to using other databases. Whether you’re using SQL Server or not, this is a must-read to re-raise awareness of the subtle little differences between SQL implementations

Read Alek’s “10 things in SQL Server (which don’t work as expected)

10. Aaron Bertrand: “Best approaches for running totals”

Running totals are a very typical use-case for SQL-based reporting. A running total is something that every project manager using Excel knows intuitively how to do. Just drag-and-drop that sweet sweet formula across your spreadsheet and done:

excel-running-total

How to do the same in SQL? There are again tons of ways. Aaron Bertrand has summarised various solutions for SQL Server 2012.

Read Aaron’s “Best approaches for running totals

Many other articles

There are, of course, many other very good articles providing deep insight into useful SQL tricks. If you find you’ve encountered an article that would nicely complement this list, please leave a link and description in the comments section. Future readers will appreciate the additional insight.

Advanced SQL: Relational division in jOOQ

Relational algebra has its treats. One of the most academic features is the relational division. It is hardly ever used, but comes in handy every now and then. And when you need it, you’ll probably hate yourself for having slept during the relevant classes at the university.

What is relational division?

Relational division is the inverse of a cross join operation. The following is an approximate definition of a relational division:

Assume the following cross join / cartesian product 
C = A × B 

Then it can be said that 
A = C ÷ B 
B = C ÷ A

What does it mean, typically?

Let’s have a look at the sample provided on Wikipedia:

Wikipedia example of a relational division

Wikipedia example of a relational division

This looks sensible. The division of Completed ÷ DBProject leads to a list of students that have completed all projects.

Now how to phrase that in SQL??

That’s not so simple as it looks. The most commonly documented solution involves a doubly-nested select statement using anti-joins. In human language (using double negative), it is like Fred and Sarah saying “there is no DBProject that we have not Completed“. Or in SQL:

SELECT DISTINCT "c1".Student FROM Completed "c1"
WHERE NOT EXISTS (
  SELECT 1 FROM DBProject
  WHERE NOT EXISTS (
    SELECT 1 FROM Completed "c2"
    WHERE "c2".Student = "c1".Student
    AND "c2".Task = DBProject.Task
  )
)

Now, no one sane wants to remember this just for the 1-2 times in a SQL developer’s life that they actually need it. So they use jOOQ, which wraps up the above monster in a concise syntax:

create.select().from(
  Completed.divideBy(DBProject)
           .on(Completed.Task.equal(DBProject.Task))
           .returning(Completed.Student)
);

Note that from the above SQL statement, it is immediately clear that proper indexing is of the essence. Be sure to have indexes on all columns referenced from the on(…) and returning(…) clauses.

More information

For more information about relational division and some nice, real-life examples, see