Quantified Comparison Predicates – Some of SQL’s Rarest Species

A recent Tweet by Aaron Bertrand (whom you’ve certainly encountered on Stack Overflow) has triggered my interest

Indeed, few people I’ve met and who’ve visited my SQL masterclass have heard about the ANY and ALL quantifiers, which you can use in SQL, let alone used them on a regular basis (or ever used them at all).

What are these things?

The formal definition is rather simple. Let

R <comp op> <quantifier> S

… be a quantified comparison predicate where

  • R is a row
  • <comp op> is a comparison operator (like =, !=, etc.)
  • <quantifier> is ANY or ALL
  • S is a subquery

You can now write things like:

SELECT *
FROM author AS a
WHERE a.author_id = ANY (
  SELECT b.author_id
  FROM book AS b
)

The above reads relatively intuitively (depending on your understanding of “intuitive”):

I want all authors whose author_id is equal to “ANY” of the author_id values contained in book

In fact, the SQL standard defines the IN predicate as being just syntax sugar for the = ANY() quantified comparison predicate.

8.4 <in predicate>

Let RVC be the <row value predicand> and 
let IPV be the <in predicate value>.

...

The expression

  RVC IN IPV

is equivalent to

  RVC = ANY IPV

OK, so why even use these weird operators?

The interesting case for these operators is not where the comparison operator is an equality (“=”) or non-equality (“!=” or “”). The interesting case is where we’re comparing things with less-than or greater-than operators, such as:

SELECT *
FROM author AS a
WHERE a.age > ALL (
  SELECT c.age
  FROM customer AS c
)

Or, in plain English:

I want all authors whose age is greater than the age of all customers

That’s neat, isn’t it? Of course, there are (at least) two alternative ways of tackling the same query, namely:

Comparing only with the MAX()

SELECT *
FROM author AS a
WHERE a.age > (
  SELECT MAX(c.age)
  FROM customer AS c
)

This solution compares the author’s age only with the maximum customer age, which obviously yields the same result. Once we’ve found an author older than the oldest customer, then that author is older than ALL the customers.

Instead of using MAX(), we can use the following, semantically equivalent query:

Comparing only with the “first”

SELECT *
FROM author AS a
WHERE a.age > (
  SELECT c.age
  FROM customer AS c
  ORDER BY c.age DESC
  LIMIT 1
)

I’m using PostgreSQL syntax (LIMIT 1), although most databases have some way of returning only one row from a query (see the jOOQ manual for details). This is the same as using MAX(). Some databases recognise the fact that we’re only returning a single row, and they won’t sort the entire result set first in O(N log N), but yield the maximum value only in O(N), and only once for all authors, instead of once per author. Whether this optimisation is done in your database, I can’t say. You should measure this yourself, and check out execution plans.

So, again, why use quantifiers?

Aaron from the twitter conversation would advise against using this “unintuitive” syntax:

Perhaps. Ultimately, you should always choose performance first, and then – most certainly – intuitiveness second (because some poor soul might need to maintain your query). But personally, I find these quantifiers quite elegant for three reasons:

  1. They express the quantification right where it belongs. With the comparison operator. Compare this with the solution using LIMIT, which may be far away, visually, from the greater-than operator. Quantifiers are much more concise, even than when using MAX() (in my opinion)
  2. They’re very set oriented. I like thinking in terms of sets when I work with SQL. Whenever I can omit the ORDER BY clause, I will. If only to avoid potentially slow operations (in case the database doesn’t optimise this, and a full O(N log N) sort operation is invoked)
  3. Quantified comparison predicates work on rows too, not just on single values.

Check this out:

SELECT (c1.last_name, c1.first_name) >= ALL (
  SELECT c2.last_name, c2.first_name
  FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1

The above query will yield a single column containing a “TRUE” or “FALSE” value (e.g. in PostgreSQL, which supports this exact syntax). The idea is that we’re running a query for customer with id = 1 and we want to see if that customer’s (last_name, first_name) tuple is “after” all the other customers’. Or in plain English:

Am I at the end of the phone book?

Again, you could do this with LIMIT:

SELECT (c1.last_name, c1.first_name) >= (
  SELECT c2.last_name, c2.first_name
  FROM customer AS c2
  ORDER BY c2.last_name DESC, c2.first_name DESC
  LIMIT 1
)
FROM customer AS c1
WHERE c1.id = 1

… but I find this much less elegant.

Unfortunately, this time there’s no way to solve this problem with MAX(). No database (that I’m aware of) supports using row value expressions (tuples) for aggregate functions like MAX(). This would be cool:

SELECT (c1.last_name, c1.first_name) >= (
  SELECT MAX((c2.last_name, c2.first_name))
  FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1

Or this:

SELECT (c1.last_name, c1.first_name) >= (
  SELECT MAX(ROW(c2.last_name, c2.first_name))
  FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1

Of course, the above is just a constructed example. Yes, you could also use window functions instead (e.g. LEAD()).

Conclusion

Quantified comparison predicates are very rarely seen in the wild. One reason is, few people know about them, or think of them when they’re solving a problem. Another reason might be, they’re not well optimised in some databases, so it’s actually a bad idea to use them.

Nonetheless, I find them to be very interesting SQL trivia to know about. And who knows, perhaps one day, you do run into a problem that is indeed best solved with quantified comparison predicates. Then you’ll shine with the most optimal solution.

Further reading

A Wonderful SQL Feature: Quantified Comparison Predicates (ANY, ALL)

Have you ever wondered about the use-case behind SQL’s ANY (also: SOME) and ALL keywords?

You have probably not yet encountered these keywords in the wild. Yet they can be extremely useful. But first, let’s see how they’re defined in the SQL standard. The easy part:

8.7  <quantified comparison predicate>

Function

    Specify a quantified comparison.

Format

    <quantified comparison predicate> ::=
        <row value constructor> <comp op> 
            <quantifier> <table subquery>

    <quantifier> ::= <all> | <some>
    <all> ::= ALL
    <some> ::= SOME | ANY

Intuitively, such a quantified comparison predicate can be used as such:

-- Is any person of age 42?
42 = ANY (SELECT age FROM person)

-- Are all persons younger than 42?
42 > ALL (SELECT age FROM person)

Let’s keep with the useful ones. Observe that you have probably written the above queries with a different syntax, as such:

-- Is any person of age 42?
42 IN (SELECT age FROM person)

-- Are all persons younger than 42?
42 > (SELECT MAX(age) FROM person)

In fact, you’ve used the <in predicate>, or a greater than predicate with a <scalar subquery> and an aggregate function.

The IN predicate

It’s not a coincidence that you might have used the <in predicate> just like the above <quantified comparison predicate> using ANY. In fact, the <in predicate> is specified just like that:

8.4 <in predicate>

Syntax Rules

2) Let RVC be the <row value constructor> and let IPV 
   be the <in predicate value>.

3) The expression

     RVC NOT IN IPV

   is equivalent to

     NOT ( RVC IN IPV )

4) The expression

     RVC IN IPV

   is equivalent to

     RVC = ANY IPV

Precisely! Isn’t SQL beautiful? Note, the implicit consequences of 3) lead to a very peculiar behaviour of the NOT IN predicate with respect to NULL, which few developers are aware of.

Now, it’s getting awesome

So far, there is nothing out of the ordinary with these <quantified comparison predicate>. All of the previous examples can be emulated with “more idiomatic”, or let’s say, “more everyday” SQL.

But the true awesomeness of <quantified comparison predicate> appears only when used in combination with <row value expression> where rows have a degree / arity of more than one:

-- Is any person called "John" of age 42?
(42, 'John') = ANY (SELECT age, first_name FROM person)

-- Are all persons younger than 55?
-- Or if they're 55, do they all earn less than 150'000.00?
(55, 150000.00) > ALL (SELECT age, wage FROM person)

See the above queries in action on PostgreSQL in this SQLFiddle.

At this point, it is worth mentioning that few databases actually support…

  • row value expressions, or…
  • quantified comparison predicates with row value expressions

Even if specified in SQL-92, it looks as most databases still take their time to implement this feature 22 years later.

Emulating these predicates with jOOQ

But luckily, there is jOOQ to emulate these features for you. Even if you’re not using jOOQ in your project, the following SQL transformation steps can be useful if you want to express the above predicates. Let’s have a look at how this could be done in MySQL:

-- This predicate
(42, 'John') = ANY (SELECT age, first_name FROM person)

-- ... is the same as this:
EXISTS (
  SELECT 1 FROM person 
  WHERE age = 42 AND first_name = 'John'
)

What about the other predicate?

-- This predicate
(55, 150000.00) > ALL (SELECT age, wage FROM person)

-- ... is the same as these:
----------------------------
-- No quantified comparison predicate with
-- Row value expressions available
(55, 150000.00) > (
  SELECT age, wage FROM person
  ORDER BY 1 DESC, 2 DESC
  LIMIT 1
)

-- No row value expressions available at all
NOT EXISTS (
  SELECT 1 FROM person
  WHERE (55 < age)
  OR    (55 = age AND 150000.00 <= wage)
)

Clearly, the EXISTS predicate can be used in pretty much every database to emulate what we’ve seen before. If you just need this for a one-shot emulation, the above examples will be sufficient. If, however, you want to more formally use <row value expression> and <quantified comparison predicate>, you better get SQL transformation right.

Read on about SQL transformation in this article here.