A recent Tweet by
Aaron Bertrand (
whom you’ve certainly encountered on Stack Overflow) has triggered my interest
Indeed, few people I’ve met and who’ve visited my
SQL masterclass have heard about the
ANY
and
ALL
quantifiers, which you can use in SQL, let alone used them on a regular basis (or ever used them at all).
What are these things?
The formal definition is rather simple. Let
R <comp op> <quantifier> S
… be a quantified comparison predicate where
R
is a row
<comp op>
is a comparison operator (like =, !=, etc.)
<quantifier>
is ANY
or ALL
S
is a subquery
You can now write things like:
SELECT *
FROM author AS a
WHERE a.author_id = ANY (
SELECT b.author_id
FROM book AS b
)
The above reads relatively intuitively (depending on your understanding of “intuitive”):
I want all authors
whose author_id
is equal to “ANY” of the author_id
values contained in book
In fact, the SQL standard defines the
IN
predicate as being just syntax sugar for the
= ANY()
quantified comparison predicate.
8.4 <in predicate>
Let RVC be the <row value predicand> and
let IPV be the <in predicate value>.
...
The expression
RVC IN IPV
is equivalent to
RVC = ANY IPV
OK, so why even use these weird operators?
The interesting case for these operators is not where the comparison operator is an equality (“=”) or non-equality (“!=” or “”). The interesting case is where we’re comparing things with less-than or greater-than operators, such as:
SELECT *
FROM author AS a
WHERE a.age > ALL (
SELECT c.age
FROM customer AS c
)
Or, in plain English:
I want all authors
whose age
is greater than the age
of all customers
That’s neat, isn’t it? Of course, there are (at least) two alternative ways of tackling the same query, namely:
Comparing only with the MAX()
SELECT *
FROM author AS a
WHERE a.age > (
SELECT MAX(c.age)
FROM customer AS c
)
This solution compares the author’s age only with the maximum customer age, which obviously yields the same result. Once we’ve found an author older than the oldest customer, then that author is older than
ALL the customers.
Instead of using
MAX()
, we can use the following, semantically equivalent query:
Comparing only with the “first”
SELECT *
FROM author AS a
WHERE a.age > (
SELECT c.age
FROM customer AS c
ORDER BY c.age DESC
LIMIT 1
)
I’m using PostgreSQL syntax (LIMIT 1), although most databases have some way of returning only one row from a query (
see the jOOQ manual for details). This is the same as using
MAX()
. Some databases recognise the fact that we’re only returning a single row, and they won’t sort the entire result set first in
O(N log N)
, but yield the maximum value only in
O(N)
, and only once for all authors, instead of once per author. Whether this optimisation is done in your database, I can’t say. You should measure this yourself, and check out execution plans.
So, again, why use quantifiers?
Aaron from the twitter conversation would advise against using this “unintuitive” syntax:
Perhaps. Ultimately, you should always choose performance first, and then – most certainly – intuitiveness second (because some poor soul might need to maintain your query). But personally, I find these quantifiers quite elegant for three reasons:
- They express the quantification right where it belongs. With the comparison operator. Compare this with the solution using LIMIT, which may be far away, visually, from the greater-than operator. Quantifiers are much more concise, even than when using MAX() (in my opinion)
- They’re very set oriented. I like thinking in terms of sets when I work with SQL. Whenever I can omit the
ORDER BY
clause, I will. If only to avoid potentially slow operations (in case the database doesn’t optimise this, and a full O(N log N)
sort operation is invoked)
- Quantified comparison predicates work on rows too, not just on single values.
Check this out:
SELECT (c1.last_name, c1.first_name) >= ALL (
SELECT c2.last_name, c2.first_name
FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1
The above query will yield a single column containing a “TRUE” or “FALSE” value (e.g. in PostgreSQL, which supports this exact syntax). The idea is that we’re running a query for
customer
with
id = 1
and we want to see if that customer’s
(last_name, first_name)
tuple is
“after” all the other customers’. Or in plain English:
Am I at the end of the phone book?
Again, you could do this with LIMIT:
SELECT (c1.last_name, c1.first_name) >= (
SELECT c2.last_name, c2.first_name
FROM customer AS c2
ORDER BY c2.last_name DESC, c2.first_name DESC
LIMIT 1
)
FROM customer AS c1
WHERE c1.id = 1
… but I find this much less elegant.
Unfortunately, this time there’s no way to solve this problem with
MAX()
. No database (that I’m aware of) supports using row value expressions (tuples) for aggregate functions like
MAX()
. This would be cool:
SELECT (c1.last_name, c1.first_name) >= (
SELECT MAX((c2.last_name, c2.first_name))
FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1
Or this:
SELECT (c1.last_name, c1.first_name) >= (
SELECT MAX(ROW(c2.last_name, c2.first_name))
FROM customer AS c2
)
FROM customer AS c1
WHERE c1.id = 1
Of course, the above is just a constructed example. Yes, you could also use
window functions instead (e.g. LEAD()).
Conclusion
Quantified comparison predicates are very rarely seen in the wild. One reason is, few people know about them, or think of them when they’re solving a problem. Another reason might be, they’re not well optimised in some databases, so it’s actually a bad idea to use them.
Nonetheless, I find them to be very interesting SQL trivia to know about. And who knows, perhaps one day, you do run into a problem that is indeed best solved with quantified comparison predicates. Then you’ll shine with the most optimal solution.
Further reading
Like this:
Like Loading...