The SQL standard knows an interesting feature where you can project any functional dependencies of a primary (or unique) key that is listed in the
GROUP BY clause without having to add that functional dependency to the
GROUP BY clause explicitly.
What does this mean? Consider this simple schema:
CREATE TABLE author ( id INT NOT NULL PRIMARY KEY, name TEXT NOT NULL ); CREATE TABLE book ( id INT NOT NULL PRIMARY KEY, author_id INT NOT NULL REFERENCES author, title TEXT NOT NULL );
In order to count the number of books by author, we tend to write:
SELECT a.name, count(b.id) FROM author a LEFT JOIN book b ON a.id = b.author_id GROUP BY a.id, -- Required, because names aren't unique a.name -- Required in some dialects, but not in others
We have to group by something unique in this case, because if two authors are called John Doe, we still want them to produce separate groups. So
GROUP BY a.id is a given.
We’re used to also
GROUP BY a.name, especially in these dialects that require this, since we list
a.name in the
- SQL Server
But is it really required? It isn’t as per the SQL standard, because there is a functional dependency between
author.name. In other words, for each value of
author.id, there is exactly one possible value of
author.name is a function of
This means that it does not matter if we
GROUP BY both columns, or only the primary key. The result must be the same in both cases, hence this is possible:
SELECT a.name, count(b.id) FROM author a LEFT JOIN book b ON a.id = b.author_id GROUP BY a.id
Which SQL dialects support this?
At least the following SQL dialects support this language feature:
It’s noteworthy that MySQL used to simply ignore whether a column could be projected unambiguously or not, in the presence of
GROUP BY. While the following query was rejected in most dialects, it was not, in MySQL, prior to the introduction of the ONLY_FULL_GROUP_BY mode:
SELECT author_id, title, count(*) FROM author GROUP BY author_id
What should we display for
author.title, if an author has written more than one book? It doesn’t make sense, yet MySQL still used to allow it, and would just project any arbitrary value from the group.
Today, MySQL only allows for projecting columns with a functional dependency on the
GROUP BY clause, as is permitted by the SQL standard.
Pros & Cons
While the shorter syntax that avoids the extra columns might be easier to maintain (easy to project additional columns, if required), there is some risk of queries breaking in production, namely when underlying constraints are disabled, e.g. for a migration. While it is unlikely that a primary key is disabled in a live system, it could still be the case, and without the key, a previously valid query will no longer be valid for the same reason why MySQL’s old interpretation was invalid: There’s no longer a guarantee of functional dependency.
Starting from jOOQ 3.16, and #11834, it will be possible to reference tables directly in the
GROUP BY clause, instead of individual columns. For example:
SELECT a.name, count(b.id) FROM author a LEFT JOIN book b ON a.id = b.author_id GROUP BY a
The semantics will be:
- If the table has a primary key (composite or not), use that in the
GROUP BYclause, instead
- If the table doesn’t have a primary key, list all the columns from the table instead.
Since none of the RDBMS supported by jOOQ currently supports this syntax, it is a purely synthetic jOOQ feature.