Site icon Java, SQL and jOOQ.

SQL GROUP BY and Functional Dependencies: A Very Useful Feature

Relational databases define the term “Functional Dependency” as such (from Wikipedia):
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
In SQL, functional dependencies appear whenever there is a unique constraint (e.g. a primary key constraint). Let’s assume the following:

CREATE TABLE actor (
  actor_id BIGINT NOT NULL PRIMARY KEY,
  first_name VARCHAR(50) NOT NULL,
  last_name VARCHAR(50) NOT NULL
);

It can be said that both FIRST_NAME and LAST_NAME each have a functional dependency on the ACTOR_ID column.

Nice. So what?

This isn’t just some mathematical statement that can be applied to unique constraints. It’s extremely useful for SQL. It means that for every ACTOR_ID value, there can be only one (functionally dependent) FIRST_NAME and LAST_NAME value. The other way round, this isn’t true. For any given FIRST_NAME and/or LAST_NAME value, we can have multiple ACTOR_ID values, as we can have multiple actors by the same names. Because there can be only one corresponding FIRST_NAME and LAST_NAME value for any given ACTOR_ID value, we can omit those columns in the GROUP BY clause. Let’s assume also:

CREATE TABLE film_actor (
  actor_id BIGINT NOT NULL,
  film_id BIGINT NOT NULL,
  
  PRIMARY KEY (actor_id, film_id),
  FOREIGN KEY (actor_id) REFERENCS actor (actor_id),
  FOREIGN KEY (film_id) REFERENCS film (film_id)
);

Now, if we want to count the number of films per actor, we can write:

SELECT
  actor_id, first_name, last_name, COUNT(*)
FROM actor
JOIN film_actor USING (actor_id)
GROUP BY actor_id
ORDER BY COUNT(*) DESC

This is extremely useful as it saves us from a lot of typing. In fact, the way GROUP BY semantics is defined, we can put all sorts of column references in the SELECT clause, which are any of:

Unfortunately, not everyone supports this

If you’re using Oracle, for instance, you can’t make use of the above. You’ll need to write the classic, equivalent version where all the non-aggregate column expressions appearing in the SELECT clause must also appear in the GROUP BY clause

SELECT
  actor_id, first_name, last_name, COUNT(*)
FROM actor
JOIN film_actor USING (actor_id)
GROUP BY actor_id, first_name, last_name
--                 ^^^^^^^^^^  ^^^^^^^^^ unnecessary
ORDER BY COUNT(*) DESC

Further reading:
Exit mobile version