In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.In SQL, functional dependencies appear whenever there is a unique constraint (e.g. a primary key constraint). Let’s assume the following:
It can be said that both
CREATE TABLE actor ( actor_id BIGINT NOT NULL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL );
LAST_NAMEeach have a functional dependency on the
Nice. So what?This isn’t just some mathematical statement that can be applied to unique constraints. It’s extremely useful for SQL. It means that for every
ACTOR_IDvalue, there can be only one (functionally dependent)
LAST_NAMEvalue. The other way round, this isn’t true. For any given
LAST_NAMEvalue, we can have multiple
ACTOR_IDvalues, as we can have multiple actors by the same names. Because there can be only one corresponding
LAST_NAMEvalue for any given
ACTOR_IDvalue, we can omit those columns in the
GROUP BYclause. Let’s assume also:
Now, if we want to count the number of films per actor, we can write:
CREATE TABLE film_actor ( actor_id BIGINT NOT NULL, film_id BIGINT NOT NULL, PRIMARY KEY (actor_id, film_id), FOREIGN KEY (actor_id) REFERENCS actor (actor_id), FOREIGN KEY (film_id) REFERENCS film (film_id) );
This is extremely useful as it saves us from a lot of typing. In fact, the way
SELECT actor_id, first_name, last_name, COUNT(*) FROM actor JOIN film_actor USING (actor_id) GROUP BY actor_id ORDER BY COUNT(*) DESC
GROUP BYsemantics is defined, we can put all sorts of column references in the
SELECTclause, which are any of:
- Column expressions that appear in the
- Column expressions that are functionally dependent on the set of column expressions in the
- Aggregate functions
Unfortunately, not everyone supports thisIf you’re using Oracle, for instance, you can’t make use of the above. You’ll need to write the classic, equivalent version where all the non-aggregate column expressions appearing in the
SELECTclause must also appear in the
SELECT actor_id, first_name, last_name, COUNT(*) FROM actor JOIN film_actor USING (actor_id) GROUP BY actor_id, first_name, last_name -- ^^^^^^^^^^ ^^^^^^^^^ unnecessary ORDER BY COUNT(*) DESC
6 thoughts on “SQL GROUP BY and Functional Dependencies: A Very Useful Feature”
Really helped me! Until today I used to think of “functional dependency” a term that was best left to college textbooks. But as a developer trying to improve his skills, I appreciate this concept now, thanks to your explanation.
Glad it helped. It’s a really useful feature, but also a dangerous one. If the foreign key constraint is deferred, or even disabled (e.g. for a migration), then the query might break.
Hey, thanks for the reply! Do you mind explaining why this might break the query? Is it because the absence of foreign key constraint will make MySQL fail at determining functional dependencies and it will refuse to run the queries? Or something else (like loss of data integrity) is also possible? An example will be really great. :-)
Classic MySQL (without strict mode) simply doesn’t care about GROUP BY correctness. If there’s no guarantee about functional dependencies, you’ll simply get random values from the group if you project (e.g. SELECT) illegal expressions.
I was referring to PostgreSQL, which allows using functional dependencies in GROUP BY only in the presence of a primary key (not foreign key, my bad). Once the key is turned off, the parser will reject the query. Check out this SQLFiddle:
Makes sense. I’m on the newer, strict-version MySQL, so don’t need to worry a lot, I guess. Is it a common practice to disable constraints during migrations? I’ve never migrated databases but it doesn’t seem necessary to me. Why not simply populate the tables in the order they were created and all will be fine?
It’s not a common practice, but I adhere to Murphy’s Law. Ever populated a graph?