Functional Dependencies in SQL GROUP BY

The SQL standard knows an interesting feature where you can project any functional dependencies of a primary (or unique) key that is listed in the GROUP BY clause without having to add that functional dependency to the GROUP BY clause explicitly.

What does this mean? Consider this simple schema:

CREATE TABLE author (
  id INT NOT NULL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE book (
  id INT NOT NULL PRIMARY KEY,
  author_id INT NOT NULL REFERENCES author,
  title TEXT NOT NULL
);

In order to count the number of books by author, we tend to write:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY 
  a.id,  -- Required, because names aren't unique
  a.name -- Required in some dialects, but not in others

We have to group by something unique in this case, because if two authors are called John Doe, we still want them to produce separate groups. So GROUP BY a.id is a given.

We’re used to also GROUP BY a.name, especially in these dialects that require this, since we list a.name in the SELECT clause:

  • Db2
  • Derby
  • Exasol
  • Firebird
  • HANA
  • Informix
  • Oracle
  • SQL Server

But is it really required? It isn’t as per the SQL standard, because there is a functional dependency between author.id and author.name. In other words, for each value of author.id, there is exactly one possible value of author.name, or author.name is a function of author.id

This means that it does not matter if we GROUP BY both columns, or only the primary key. The result must be the same in both cases, hence this is possible:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a.id

Which SQL dialects support this?

At least the following SQL dialects support this language feature:

  • CockroachDB
  • H2
  • HSQLDB
  • MariaDB
  • MySQL
  • PostgreSQL
  • SQLite
  • Yugabyte

It’s noteworthy that MySQL used to simply ignore whether a column could be projected unambiguously or not, in the presence of GROUP BY. While the following query was rejected in most dialects, it was not, in MySQL, prior to the introduction of the ONLY_FULL_GROUP_BY mode:

SELECT author_id, title, count(*)
FROM author
GROUP BY author_id

What should we display for author.title, if an author has written more than one book? It doesn’t make sense, yet MySQL still used to allow it, and would just project any arbitrary value from the group.

Today, MySQL only allows for projecting columns with a functional dependency on the GROUP BY clause, as is permitted by the SQL standard.

Pros & Cons

While the shorter syntax that avoids the extra columns might be easier to maintain (easy to project additional columns, if required), there is some risk of queries breaking in production, namely when underlying constraints are disabled, e.g. for a migration. While it is unlikely that a primary key is disabled in a live system, it could still be the case, and without the key, a previously valid query will no longer be valid for the same reason why MySQL’s old interpretation was invalid: There’s no longer a guarantee of functional dependency.

Other syntax

Starting from jOOQ 3.16, and #11834, it will be possible to reference tables directly in the GROUP BY clause, instead of individual columns. For example:

SELECT a.name, count(b.id)
FROM author a
LEFT JOIN book b ON a.id = b.author_id
GROUP BY a

The semantics will be:

  • If the table has a primary key (composite or not), use that in the GROUP BY clause, instead
  • If the table doesn’t have a primary key, list all the columns from the table instead.

Since none of the RDBMS supported by jOOQ currently supports this syntax, it is a purely synthetic jOOQ feature.

Write C-Style Local Static Variables in Java 16

Java 16 includes an improvement that makes the language a bit more regular via JEP 395. The JEP says:

Static members of inner classes

It is currently specified to be a compile-time error if an inner class declares a member that is explicitly or implicitly static, unless the member is a constant variable. This means that, for example, an inner class cannot declare a record class member, since nested record classes are implicitly static.

We relax this restriction in order to allow an inner class to declare members that are either explicitly or implicitly static. In particular, this allows an inner class to declare a static member that is a record class.

What sounds like a minor necessary evil to make a new feature (record classes) more versatile actually has a life of its own. We can use it to emulate C-style local static variables, i.e. local variables that are:

  • Initialised only once (and lazily at that)
  • Shared among multiple executions of a method

This sounds like a rather hairy feature, global variables that are visible only locally. But in fact, it’s something I’ve wanted for a long time, especially when I wanted to cache regular expression patterns without polluting the class namespace.

Consider this code:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {
        // Re-compiling the pattern every time: Bad
        // Keeping the pattern local to the method: Good
        System.out.println("check(" + string + "): " 
            + compile("a").matcher(string).find());
    }
}

It prints:

compile(a)
check(a): true
compile(a)
check(b): false

Compiling a pattern can be costly if done frequently, so we better cache it. We used to do that like this:

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    // Compiling the pattern only once: Good
    // Placing the pattern in a class namespace: Bad
	static final Pattern P_CHECK = compile("a");
	
    static void check(String string) {
        System.out.println("check(" + string + "): " 
            + P_CHECK.matcher(string).find());
    }
}

This now prints a more optimal output:

compile(a)
check(a): true
check(b): false

I.e. the regular expression pattern is compiled only once. But unfortunately, we had to pollute the entire class’s namespace, which can quickly become cumbersome if we have dozens of such regular expressions. Can we scope the P_CHECK variable to the check() method only? We now can!

package p;

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        check("a");
        check("b");
    }
    
    static Pattern compile(String pattern) {
        System.out.println("compile(" + pattern + ")");
        return Pattern.compile(pattern);
    }
    
    static void check(String string) {

        // Compiling the pattern only once: Good
        // Keeping the pattern local to the method: Good
        // Capturing scope: Egh...
        var patterns = new Object() { 
            static final Pattern P_CHECK = compile("a");
        };
        
        System.out.println("check(" + string + "): " 
            + patterns.P_CHECK.matcher(string).find());
    }
}

This again prints the desired, optimal output:

compile(a)
check(a): true
check(b): false

The combination of using var to use a non-denotable type (whose members we can dereference) along with the ability of putting static members in inner classes effectively emulates local static variables, just like in C!

Since the inner class is unlikely to escape its scope, the fact that it may be capturing scope isn’t that big of a risk as illustrated previously in a criticism of the double curly brace anti pattern. You’re still creating an extra class and a use-less object that escape analysis hopefully prevents from allocating, so it’s not exactly a very clean solution, but nice to know this is now possible.

The jOOQ Parser Ignore Comment Syntax

jOOQ’s parser can’t parse every possible SQL syntax. Try this random PostgreSQL syntax:

ALTER SYSTEM RESET ALL

And the jOOQ parser will complain:

DOMAIN, INDEX, SCHEMA, SEQUENCE, SESSION, TABLE, TYPE, or VIEW expected: [1:7] ALTER [*]SYSTEM RESET ALL

That’s perfectly fine. The goal of the jOOQ parser isn’t to understand all vendor specific syntax. The goal is to offer parser functionality for all of the syntax supported by jOOQ so far, and to be able to translate it between dialects. For a lot of syntax like the above, there isn’t an equivalent in other dialects anyway.

But if you’re using the jOOQ parser to simulate a database migration, e.g. to create a schema diff, or to generate jOOQ code using the DDLDatabase, then you may have some bits of vendor specific SQL in your scripts that you want jOOQ to ignore. For example, the above command may be in the middle of a migration script:

CREATE TABLE a (i int);
ALTER SYSTEM RESET ALL;
CREATE TABLE b (i int);

Luckily, with jOOQ, you can add special markers around the commands you wish to tell jOOQ to ignore, and jOOQ will ignore them. For this, just enable the Settings.parseIgnoreComments flag, and now you can use a special comment syntax:

CREATE TABLE a (i int);

/* [jooq ignore start] */
ALTER SYSTEM RESET ALL;
/* [jooq ignore stop] */

CREATE TABLE b (i int);

The syntax is transparent to your RDBMS, because they’re just comments. So, the RDBMS will see and execute this, just like before. So, your actual database migration isn’t affected:

CREATE TABLE a (i int);

/*                     */
ALTER SYSTEM RESET ALL;
/*                    */

CREATE TABLE b (i int);

But jOOQ, on the other hand, will interpret the comments a bit differently, and see and execute this, instead:

CREATE TABLE a (i int);

/*

                      */

CREATE TABLE b (i int);

Note, if you don’t like the [jooq ignore start] and [jooq ignore stop] tokens, you can customise them as well via Settings.parseIgnoreCommentStart and Settings.parseIgnoreCommentStop.

Like any of these purely text-based preprocessor syntaxes, this works completely transparently, even within some command syntax. Assuming you’re using some vendor-specific syntax in a CREATE TABLE‘s DEFAULT expression, you can just exclude that DEFAULT expression from jOOQ’s parsing:

CREATE TABLE t (
  a int 
    /* [jooq ignore start] */ 
    DEFAULT some_fancy_expression() 
    /* [jooq ignore stop] */
);

Obviously, you don’t have to format things this way, that’s just for illustration purposes on this blog. Now, again, the RDBMS will see and execute this:

CREATE TABLE t (
  a int 
    /*                     */ 
    DEFAULT some_fancy_expression() 
    /*                    */
);

Whereas jOOQ’s parser will see and execute this:

CREATE TABLE t (
  a int 
    /*

                          */
);

For more information about this topic, please refer to the jOOQ manual.

Use jOOλ’s Sneaky Throw to Avoid Checked Exceptions

Don’t you hate how you have to wrap checked exception throwing code in static initialisers? E.g. you cannot write this in Java:

public class Test {
    static final Class<?> klass = Class.forName("org.h2.Driver");
}

There’s an unhandled ClassNotFoundException, and you can’t catch / rethrow it simply. A static initialiser is needed:

public class Test {
    static final Class<?> klass;

    static {
        try {
            klass = Class.forName("org.h2.Driver");
        }
        catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}

Yuck.

Luckily, one of jOOλ’s lesser known features is the Sneaky class, which contains a bunch of utility methods that wrap a JDK functional interface to an equivalent, “sneaky-throwing” functional interface that doesn’t declare the checked exception.

In short, you can write:

public class Test {
    static final Class<?> klass = Sneaky.supplier(
        () -> Class.forName("org.h2.Driver")
    ).get();
}

The exception is simply re-thrown “sneakily”, as the JVM doesn’t care about an exception’s checked-ness. If you don’t have H2 on your classpath, you’ll get:

Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: java.lang.ClassNotFoundException: org.h2.Driver
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
	at java.base/java.lang.Class.forName0(Native Method)
	at java.base/java.lang.Class.forName(Class.java:375)
	at org.jooq.test.util.Test.lambda$0(Test.java:44)
	at org.jooq.lambda.Unchecked.lambda$supplier$38(Unchecked.java:1695)
	at org.jooq.test.util.Test.<clinit>(Test.java:44)

You can use this approach wherever else a JDK functional interface is required, and you don’t care about an exception’s checked-ness, e.g. in streams:

// Doesn't compile:
Stream
    .generate(
        () -> Class.forName("org.h2.Driver"))
    .limit(1)
    .forEach(System.out::println);

// So ugly
Stream
    .generate(
        () -> {
            try {
                return Class.forName("org.h2.Driver");
            }
            catch (ClassNotFoundException e) {
                throw new RuntimeException(e);
            }
        })
    .limit(1)
    .forEach(System.out::println);

// Use jOOλ's Sneaky supplier
Stream
    .generate(Sneaky.supplier(
        () -> Class.forName("org.h2.Driver")))
    .limit(1)
    .forEach(System.out::println);

Get jOOλ here: https://github.com/jOOQ/jOOL