The Useful BigQuery * EXCEPT Syntax

One of the coolest things about using and making jOOQ is that we get to discover the best extensions to the standard SQL language by vendors, and add support for those clauses in jOOQ via emulations.

One of these syntaxes is BigQuery’s * EXCEPT syntax. Everyone who ever wrote ad-hoc SQL queries would have liked this syntax to be available, occasionally. Why is it needed? Look at this query:

SELECT * FROM actor

The result being:

|actor_id|first_name|last_name   |last_update            |
|--------|----------|------------|-----------------------|
|1       |PENELOPE  |GUINESS     |2006-02-15 04:34:33.000|
|2       |NICK      |WAHLBERG    |2006-02-15 04:34:33.000|
|3       |ED        |CHASE       |2006-02-15 04:34:33.000|
|4       |JENNIFER  |DAVIS       |2006-02-15 04:34:33.000|
|5       |JOHNNY    |LOLLOBRIGIDA|2006-02-15 04:34:33.000|
|6       |BETTE     |NICHOLSON   |2006-02-15 04:34:33.000|
|...

But that LAST_UPDATE column is mighty annoying, especially if we want to NATURAL JOIN things. E.g. this doesn’t work:

SELECT actor_id, a.first_name, a.last_name, count(fa.film_id)
FROM actor AS a
NATURAL LEFT JOIN film_actor AS fa
GROUP BY actor_id

The result is just actors without films, because accidentally, the LAST_UPDATE column was included in the NATURAL JOIN:

|actor_id|first_name|last_name|count|
|--------|----------|---------|-----|
|58      |CHRISTIAN |AKROYD   |0    |
|8       |MATTHEW   |JOHANSSON|0    |
|116     |DAN       |STREEP   |0    |
|184     |HUMPHREY  |GARLAND  |0    |
|87      |SPENCER   |PECK     |0    |

This is the biggest flaw of NATURAL JOIN, making it almost useless for schemas that aren’t perfectly designed for NATURAL JOIN usage, but this is ad-hoc SQL, and it would have been nice to do that.

We could, if we had * EXCEPT like this:

SELECT 
  a.actor_id, 
  a.first_name, 
  a.last_name, 
  count(fa.film_id)
FROM (
  SELECT * EXCEPT (last_update) FROM actor
) AS a
NATURAL LEFT JOIN (
  SELECT * EXCEPT (last_update) FROM film_actor
) AS fa
GROUP BY 
  a.actor_id, 
  a.first_name, 
  a.last_name

Unfortunately, this doesn’t work in PostgreSQL and other dialects, but jOOQ can emulate it. If you provide the online SQL translator with the sakila database table definitions:

CREATE TABLE actor (
    actor_id integer NOT NULL,
    first_name character varying(45) NOT NULL,
    last_name character varying(45) NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);
CREATE TABLE film_actor (
    actor_id smallint NOT NULL,
    film_id smallint NOT NULL,
    last_update timestamp without time zone DEFAULT now() NOT NULL
);

Then, it can resolve the EXCEPT syntax of the query and produce this:

SELECT
  a.actor_id,
  a.first_name,
  a.last_name,
  count(fa.film_id)
FROM (
  SELECT actor.actor_id, actor.first_name, actor.last_name
  FROM actor
) a
  NATURAL LEFT OUTER JOIN (
    SELECT film_actor.actor_id, film_actor.film_id
    FROM film_actor
  ) fa
GROUP BY
  a.actor_id,
  a.first_name,
  a.last_name

Obviously, we could have just omitted NATURAL JOIN to achieve the same result, but sometimes, it’s just nice to have yet another tool in the tool chain to write a nice query. With jOOQ, the query would read:

Actor a = ACTOR.as("a");
FilmActor fa = FILM_ACTOR.as("fa");

ctx.select(
        a.ACTOR_ID,
        a.FIRST_NAME,
        a.LAST_NAME,
        count(fa.FILM_ID))
   .from(
        select(asterisk().except(a.LAST_UPDATE)).from(a).asTable(a))
   .naturalLeftOuterJoin(
        select(asterisk().except(fa.LAST_UPDATE)).from(fa).asTable(fa))
   .groupBy(a.ACTOR_ID, a.FIRST_NAME, a.LAST_NAME)
   .fetch();

5 thoughts on “The Useful BigQuery * EXCEPT Syntax

  1. That’s pretty neat :) It is one of the features that would be nice to have defined in standard. Even though `SELECT *` is generally antipattern, it allows to achieve interesting results for solution generator(like handling schema drift). Normally we need to use dynamic SQL to have similar effect.

    It could be pushed even further by supporting “regexp column specification” like: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select

    For instance if we have a table with column set: cola, colb, colc and metadata columns sys_creation_timestamp, sys_update_timestamp, sys_key prefixed with ‘sys_'(and the RDBMS does not support invisible columns) it would be nice to have something like:

    SELECT * EXCEPT_LIKE(‘sys%’) FROM …

    1. At some point, the real question vendors should be asking themselves is, how can we evolve our language in a sustainable way? While * EXCEPT (and * REPLACE) seem like low hanging fruit for the most popular use-case, the regexp column specification certainly doesn’t cut it. At this point, a vendor should think about following Oracle’s SQL macros approach: https://oracle-base.com/articles/21c/sql-macros-21c, which is essentially what jOOQ does, but within the database, and template based instead of type safe, but it’s good enough.

  2. Could jooq improve on readability and have a except for the natural join clause itself?
    Something like

    ctx.select(
    .from(a)
    .naturalLeftOuterJoin(fa).except(a.LAST_UPDATE))
    .groupBy(a.ACTOR_ID, a.FIRST_NAME, a.LAST_NAME)
    .fetch();

Leave a Reply