The Awesome PostgreSQL 9.4 / SQL:2003 FILTER Clause for Aggregate Functions

Sometimes when aggregating data with SQL, we’d love to add some additional filters. For instance, consider the following world bank data:

GDP per capita (current US$)

          2009    2010    2011    2012
CA      40,764  47,465  51,791  52,409	
DE      40,270  40,408  44,355  42,598	
FR      40,488  39,448  42,578  39,759	
GB      35,455  36,573  38,927  38,649	
IT      35,724  34,673  36,988  33,814	
JP      39,473  43,118  46,204  46,548	
RU       8,616  10,710  13,324  14,091	
US      46,999  48,358  49,855  51,755	

And the table structure:

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL
);

Now, let’s assume we’d like to find the number of countries with a GDP higher than 40,000 for each year.

With standard SQL:2003, and now also with the newly released PostgreSQL 9.4, we can now take advantage of the new FILTER clause, which allows us to write the following query:

SELECT
  year,
  count(*) FILTER (WHERE gdp_per_capita >= 40000)
FROM
  countries
GROUP BY
  year

The above query will now yield:

year   count
------------
2012   4
2011   5
2010   4
2009   4

And that’s not it! As always, you can use any aggregate function also as a window function simply by adding an OVER() clause to the end:

SELECT
  year,
  code,
  gdp_per_capita,
  count(*) 
    FILTER (WHERE gdp_per_capita >= 40000) 
    OVER   (PARTITION BY year)
FROM
  countries

The result would then look something like this:

year   code   gdp_per_capita   count
------------------------------------
2009   CA           40764.00       4
2009   DE           40270.00       4
2009   FR           40488.00       4
2009   GB           35455.00       4

jOOQ 3.6 will also support the new FILTER clause for aggregate functions

Good news for jOOQ users. You can write the same query with jOOQ intuitively as such:

DSL.using(configuration)
   .select(
        COUNTRIES.YEAR,
        count().filterWhere(
            COUNTRIES.GDP_PER_CAPITA.ge(40000)
        ))
   .from(COUNTRIES)
   .groupBy(COUNTRIES.YEAR)
   .fetch();

… and

DSL.using(configuration)
   .select(
        COUNTRIES.YEAR,
        COUNTRIES.CODE,
        COUNTRIES.GDP_PER_CAPITA
        count().filterWhere(
                   COUNTRIES.GDP_PER_CAPITA.ge(40000))
               .over(partitionBy(COUNTRIES.YEAR)))
   .from(COUNTRIES)
   .fetch();

And the best thing is that jOOQ (as usual) emulates the above clause for you if you’re not using PostgreSQL. The equivalent query would be:

SELECT
  year,
  count(CASE WHEN gdp_per_capita >= 40000 THEN 1 END)
FROM
  countries
GROUP BY
  year

jOOQ: The best way to write SQL in Java

Read more about what’s new in PostgreSQL 9.4 here

Publicly available SQL standard documents

For your reference, here is a list of publicly available SQL standard documents. Some are very late draft versions, where the final version is closed source and has to be purchased from ISO or ANSI directly:

The links were taken from here:

http://wiki.postgresql.org/wiki/Developer_FAQ

Subtle SQL differences: IDENTITY columns

As I’m mostly using Oracle, IDENTITY columns were not so important to me up until I started to support them in jOOQ. Then, I found that yet again, there are many differences between various databases in how they handle IDENTITY columns in DDL and in DML.

In SQL, there are essentially three orthogonal concepts of how to identify a record (Please correct me if I may be missing more vendor-specific concepts):

  1. Primary Key: This is the most well-known concept. A primary key is the “main unique key” of a table, which means it is a constraint imposed on at least one column. It can be imposed on several columns, too. The primary key has its origin in the relational model
  2. Identity: Identities / identity columns are attributed to at most one column per table. They generate a unique ID from a sequence at record insertion – unlike primary keys, which are expected to be inserted correctly by client DML. The identity often coincides with the primary key, but it does not have to. It does not always have to be unique, either
  3. ROW ID / OID: Many databases know an internal ID that is usually not used in the logical representation of the database schema. This row id has mostly technical purposes and is also present in tables without primary key or identity columns

SQL 2008 standard

While primary key constraints are defined pretty much in the same way in every SQL dialect, and ROWID’s are very vendor-specific things, identities are something relatively new. The SQL 1992 standard does not yet define identity columns. They were formally introduced in SQL:2003, only. Here’s a part of the specification:

<column definition> ::=
  <column name> [ <data type or domain name> ]
  [ <default clause> | <identity column specification> ]
  [ ... ]

<identity column specification> ::=
  GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY
  [ <left paren> <common sequence generator options> <right paren> ]

 

This looks neat. In fact, in addition to column constraints, columns can specify an identity generation expression based on a sequence. What does reality look like?

Reality

DB2, Derby, HSQLDB, Ingres

These SQL dialects implement the standard very neatly.


id INTEGER GENERATED BY DEFAULT AS IDENTITY
id INTEGER GENERATED BY DEFAULT AS IDENTITY (START WITH 1)

H2, MySQL, Postgres, SQL Server, Sybase ASE, Sybase SQL Anywhere

These SQL dialects implement identites, but the DDL syntax doesn’t follow the standard


-- H2 mimicks MySQL's and SQL Server's syntax
ID INTEGER IDENTITY(1,1)
ID INTEGER AUTO_INCREMENT
-- MySQL
ID INTEGER NOT NULL AUTO_INCREMENT

-- Postgres serials implicitly create a sequence
-- Postgres also allows for selecting from custom sequences
-- That way, sequences can be shared among tables
id SERIAL NOT NULL

-- SQL Server
ID INTEGER IDENTITY(1,1) NOT NULL
-- Sybase ASE
id INTEGER IDENTITY NOT NULL
-- Sybase SQL Anywhere
id INTEGER NOT NULL IDENTITY

Oracle

Oracle does not know any identity columns at all. Instead, you will have to use a trigger and update the ID column yourself, using a custom sequence. Something along these lines:

CREATE OR REPLACE TRIGGER my_trigger
BEFORE INSERT
ON my_table
REFERENCING NEW AS new
FOR EACH ROW
BEGIN
  SELECT my_sequence.nextval
  INTO :new.id
  FROM dual;
END my_trigger;

Note, that this approach can be employed in most databases supporting sequences and triggers! It is a lot more flexible than standard identities

SQLite

SQLite does not know any identity columns either. It seems to have MySQL’s AUTO_INCREMENT clause, but in fact this just sets some rules on the ROWID generation, which is not a true identity column according to the SQL standard

Conclusion

As many things in SQL, identities are a major pain to get correctly when switching databases, or when writing SQL DDL and DML that works correctly across databases.

jOOQ supports DML executed against tables with identity columns and/or trigger-generated ID values. See a previous blog post that further elaborates on the issue:

https://lukaseder.wordpress.com/2011/08/29/postgres-insert-returning-clause-and-how-this-can-be-simulated-in-other-rdbms/