The Awesome PostgreSQL 9.4 / SQL:2003 FILTER Clause for Aggregate Functions


Sometimes when aggregating data with SQL, we’d love to add some additional filters. For instance, consider the following world bank data:

GDP per capita (current US$)

          2009    2010    2011    2012
CA      40,764  47,465  51,791  52,409	
DE      40,270  40,408  44,355  42,598	
FR      40,488  39,448  42,578  39,759	
GB      35,455  36,573  38,927  38,649	
IT      35,724  34,673  36,988  33,814	
JP      39,473  43,118  46,204  46,548	
RU       8,616  10,710  13,324  14,091	
US      46,999  48,358  49,855  51,755	

And the table structure:

CREATE TABLE countries (
  code CHAR(2) NOT NULL,
  year INT NOT NULL,
  gdp_per_capita DECIMAL(10, 2) NOT NULL
);

Now, let’s assume we’d like to find the number of countries with a GDP higher than 40,000 for each year.

With standard SQL:2003, and now also with the newly released PostgreSQL 9.4, we can now take advantage of the new FILTER clause, which allows us to write the following query:

SELECT
  year,
  count(*) FILTER (WHERE gdp_per_capita >= 40000)
FROM
  countries
GROUP BY
  year

The above query will now yield:

year   count
------------
2012   4
2011   5
2010   4
2009   4

And that’s not it! As always, you can use any aggregate function also as a window function simply by adding an OVER() clause to the end:

SELECT
  year,
  code,
  gdp_per_capita,
  count(*) 
    FILTER (WHERE gdp_per_capita >= 40000) 
    OVER   (PARTITION BY year)
FROM
  countries

The result would then look something like this:

year   code   gdp_per_capita   count
------------------------------------
2009   CA           40764.00       4
2009   DE           40270.00       4
2009   FR           40488.00       4
2009   GB           35455.00       4

jOOQ 3.6 will also support the new FILTER clause for aggregate functions

Good news for jOOQ users. You can write the same query with jOOQ intuitively as such:

DSL.using(configuration)
   .select(
        COUNTRIES.YEAR,
        count().filterWhere(
            COUNTRIES.GDP_PER_CAPITA.ge(40000)
        ))
   .from(COUNTRIES)
   .groupBy(COUNTRIES.YEAR)
   .fetch();

… and

DSL.using(configuration)
   .select(
        COUNTRIES.YEAR,
        COUNTRIES.CODE,
        COUNTRIES.GDP_PER_CAPITA
        count().filterWhere(
                   COUNTRIES.GDP_PER_CAPITA.ge(40000))
               .over(partitionBy(COUNTRIES.YEAR)))
   .from(COUNTRIES)
   .fetch();

And the best thing is that jOOQ (as usual) emulates the above clause for you if you’re not using PostgreSQL. The equivalent query would be:

SELECT
  year,
  count(CASE WHEN gdp_per_capita >= 40000 THEN 1 END)
FROM
  countries
GROUP BY
  year

jOOQ: The best way to write SQL in Java

Read more about what’s new in PostgreSQL 9.4 here

6 thoughts on “The Awesome PostgreSQL 9.4 / SQL:2003 FILTER Clause for Aggregate Functions

  1. Why do you need the filter clause for example one, regular SQL handles this just fine. “SELECT year, count(*) FROM countries WHERE gdp_per_capita >= 40000 GROUP BY year” returns exactly the same result. I’m sure FILTER serves a purpose but your example is already covered by standard SQL.

    • Hi Greg.

      Your example:

      SELECT year, count(*) 
      FROM countries 
      WHERE gdp_per_capita >= 40000 
      GROUP BY year
      

      … is not the same as the article’s that is using FILTER. Perhaps the example was a bit too simplistic to explain the full scope of this “new” functionality, but in any case, in your example, if a year only has one record with gdp_per_capita = 30000, the year will not show up in your result, but it will show up in the FILTER‘ed result, with a COUNT(*) of 0.

      For one, the FILTER clause *is* “regular” and “standard” SQL. It has been specified in SQL:2003. Besides, have you tried calculating two such aggregations with different filters in one go? There you have another use-case for FILTER. In fact, it is a more standard and more generally useful implementation of Oracle’s / SQL Server’s PIVOT clause.

      Hope this helps,
      Lukas

    • In most cases you’re right. But I suspect you cannot emulate FILTER with CASE when using DISTINCT:

      SELECT year, COUNT(DISTINCT gdp_per_capita)
                   FILTER(gdp_per_capita >= 40000)
      FROM countries
      GROUP BY year
      
      • Distinct relative to what? If you want distinct values of something, why don’t you express this explicitly in the inner view?

        • The DISTINCT keyword in an aggregate function will count only the DISTINCT values in each group, instead of counting ALL the values.

          What do you mean by “expressing this explicitly in the inner view”?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s