Sometimes when aggregating data with SQL, we’d love to add some additional filters. For instance, consider the following world bank data:
GDP per capita (current US$)
2009 2010 2011 2012
CA 40,764 47,465 51,791 52,409
DE 40,270 40,408 44,355 42,598
FR 40,488 39,448 42,578 39,759
GB 35,455 36,573 38,927 38,649
IT 35,724 34,673 36,988 33,814
JP 39,473 43,118 46,204 46,548
RU 8,616 10,710 13,324 14,091
US 46,999 48,358 49,855 51,755
And the table structure:
CREATE TABLE countries (
code CHAR(2) NOT NULL,
year INT NOT NULL,
gdp_per_capita DECIMAL(10, 2) NOT NULL
);
Now, let’s assume we’d like to find the number of countries with a GDP higher than 40,000 for each year.
With standard SQL:2003, and now also with the newly released
PostgreSQL 9.4, we can now take advantage of the new
FILTER
clause, which allows us to write the following query:
SELECT
year,
count(*) FILTER (WHERE gdp_per_capita >= 40000)
FROM
countries
GROUP BY
year
The above query will now yield:
year count
------------
2012 4
2011 5
2010 4
2009 4
And that’s not it! As always,
you can use any aggregate function also as a window function simply by adding an
OVER()
clause to the end:
SELECT
year,
code,
gdp_per_capita,
count(*)
FILTER (WHERE gdp_per_capita >= 40000)
OVER (PARTITION BY year)
FROM
countries
The result would then look something like this:
year code gdp_per_capita count
------------------------------------
2009 CA 40764.00 4
2009 DE 40270.00 4
2009 FR 40488.00 4
2009 GB 35455.00 4
jOOQ 3.6 will also support the new FILTER clause for aggregate functions
Good news for jOOQ users. You can write the same query with jOOQ intuitively as such:
DSL.using(configuration)
.select(
COUNTRIES.YEAR,
count().filterWhere(
COUNTRIES.GDP_PER_CAPITA.ge(40000)
))
.from(COUNTRIES)
.groupBy(COUNTRIES.YEAR)
.fetch();
… and
DSL.using(configuration)
.select(
COUNTRIES.YEAR,
COUNTRIES.CODE,
COUNTRIES.GDP_PER_CAPITA
count().filterWhere(
COUNTRIES.GDP_PER_CAPITA.ge(40000))
.over(partitionBy(COUNTRIES.YEAR)))
.from(COUNTRIES)
.fetch();
And the best thing is that jOOQ (as usual) emulates the above clause for you if you’re not using PostgreSQL. The equivalent query would be:
SELECT
year,
count(CASE WHEN gdp_per_capita >= 40000 THEN 1 END)
FROM
countries
GROUP BY
year
Read more about what’s new in PostgreSQL 9.4 hereLike this:
Like Loading...
Why do you need the filter clause for example one, regular SQL handles this just fine. “SELECT year, count(*) FROM countries WHERE gdp_per_capita >= 40000 GROUP BY year” returns exactly the same result. I’m sure FILTER serves a purpose but your example is already covered by standard SQL.
Hi Greg.
Your example:
… is not the same as the article’s that is using
FILTER
. Perhaps the example was a bit too simplistic to explain the full scope of this “new” functionality, but in any case, in your example, if a year only has one record withgdp_per_capita = 30000
, the year will not show up in your result, but it will show up in theFILTER
‘ed result, with aCOUNT(*)
of 0.For one, the FILTER clause *is* “regular” and “standard” SQL. It has been specified in SQL:2003. Besides, have you tried calculating two such aggregations with different filters in one go? There you have another use-case for
FILTER
. In fact, it is a more standard and more generally useful implementation of Oracle’s / SQL Server’sPIVOT
clause.Hope this helps,
Lukas