Sometimes when aggregating data with SQL, we’d love to add some additional filters. For instance, consider the following world bank data:
GDP per capita (current US$)
2009 2010 2011 2012 CA 40,764 47,465 51,791 52,409 DE 40,270 40,408 44,355 42,598 FR 40,488 39,448 42,578 39,759 GB 35,455 36,573 38,927 38,649 IT 35,724 34,673 36,988 33,814 JP 39,473 43,118 46,204 46,548 RU 8,616 10,710 13,324 14,091 US 46,999 48,358 49,855 51,755
And the table structure:
CREATE TABLE countries ( code CHAR(2) NOT NULL, year INT NOT NULL, gdp_per_capita DECIMAL(10, 2) NOT NULL );
Now, let’s assume we’d like to find the number of countries with a GDP higher than 40,000 for each year.
With standard SQL:2003, and now also with the newly released PostgreSQL 9.4, we can now take advantage of the new FILTER
clause, which allows us to write the following query:
SELECT year, count(*) FILTER (WHERE gdp_per_capita >= 40000) FROM countries GROUP BY year
The above query will now yield:
year count ------------ 2012 4 2011 5 2010 4 2009 4
And that’s not it! As always, you can use any aggregate function also as a window function simply by adding an OVER()
clause to the end:
SELECT year, code, gdp_per_capita, count(*) FILTER (WHERE gdp_per_capita >= 40000) OVER (PARTITION BY year) FROM countries
The result would then look something like this:
year code gdp_per_capita count ------------------------------------ 2009 CA 40764.00 4 2009 DE 40270.00 4 2009 FR 40488.00 4 2009 GB 35455.00 4
jOOQ 3.6 will also support the new FILTER clause for aggregate functions
Good news for jOOQ users. You can write the same query with jOOQ intuitively as such:
DSL.using(configuration) .select( COUNTRIES.YEAR, count().filterWhere( COUNTRIES.GDP_PER_CAPITA.ge(40000) )) .from(COUNTRIES) .groupBy(COUNTRIES.YEAR) .fetch();
… and
DSL.using(configuration) .select( COUNTRIES.YEAR, COUNTRIES.CODE, COUNTRIES.GDP_PER_CAPITA count().filterWhere( COUNTRIES.GDP_PER_CAPITA.ge(40000)) .over(partitionBy(COUNTRIES.YEAR))) .from(COUNTRIES) .fetch();
And the best thing is that jOOQ (as usual) emulates the above clause for you if you’re not using PostgreSQL. The equivalent query would be:
SELECT year, count(CASE WHEN gdp_per_capita >= 40000 THEN 1 END) FROM countries GROUP BY year
Why do you need the filter clause for example one, regular SQL handles this just fine. “SELECT year, count(*) FROM countries WHERE gdp_per_capita >= 40000 GROUP BY year” returns exactly the same result. I’m sure FILTER serves a purpose but your example is already covered by standard SQL.
Hi Greg.
Your example:
… is not the same as the article’s that is using
FILTER
. Perhaps the example was a bit too simplistic to explain the full scope of this “new” functionality, but in any case, in your example, if a year only has one record withgdp_per_capita = 30000
, the year will not show up in your result, but it will show up in theFILTER
‘ed result, with aCOUNT(*)
of 0.For one, the FILTER clause *is* “regular” and “standard” SQL. It has been specified in SQL:2003. Besides, have you tried calculating two such aggregations with different filters in one go? There you have another use-case for
FILTER
. In fact, it is a more standard and more generally useful implementation of Oracle’s / SQL Server’sPIVOT
clause.Hope this helps,
Lukas
Seems like FILTER is a syntax sugar for COUNT + CASE:
Is that?
In most cases you’re right. But I suspect you cannot emulate
FILTER
withCASE
when usingDISTINCT
:Distinct relative to what? If you want distinct values of something, why don’t you express this explicitly in the inner view?
The
DISTINCT
keyword in an aggregate function will count only the DISTINCT values in each group, instead of counting ALL the values.What do you mean by “expressing this explicitly in the inner view”?