Description COUNT ------------------- TEST1 10 TEST2 15 TEST3 25 TEST4 50The logic that should be implemented for the COUNT column is the following:
- TEST1: count of employees whose sal < 10000
- TEST2: count of employees whose dept > 10
- TEST3: count of employees whose hiredate > (SYSDATE-60)
- TEST4: count of employees whose grade = 1
CREATE TABLE employees ( id NUMBER(18) NOT NULL PRIMARY KEY, sal NUMBER(18, 2) NOT NULL, dept NUMBER(18) NOT NULL, hiredate DATE NOT NULL, grade NUMBER(18) NOT NULL ); INSERT INTO employees VALUES (1, 10000, 1, SYSDATE , 1); INSERT INTO employees VALUES (2, 9000, 5, SYSDATE - 10, 1); INSERT INTO employees VALUES (3, 11000, 13, SYSDATE - 30, 2); INSERT INTO employees VALUES (4, 10000, 12, SYSDATE - 80, 2); INSERT INTO employees VALUES (5, 8000, 7, SYSDATE - 90, 1);
How to calculate the COUNT valuesIn a first step, we’re going to look into how to best calculate the COUNT values. The simplest way is to calculate the values in individual columns, not rows. SQL newbies will probably resort to a canonical solution using nested SELECTs, which is very bad for performance reasons:
Why is the query not optimal? There are four table accesses to find all the data: If you add an index to each individual column being filtered, chances are at least to optimise individual subqueries, but for these kinds of reports, the occasional full table scan is perfectly fine, especially if you aggregate a lot of data. Even if not optimal in speed, the above yields the correct result:
SELECT (SELECT COUNT(*) FROM employees WHERE sal < 10000) AS test1, (SELECT COUNT(*) FROM employees WHERE dept > 10) AS test2, (SELECT COUNT(*) FROM employees WHERE hiredate > (SYSDATE - 60)) AS test3, (SELECT COUNT(*) FROM employees WHERE grade = 1) AS test4 FROM dual;
TEST1 TEST2 TEST3 TEST4 ----------------------------- 2 2 3 3
How to improve the query, then?Few people are aware of the fact that aggregate functions only aggregate non-
NULLvalues. This has no effect, when you write
COUNT(*), but when you pass an expression to the
COUNT(expr)function, this becomes much more interesting! The idea here is that you use a
CASEexpression that transforms each predicate’s
TRUEevaluation into a non-
NULLvalue, an the
NULL) evaluation into
NULL. The following query illustrates this approach
… and yields again the correct result:
SELECT COUNT(CASE WHEN sal < 10000 THEN 1 END) AS test1, COUNT(CASE WHEN dept > 10 THEN 1 END) AS test2, COUNT(CASE WHEN hiredate > (SYSDATE-60) THEN 1 END) AS test3, COUNT(CASE WHEN grade = 1 THEN 1 END) AS test4 FROM employees;
TEST1 TEST2 TEST3 TEST4 ----------------------------- 2 2 3 3
Using FILTER() instead of CASEThe SQL standard and the awesome PostgreSQL database offer an even more convenient syntax for the above functionality. The little known
FILTER()clause on aggregate functions. In PostgreSQL, you’d write instead:
This is useful when you want to cleanly separate the
SELECT COUNT(*) FILTER (WHERE sal < 10000) AS test1, COUNT(*) FILTER (WHERE dept > 10) AS test2, COUNT(*) FILTER (WHERE hiredate > (SYSDATE - 60)) AS test3, COUNT(*) FILTER (WHERE grade = 1) AS test4 FROM employees;
FILTER()criteria from any other expression that you want to use for aggregating. E.g. when calculating a
SUM(). In any case, the query now has to hit the table only once. The aggregation can then be performed entirely in memory. This is always better than the previous approach, unless you have an index for every aggregation!
OK. Now how to get the results in rows?The question on Stack Overflow wanted a result with
TESTnvalues being put in individual rows, not columns.
Description COUNT ------------------- TEST1 2 TEST2 2 TEST3 3 TEST4 3Again, there’s a canonical, not so performant approach to do this with UNION ALL:
This approach is more or less equivalent to the nested selects approach, except for the column / row transposition (“unpivoting”). And the plan is also very similar:
SELECT 'TEST1' AS Description, COUNT(*) AS COUNT FROM employees WHERE sal < 10000 UNION ALL SELECT 'TEST2', COUNT(*) FROM employees WHERE dept > 10 UNION ALL SELECT 'TEST3', COUNT(*) FROM employees WHERE hiredate > (SYSDATE - 60) UNION ALL SELECT 'TEST4', COUNT(*) FROM employees WHERE grade = 1
Transposition = (un)pivotingNotice how I used the term “transpose”. That’s what we did, and it has a name: (un)pivoting. Not only does it have a name, but this feature is also supported out of the box in Oracle and SQL Server via the
UNPIVOTkeywords that can be placed after table references.
PIVOTtransposes rows into columns
UNPIVOTtransposes columns back into rows
All we need to do is wrap the original query in a derived table
SELECT * FROM ( SELECT COUNT(CASE WHEN sal < 10000 THEN 1 END) AS test1, COUNT(CASE WHEN dept > 10 THEN 1 END) AS test2, COUNT(CASE WHEN hiredate > (SYSDATE-60) THEN 1 END) AS test3, COUNT(CASE WHEN grade = 1 THEN 1 END) AS test4 FROM employees ) t UNPIVOT ( count FOR description IN ( "TEST1", "TEST2", "TEST3", "TEST4" ) )
t(i.e. an inline
FROMclause), and then “
UNPIVOT” that table
t, generating the
descriptioncolumns. The result is, again:
Description COUNT ------------------- TEST1 2 TEST2 2 TEST3 3 TEST4 3The execution plan is still optimal. All the action is happening in memory.
UNPIVOTare very useful tools for reporting and reorganising data. There are many use-cases like the above, where you want to re-organise some aggregations. Other use-cases include settings or properties tables that implement an entity attribute value model, and you want to transform attributes from rows to columns (
PIVOT), or from columns to rows (
UNPIVOT) Intrigued? Read on about
4 thoughts on “Impress Your Coworkers by Using SQL UNPIVOT!”
Shouldn’t the UNPIVOT query be like this instead?
The one on your post is missing the `Description` and the `COUNT` aliases.
I invite you to observe with me the miracles of UNPIVOT in action:
Also, please don’t go ahead with that
UNION ALLapproach. I explained in the article why it’s not optional…
I don’t get it. The inner-table only has 4 columns: test1, test2, test3 and test4. How does the UNPIVOT know that the description is in those columns because it has no notion of any `description` column.
How does it not have any notion of a “description” column? I named it “description” in the UNPIVOT clause:
Now, this being intuitive or not is another question ;)