This answer to a beautiful Stack Overflow question I’ve given recently needs further explanation in a blog post. When working with Microsoft Excel, we can create beautiful and also very insightful Pivot Tables with grand totals. What are they? This is best explained visually.
Assuming you have this normalised form for your raw data. As in the question, it’s an inventory table in a bike shop:
Now, in order to analyse our inventory, we’d love to pivot the above normalised representation to the following non-normalised representation, and we’d also like to display the grand totals to learn how many bikes of each type we have, and how many bikes of each colour, and how many bikes in total:
There are tons of great tutorials out there explaining how to do this with Microsoft Excel. What we care about is:
Let’s create some data first. I’m going to use SQL Server syntax for most of this blog post. At the end, there will be a full solution for SQL Server, Oracle, and PostgreSQL:
This simply produces an in-memory table representation of our original, normalised data set.
Now, the first step is to create the following totals and grand totals:
Total bikes per name and colour
(Grand) total bikes per name
(Grand) total bikes per colour
(Grand) total bikes
In this particular case, we can use CUBE(), which forms all the possible GROUPING SETS combinations:
SELECT Name, Colour, COUNT(*) AS Total
FROM Bikes
GROUP BY CUBE (Name, Colour)
ORDER BY Name, Colour
The result looks like this:
Name Colour Total
-----------------------------
NULL NULL 11
NULL Black 3
NULL Blue 2
NULL Red 3
NULL Silver 1
NULL Yellow 2
Mountain Bikes NULL 3
Mountain Bikes Black 2
Mountain Bikes Silver 1
Road Bikes NULL 5
Road Bikes Black 1
Road Bikes Red 3
Road Bikes Yellow 1
Touring Bikes NULL 3
Touring Bikes Blue 2
Touring Bikes Yellow 1
Excellent! All the (grand) totals are now in the result set. Notice that we could have manually written this using the following, much more tedious syntax:
SELECT Name, Colour, COUNT(*) AS Total
FROM Bikes
GROUP BY Name, Colour
UNION ALL
SELECT Name, NULL, COUNT(*) AS Total
FROM Bikes
GROUP BY Name
UNION ALL
SELECT NULL, Colour, COUNT(*) AS Total
FROM Bikes
GROUP BY Colour
UNION ALL
SELECT NULL, NULL, COUNT(*) AS Total
FROM Bikes
ORDER BY Name, Colour
So, CUBE() (and ROLLUP() and GROUPING SETS()) is just syntax sugar for the above more verbose UNION ALL representation, with the additional important difference that very likely you’re going to get a much more optimal execution plan using CUBE():
Than using manual UNION ALL:
The result would be similar in Oracle and other databases.
This isn’t surprising. We can aggregate all the grand totals in one go with CUBE() (in fact, the “grand grand total” is calculated separately in this case), whereas it’s hard for the optimiser to prove that the UNION ALL version is really the same thing and the individual subqueries can be factored out.
Before we move on, just a slight improvement, let’s rename the grand totals from NULL to Total and wrap the thing in a derived table T:
SELECT *
FROM (
SELECT
COALESCE(Name, 'Total') AS Name,
COALESCE(Colour, 'Total') AS Colour,
COUNT(*) AS Count
FROM Bikes
GROUP BY CUBE (Name, Colour)
) AS t
Now, pivot this representation into a more readable one
The data still looks normalised with repeating names and colours in the result tables. Let’s pivot it using … wait for it … the PIVOT clause (available in Oracle and SQL Server).
The PIVOT clause is a bit funky. It can be appended to any table expression (including derived tables) to pivot it. It will apply an implicit GROUP BY operation and generate a set of aggregated SELECT columns. When we pivot our previous derived table T:
SELECT *
FROM t
PIVOT (
SUM(Count) FOR Colour IN (
Red, Blue, Black, Silver, Yellow,
Grey, Multi, Uncoloured, Total
)
) AS p
Then we’re getting the following, nice-looking result:
Name Red Blue Black Silver Yellow Grey Multi Uncoloured Total
-----------------------------------------------------------------------------------------
Mountain Bikes NULL NULL 2 1 NULL NULL NULL NULL 3
Road Bikes 3 NULL 1 NULL 1 NULL NULL NULL 5
Touring Bikes NULL 2 NULL NULL 1 NULL NULL NULL 3Total 3 2 3 1 2 NULL NULL NULL 11
That’s almost the desired result – all that’s missing is some null handling. How does it work? We have the following syntax:
[ table ] PIVOT (
[ aggregate function(s) ] FOR [ column(s) ] IN ( [ values ] )
)
Where
The [ table ] is the table being pivoted
The [ column(s) ] are the columns from the [ table ] being grouped, as in any ordinary GROUP BY clause
The [ values ] are the values of the [ column(s) ], for which filtered aggregations are made
The [ aggregate function(s) ] are the aggregations that are made per [ column(s) ] (group) and per [ value ] (filter)
This syntax is Oracle and SQL Server specific. Oracle can do a bit more than SQL Server. If this syntax is not available in your database, you can write it out manually again (just like the above CUBE() to get this (select with your mouse to remove colouring):
SELECT
Name,
SUM(CASE WHEN Colour = 'Red' THEN Count END)AS Red,
SUM(CASE WHEN Colour = 'Blue' THEN Count END)AS Blue,
SUM(CASE WHEN Colour = 'Black' THEN Count END)AS Black,
SUM(CASE WHEN Colour = 'Silver' THEN Count END)AS Silver,
SUM(CASE WHEN Colour = 'Yellow' THEN Count END)AS Yellow,
SUM(CASE WHEN Colour = 'Grey' THEN Count END)AS Grey,
SUM(CASE WHEN Colour = 'Multi' THEN Count END)AS Multi,
SUM(CASE WHEN Colour = 'Uncoloured' THEN Count END)AS Uncoloured,
SUM(CASE WHEN Colour = 'Total' THEN Count END)AS TotalFROM tGROUP BY Name
WITH Bikes(Name, Colour) AS (
SELECT * FROM (
VALUES ('Mountain Bikes', 'Black'),
('Mountain Bikes', 'Black'),
('Mountain Bikes', 'Silver'),
('Road Bikes', 'Red'),
('Road Bikes', 'Red'),
('Road Bikes', 'Red'),
('Road Bikes', 'Black'),
('Road Bikes', 'Yellow'),
('Touring Bikes', 'Blue'),
('Touring Bikes', 'Blue'),
('Touring Bikes', 'Yellow')
) AS Bikes(Name, Colour)
)
SELECT
Name,
COALESCE(Red, 0) AS Red,
COALESCE(Blue, 0) AS Blue,
COALESCE(Black, 0) AS Black,
COALESCE(Silver, 0) AS Silver,
COALESCE(Yellow, 0) AS Yellow,
COALESCE(Grey, 0) AS Grey,
COALESCE(Multi, 0) AS Multi,
COALESCE(Uncoloured, 0) AS Uncoloured,
Total
FROM (
SELECT
COALESCE(Name, 'Total') AS Name,
COALESCE(Colour, 'Total') AS Colour,
COUNT(*) AS Count
FROM Bikes
GROUP BY CUBE (Name, Colour)
) AS t
PIVOT (
SUM(Count) FOR Colour IN (
Red, Blue, Black, Silver, Yellow,
Grey, Multi, Uncoloured, Total
)
) AS p
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name
Or Oracle:
WITH Bikes(Name, Colour) AS (
SELECT 'Mountain Bikes', 'Black' FROM dual UNION ALL
SELECT 'Mountain Bikes', 'Black' FROM dual UNION ALL
SELECT 'Mountain Bikes', 'Silver' FROM dual UNION ALL
SELECT 'Road Bikes', 'Red' FROM dual UNION ALL
SELECT 'Road Bikes', 'Red' FROM dual UNION ALL
SELECT 'Road Bikes', 'Red' FROM dual UNION ALL
SELECT 'Road Bikes', 'Black' FROM dual UNION ALL
SELECT 'Road Bikes', 'Yellow' FROM dual UNION ALL
SELECT 'Touring Bikes', 'Blue' FROM dual UNION ALL
SELECT 'Touring Bikes', 'Blue' FROM dual UNION ALL
SELECT 'Touring Bikes', 'Yellow' FROM dual
)
SELECT
Name,
COALESCE(Red, 0) AS Red,
COALESCE(Blue, 0) AS Blue,
COALESCE(Black, 0) AS Black,
COALESCE(Silver, 0) AS Silver,
COALESCE(Yellow, 0) AS Yellow,
COALESCE(Grey, 0) AS Grey,
COALESCE(Multi, 0) AS Multi,
COALESCE(Uncoloured, 0) AS Uncoloured,
Total
FROM (
SELECT
COALESCE(Name, 'Total') AS Name,
COALESCE(Colour, 'Total') AS Colour,
COUNT(*) AS Count
FROM Bikes
GROUP BY CUBE (Name, Colour)
) t
PIVOT (
SUM(Count) FOR Colour IN (
'Red' AS Red,
'Blue' AS Blue,
'Black' AS Black,
'Silver' AS Silver,
'Yellow' AS Yellow,
'Grey' AS Grey,
'Multi' AS Multi,
'Uncoloured' AS Uncoloured,
'Total' AS Total
)
) p
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name
Or PostgreSQL:
WITH Bikes(Name, Colour) AS (
SELECT * FROM (
VALUES ('Mountain Bikes', 'Black'),
('Mountain Bikes', 'Black'),
('Mountain Bikes', 'Silver'),
('Road Bikes', 'Red'),
('Road Bikes', 'Red'),
('Road Bikes', 'Red'),
('Road Bikes', 'Black'),
('Road Bikes', 'Yellow'),
('Touring Bikes', 'Blue'),
('Touring Bikes', 'Blue'),
('Touring Bikes', 'Yellow')
) AS Bikes(Name, Colour)
)
SELECT
Name,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Red' ), 0) AS Red,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Blue' ), 0) AS Blue,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Black' ), 0) AS Black,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Silver' ), 0) AS Silver,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Yellow' ), 0) AS Yellow,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Grey' ), 0) AS Grey,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Multi' ), 0) AS Multi,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Uncoloured'), 0) AS Uncoloured,
COALESCE(SUM(Count) FILTER (WHERE Colour = 'Total' ), 0) AS Total
FROM (
SELECT
COALESCE(Name, 'Total') AS Name,
COALESCE(Colour, 'Total') AS Colour,
COUNT(*) AS Count
FROM Bikes
GROUP BY CUBE (Name, Colour)
) AS t
GROUP BY Name
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name
Or MySQL (which doesn’t support CUBE, only ROLLUP, thus slightly tweaked PostgreSQL variant):
WITH Bikes(Name, Colour) AS (
SELECT 'Mountain Bikes', 'Black' UNION ALL
SELECT 'Mountain Bikes', 'Black' UNION ALL
SELECT 'Mountain Bikes', 'Silver' UNION ALL
SELECT 'Road Bikes', 'Red' UNION ALL
SELECT 'Road Bikes', 'Red' UNION ALL
SELECT 'Road Bikes', 'Red' UNION ALL
SELECT 'Road Bikes', 'Black' UNION ALL
SELECT 'Road Bikes', 'Yellow' UNION ALL
SELECT 'Touring Bikes', 'Blue' UNION ALL
SELECT 'Touring Bikes', 'Blue' UNION ALL
SELECT 'Touring Bikes', 'Yellow'
)
SELECT
Name,
COALESCE(SUM(CASE WHEN Colour = 'Red' THEN Count END), 0) AS Red,
COALESCE(SUM(CASE WHEN Colour = 'Blue' THEN Count END), 0) AS Blue,
COALESCE(SUM(CASE WHEN Colour = 'Black' THEN Count END), 0) AS Black,
COALESCE(SUM(CASE WHEN Colour = 'Silver' THEN Count END), 0) AS Silver,
COALESCE(SUM(CASE WHEN Colour = 'Yellow' THEN Count END), 0) AS Yellow,
COALESCE(SUM(CASE WHEN Colour = 'Grey' THEN Count END), 0) AS Grey,
COALESCE(SUM(CASE WHEN Colour = 'Multi' THEN Count END), 0) AS Multi,
COALESCE(SUM(CASE WHEN Colour = 'Uncoloured' THEN Count END), 0) AS Uncoloured,
COALESCE(SUM(CASE WHEN Name != 'Total' OR Colour != 'Total' THEN Count END), 0) AS Total
FROM (
SELECT
COALESCE(Name, 'Total') AS Name,
COALESCE(Colour, 'Total') AS Colour,
COUNT(*) AS Count
FROM Bikes
GROUP BY Colour, Name WITH ROLLUP
) AS t
GROUP BY Name
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name
Conclusion
Whenever working with data and SQL, try finding an elegant solution with SQL. There are many tools for a variety of data processing and data presentation use-cases. Here are some other cool, related reads from our blog: