Creating a Microsoft Excel Style Pivot Table With Grand Totals in SQL

This answer to a beautiful Stack Overflow question I’ve given recently needs further explanation in a blog post. When working with Microsoft Excel, we can create beautiful and also very insightful Pivot Tables with grand totals. What are they? This is best explained visually. Assuming you have this normalised form for your raw data. As in the question, it’s an inventory table in a bike shop:

Now, in order to analyse our inventory, we’d love to pivot the above normalised representation to the following non-normalised representation, and we’d also like to display the grand totals to learn how many bikes of each type we have, and how many bikes of each colour, and how many bikes in total:

There are tons of great tutorials out there explaining how to do this with Microsoft Excel. What we care about is:

How to do this with SQL

We’re using two SQL features for this:

Let’s create some data first. I’m going to use SQL Server syntax for most of this blog post. At the end, there will be a full solution for SQL Server, Oracle, and PostgreSQL:


WITH Bikes AS (
  SELECT * FROM (
    VALUES ('Mountain Bikes', 'Black'), 
           ('Mountain Bikes', 'Black'),
           ('Mountain Bikes', 'Silver'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Black'),
           ('Road Bikes', 'Yellow'),
           ('Touring Bikes', 'Blue'),
           ('Touring Bikes', 'Blue'),
           ('Touring Bikes', 'Yellow')
  ) AS Bikes (Name, Colour)
)
SELECT * FROM Bikes

This simply produces an in-memory table representation of our original, normalised data set. Now, the first step is to create the following totals and grand totals:

Total bikes per name and colour
(Grand) total bikes per name
(Grand) total bikes per colour
(Grand) total bikes

In this particular case, we can use CUBE(), which forms all the possible GROUPING SETS combinations:


SELECT Name, Colour, COUNT(*) AS Total
FROM Bikes
GROUP BY CUBE (Name, Colour)
ORDER BY Name, Colour

The result looks like this:

Name            Colour  Total
-----------------------------
NULL            NULL    11
NULL            Black   3
NULL            Blue    2
NULL            Red     3
NULL            Silver  1
NULL            Yellow  2
Mountain Bikes  NULL    3
Mountain Bikes  Black   2
Mountain Bikes  Silver  1
Road Bikes      NULL    5
Road Bikes      Black   1
Road Bikes      Red     3
Road Bikes      Yellow  1
Touring Bikes   NULL    3
Touring Bikes   Blue    2
Touring Bikes   Yellow  1

Excellent! All the (grand) totals are now in the result set. Notice that we could have manually written this using the following, much more tedious syntax:


SELECT Name, Colour, COUNT(*) AS Total
FROM Bikes
GROUP BY Name, Colour
UNION ALL
SELECT Name, NULL, COUNT(*) AS Total
FROM Bikes
GROUP BY Name
UNION ALL
SELECT NULL, Colour, COUNT(*) AS Total
FROM Bikes
GROUP BY Colour
UNION ALL
SELECT NULL, NULL, COUNT(*) AS Total
FROM Bikes
ORDER BY Name, Colour

So, CUBE() (and ROLLUP() and GROUPING SETS()) is just syntax sugar for the above more verbose UNION ALL representation, with the additional important difference that very likely you’re going to get a much more optimal execution plan using CUBE():

Than using manual UNION ALL:

The result would be similar in Oracle and other databases. This isn’t surprising. We can aggregate all the grand totals in one go with CUBE() (in fact, the “grand grand total” is calculated separately in this case), whereas it’s hard for the optimiser to prove that the UNION ALL version is really the same thing and the individual subqueries can be factored out. Before we move on, just a slight improvement, let’s rename the grand totals from NULL to Total and wrap the thing in a derived table T:


SELECT *
FROM (
  SELECT 
    COALESCE(Name, 'Total') AS Name, 
    COALESCE(Colour, 'Total') AS Colour, 
    COUNT(*) AS Count
  FROM Bikes
  GROUP BY CUBE (Name, Colour)
) AS t

Now, pivot this representation into a more readable one

The data still looks normalised with repeating names and colours in the result tables. Let’s pivot it using … wait for it … the PIVOT clause (available in Oracle and SQL Server). The PIVOT clause is a bit funky. It can be appended to any table expression (including derived tables) to pivot it. It will apply an implicit GROUP BY operation and generate a set of aggregated SELECT columns. When we pivot our previous derived table T:


SELECT *
FROM t
PIVOT (
  SUM(Count) FOR Colour IN (
    Red, Blue, Black, Silver, Yellow, 
    Grey, Multi, Uncoloured, Total
  )
) AS p

Then we’re getting the following, nice-looking result:

Name            Red     Blue    Black   Silver  Yellow  Grey    Multi   Uncoloured  Total
-----------------------------------------------------------------------------------------
Mountain Bikes  NULL    NULL    2       1       NULL    NULL    NULL    NULL        3
Road Bikes      3       NULL    1       NULL    1       NULL    NULL    NULL        5
Touring Bikes   NULL    2       NULL    NULL    1       NULL    NULL    NULL        3
Total           3       2       3       1       2       NULL    NULL    NULL        11

That’s almost the desired result – all that’s missing is some null handling. How does it work? We have the following syntax:

[ table ] PIVOT (
  [ aggregate function(s) ] FOR [ column(s) ] IN ( [ values ] )
)

Where

The [ table ] is the table being pivoted
The [ column(s) ] are the columns from the [ table ] being grouped, as in any ordinary GROUP BY clause
The [ values ] are the values of the [ column(s) ], for which filtered aggregations are made
The [ aggregate function(s) ] are the aggregations that are made per [ column(s) ] (group) and per [ value ] (filter)

This syntax is Oracle and SQL Server specific. Oracle can do a bit more than SQL Server. If this syntax is not available in your database, you can write it out manually again (just like the above CUBE() to get this (select with your mouse to remove colouring):

SELECT
  Name,
  SUM(CASE WHEN Colour = 'Red'        THEN Count END) AS Red,
  SUM(CASE WHEN Colour = 'Blue'       THEN Count END) AS Blue,
  SUM(CASE WHEN Colour = 'Black'      THEN Count END) AS Black,
  SUM(CASE WHEN Colour = 'Silver'     THEN Count END) AS Silver,
  SUM(CASE WHEN Colour = 'Yellow'     THEN Count END) AS Yellow,
  SUM(CASE WHEN Colour = 'Grey'       THEN Count END) AS Grey,
  SUM(CASE WHEN Colour = 'Multi'      THEN Count END) AS Multi,
  SUM(CASE WHEN Colour = 'Uncoloured' THEN Count END) AS Uncoloured,
  SUM(CASE WHEN Colour = 'Total'      THEN Count END) AS Total
FROM t
GROUP BY Name

There should be no performance penalty in the manually written version (although, as always, do check). More details about this in a previous article.

Putting it all together

Here’s the complete query in SQL Server:


WITH Bikes(Name, Colour) AS (
  SELECT * FROM (
    VALUES ('Mountain Bikes', 'Black'), 
           ('Mountain Bikes', 'Black'),
           ('Mountain Bikes', 'Silver'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Black'),
           ('Road Bikes', 'Yellow'),
           ('Touring Bikes', 'Blue'),
           ('Touring Bikes', 'Blue'),
           ('Touring Bikes', 'Yellow')
  ) AS Bikes(Name, Colour)
)
SELECT
  Name, 
  COALESCE(Red, 0) AS Red, 
  COALESCE(Blue, 0) AS Blue, 
  COALESCE(Black, 0) AS Black, 
  COALESCE(Silver, 0) AS Silver, 
  COALESCE(Yellow, 0) AS Yellow, 
  COALESCE(Grey, 0) AS Grey, 
  COALESCE(Multi, 0) AS Multi, 
  COALESCE(Uncoloured, 0) AS Uncoloured, 
  Total
FROM (
  SELECT 
    COALESCE(Name, 'Total') AS Name, 
    COALESCE(Colour, 'Total') AS Colour, 
    COUNT(*) AS Count
  FROM Bikes
  GROUP BY CUBE (Name, Colour)
) AS t
PIVOT (
  SUM(Count) FOR Colour IN (
    Red, Blue, Black, Silver, Yellow, 
    Grey, Multi, Uncoloured, Total
  )
) AS p
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name

Or Oracle:


WITH Bikes(Name, Colour) AS (
  SELECT 'Mountain Bikes', 'Black'  FROM dual UNION ALL
  SELECT 'Mountain Bikes', 'Black'  FROM dual UNION ALL
  SELECT 'Mountain Bikes', 'Silver' FROM dual UNION ALL
  SELECT 'Road Bikes',     'Red'    FROM dual UNION ALL
  SELECT 'Road Bikes',     'Red'    FROM dual UNION ALL
  SELECT 'Road Bikes',     'Red'    FROM dual UNION ALL
  SELECT 'Road Bikes',     'Black'  FROM dual UNION ALL
  SELECT 'Road Bikes',     'Yellow' FROM dual UNION ALL
  SELECT 'Touring Bikes',  'Blue'   FROM dual UNION ALL
  SELECT 'Touring Bikes',  'Blue'   FROM dual UNION ALL
  SELECT 'Touring Bikes',  'Yellow' FROM dual
)
SELECT
  Name, 
  COALESCE(Red, 0) AS Red, 
  COALESCE(Blue, 0) AS Blue, 
  COALESCE(Black, 0) AS Black, 
  COALESCE(Silver, 0) AS Silver, 
  COALESCE(Yellow, 0) AS Yellow, 
  COALESCE(Grey, 0) AS Grey, 
  COALESCE(Multi, 0) AS Multi, 
  COALESCE(Uncoloured, 0) AS Uncoloured, 
  Total
FROM (
  SELECT 
    COALESCE(Name, 'Total') AS Name, 
    COALESCE(Colour, 'Total') AS Colour, 
    COUNT(*) AS Count
  FROM Bikes
  GROUP BY CUBE (Name, Colour)
) t
PIVOT (
  SUM(Count) FOR Colour IN (
    'Red' AS Red, 
    'Blue' AS Blue, 
    'Black' AS Black, 
    'Silver' AS Silver, 
    'Yellow' AS Yellow, 
    'Grey' AS Grey, 
    'Multi' AS Multi, 
    'Uncoloured' AS Uncoloured, 
    'Total' AS Total
  )
) p
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name

Or PostgreSQL:


WITH Bikes(Name, Colour) AS (
  SELECT * FROM (
    VALUES ('Mountain Bikes', 'Black'), 
           ('Mountain Bikes', 'Black'),
           ('Mountain Bikes', 'Silver'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Red'),
           ('Road Bikes', 'Black'),
           ('Road Bikes', 'Yellow'),
           ('Touring Bikes', 'Blue'),
           ('Touring Bikes', 'Blue'),
           ('Touring Bikes', 'Yellow')
  ) AS Bikes(Name, Colour)
)
SELECT
  Name,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Red'       ), 0) AS Red,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Blue'      ), 0) AS Blue,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Black'     ), 0) AS Black,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Silver'    ), 0) AS Silver,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Yellow'    ), 0) AS Yellow,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Grey'      ), 0) AS Grey,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Multi'     ), 0) AS Multi,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Uncoloured'), 0) AS Uncoloured,
  COALESCE(SUM(Count) FILTER (WHERE Colour = 'Total'     ), 0) AS Total
FROM (
  SELECT 
    COALESCE(Name, 'Total') AS Name, 
    COALESCE(Colour, 'Total') AS Colour, 
    COUNT(*) AS Count
  FROM Bikes
  GROUP BY CUBE (Name, Colour)
) AS t
GROUP BY Name
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name

Or MySQL (which doesn’t support CUBE, only ROLLUP, thus slightly tweaked PostgreSQL variant):


WITH Bikes(Name, Colour) AS (
  SELECT 'Mountain Bikes', 'Black'  UNION ALL
  SELECT 'Mountain Bikes', 'Black'  UNION ALL
  SELECT 'Mountain Bikes', 'Silver' UNION ALL
  SELECT 'Road Bikes',     'Red'    UNION ALL
  SELECT 'Road Bikes',     'Red'    UNION ALL
  SELECT 'Road Bikes',     'Red'    UNION ALL
  SELECT 'Road Bikes',     'Black'  UNION ALL
  SELECT 'Road Bikes',     'Yellow' UNION ALL
  SELECT 'Touring Bikes',  'Blue'   UNION ALL
  SELECT 'Touring Bikes',  'Blue'   UNION ALL
  SELECT 'Touring Bikes',  'Yellow'
)
SELECT
  Name,
  COALESCE(SUM(CASE WHEN Colour = 'Red'        THEN Count END), 0) AS Red,
  COALESCE(SUM(CASE WHEN Colour = 'Blue'       THEN Count END), 0) AS Blue,
  COALESCE(SUM(CASE WHEN Colour = 'Black'      THEN Count END), 0) AS Black,
  COALESCE(SUM(CASE WHEN Colour = 'Silver'     THEN Count END), 0) AS Silver,
  COALESCE(SUM(CASE WHEN Colour = 'Yellow'     THEN Count END), 0) AS Yellow,
  COALESCE(SUM(CASE WHEN Colour = 'Grey'       THEN Count END), 0) AS Grey,
  COALESCE(SUM(CASE WHEN Colour = 'Multi'      THEN Count END), 0) AS Multi,
  COALESCE(SUM(CASE WHEN Colour = 'Uncoloured' THEN Count END), 0) AS Uncoloured,
  COALESCE(SUM(CASE WHEN Name != 'Total' OR Colour != 'Total' THEN Count END), 0) AS Total
FROM (
  SELECT 
    COALESCE(Name, 'Total') AS Name, 
    COALESCE(Colour, 'Total') AS Colour, 
    COUNT(*) AS Count
  FROM Bikes
  GROUP BY Colour, Name WITH ROLLUP
) AS t
GROUP BY Name
ORDER BY CASE Name WHEN 'Total' THEN 1 ELSE 0 END, Name

Conclusion

Whenever working with data and SQL, try finding an elegant solution with SQL. There are many tools for a variety of data processing and data presentation use-cases. Here are some other cool, related reads from our blog:

Creating a Microsoft Excel Style Pivot Table With Grand Totals in SQL

How to do this with SQL

Now, pivot this representation into a more readable one

Putting it all together

Conclusion

Like this:

Published by lukaseder

Leave a ReplyCancel reply

How to do this with SQL

Now, pivot this representation into a more readable one

Putting it all together

Conclusion

Like this:

Published by lukaseder

Leave a ReplyCancel reply

Discover more from Java, SQL and jOOQ.