How to Write a Multiplication Aggregate Function in SQL

Everyone knows the SQL SUM() aggregate function (and many people also know its window function variant). When querying the Sakila database, we can get the daily revenue (using PostgreSQL syntax):

WITH p AS (
  SELECT
    CAST (payment_date AS DATE) AS date,
    amount
  FROM payment
)
SELECT
  date,
  SUM (amount) AS daily_revenue,
  SUM (SUM (amount)) OVER (ORDER BY date) AS cumulative_revenue
FROM p
GROUP BY date
ORDER BY date

The result will look something like this:
date       |daily_revenue |cumulative_revenue 
-----------|--------------|-------------------
2005-05-24 |29.92         |29.92              
2005-05-25 |573.63        |603.55             
2005-05-26 |754.26        |1357.81            
2005-05-27 |685.33        |2043.14            
2005-05-28 |804.04        |2847.18            
2005-05-29 |648.46        |3495.64            
2005-05-30 |628.42        |4124.06            
2005-05-31 |700.37        |4824.43            
2005-06-14 |57.84         |4882.27            
...

Doing the same with multiplication

This is already quite useful. Very occasionally, however, we do not need to aggregate multiple values in a sum (through addition), but in a product (through multiplication). I’ve just stumbled upon such a case on Stack Overflow, recently. The question wanted to achieve the following result:
date        factor          accumulated
---------------------------------------
1986-01-10  null            1000
1986-01-13  -0.026595745    973.4042548
1986-01-14  0.005464481     978.7234036
1986-01-15  -0.016304348    962.7659569
1986-01-16  0               962.7659569
1986-01-17  0               962.7659569
1986-01-20  0               962.7659569
1986-01-21  0.005524862     968.0851061
1986-01-22  -0.005494506    962.765957
1986-01-23  0               962.765957
1986-01-24  -0.005524862    957.4468078
1986-01-27  0.005555556     962.7659569
1986-01-28  0               962.7659569
1986-01-29  0               962.7659569
1986-01-30  0               962.7659569
1986-01-31  0.027624309     989.3617013
1986-02-03  0.016129032     1005.319148
1986-02-04  0.042328041     1047.872338
1986-02-05  0.04568528      1095.744679
If this were a Microsoft Excel spreadsheet, the ACCUMULATED column would simply start with 1000 and have the following formula in all other rows:
accumulated(i) = accumulated(i - 1) * (1 + factor)
In other words (values truncated for simplicity):
1000.0 = start
 973.4 = 1000.0 * (1 - 0.026)
 978.7 =  973.4 * (1 + 0.005)
 962.7 =  978.7 * (1 - 0.016)
 962.7 =  962.7 * (1 - 0.000)
 962.7 =  962.7 * (1 - 0.000)
 962.7 =  962.7 * (1 - 0.000)
 968.0 =  962.7 * (1 + 0.005)
 ...
This is exciting because we’re not only requiring multiplicative aggregation, but even cumulative multiplicative aggregation. So, another window function. But regrettably, SQL doesn’t offer a MUL() aggregate function, even if it were relatively simple to implement. We have two options:
  • Implementing a custom aggregate function (stay tuned for a future blog post)
  • Using a trick by summing logarithms, rather than multiplying operands directly
We’re implementing the latter for now. Check out this cool Wikipedia website about logarithmic identities, which we are going to blindly trust. In the middle of it, we have:
bx * by = bx + y
Which leads to:
logb(x * y) = logb(x) + logb(y)
How cool is that? And thus:
x * y = blogb(x) + logb(y)
So, we can define any multiplication in terms of a bunch of exponentiation to some base (say e) and logarithms to some base (say e). Or, in SQL:
x * y = EXP(LN(x) + LN(y))
Or, as an aggregate function:
MUL(x) = EXP(SUM(LN(x)))
Heh! Our original problem can thus be solved very easily using this, as shown in my stack overflow answer:

SELECT
  date,
  factor,
  1000 * (EXP(SUM(LN(1 + COALESCE(factor, 1)))
       OVER (ORDER BY date)) - 1) AS accumulated
FROM t

And we get the nice result as previously shown. You may have to replace LN() by LOG() depending on your database.

Caveat: Negative numbers

Try running this:

SELECT LN(-1)

You’ll get:
SQL Error [2201E]: ERROR: cannot take logarithm of a negative number
Logarithms are defined only for strictly positive numbers, unless your database is capable of handling complex numbers as well. In case of which a single zero value would still break the aggregation. But if your data set is defined to contain only strictly positive numbers, you’ll be fine – give or take some floating point rounding errors. Or, you’ll do some sign handling, which looks like this:

WITH v(i) AS (VALUES (-2), (-3), (-4))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(i)))) multiplication1
FROM v;

WITH v(i) AS (VALUES (-2), (-3), (-4), (-5))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(i)))) multiplication2
FROM v;

The above yielding
multiplication1      
--------------------
-23.999999999999993 


multiplication2     
-------------------
119.99999999999997 
Close enough.

Caveat: Zero

Try running this:

SELECT LN(0)

You’ll get:
SQL Error [2201E]: ERROR: cannot take logarithm of zero
Zero is different from negative numbers. A product that has a zero operand is always zero, so we should be able to handle this. We’ll do it in two steps:
  • Exclude zero values from the actual aggregation that uses EXP() and LN()
  • Add an additional CASE expression that checks if any of the operands is zero
The first step might not be necessary depending on how your database optimiser executes the second step.

WITH v(i) AS (VALUES (2), (3), (0))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i = 0 THEN 1 END) > 0
    THEN 0
    WHEN SUM (CASE WHEN i < 0 THEN -1 END) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(NULLIF(i, 0))))) multiplication
FROM v;

Extension: DISTINCT

Calculating the product of all DISTINCT values requires to repeat the DISTINCT keyword in 2 out of the above 3 sums:

WITH v(i) AS (VALUES (2), (3), (3))
SELECT 
  CASE 
    WHEN SUM (CASE WHEN i = 0 THEN 1 END) > 0
    THEN 0
    WHEN SUM (DISTINCT CASE WHEN i < 0 THEN -1 END) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(DISTINCT LN(ABS(NULLIF(i, 0))))) multiplication
FROM v;

The result is now:
multiplication |
---------------|
6              |
Notice that the first SUM() that checks for the presence of NULL values doesn’t require a DISTINCT keyword, so we omit it to improve performance.

Extension: Window functions

Of course, if we are able to emulate a PRODUCT() aggregate function, we’d love to turn it into a window function as well. This can be done simply by transforming each individual SUM() into a window function:

WITH v(i, j) AS (
  VALUES (1, 2), (2, -3), (3, 4), 
         (4, -5), (5, 0), (6, 0)
)
SELECT i, j, 
  CASE 
    WHEN SUM (CASE WHEN j = 0 THEN 1 END) 
      OVER (ORDER BY i) > 0
    THEN 0
    WHEN SUM (CASE WHEN j < 0 THEN -1 END) 
      OVER (ORDER BY i) % 2 < 0 
    THEN -1 
    ELSE 1 
  END * EXP(SUM(LN(ABS(NULLIF(j, 0)))) 
    OVER (ORDER BY i)) multiplication
FROM v;

The result is now:
i |j  |multiplication      |
--|---|--------------------|
1 | 2 |2                   |
2 |-3 |-6                  |
3 | 4 |-23.999999999999993 |
4 |-5 |119.99999999999997  |
5 | 0 |0                   |
6 | 1 |0                   |
So cool! The cumulative product gets bigger and bigger until it hits he first zero, from then on it stays zero.

jOOQ support

jOOQ 3.12 will support this as well and emulate it correctly on all databases: https://github.com/jOOQ/jOOQ/issues/5939

A note on Oracle performance

Do note that Oracle is very slow to calculate LN(number_type). It can be MUCH faster to calculate, instead LN(binary_double_type). An explicit type cast produced a 100x performance improvement in a simple test, documented here.

12 thoughts on “How to Write a Multiplication Aggregate Function in SQL

  1. I discussed this in detail in two of my books many years ago. Here are the references:

    Joe Celko’s SQL for Smarties: Advanced SQL Programming – Page 468
    https://books.google.com/books?isbn=012800830X

    Chapter 23.8.2 The PRD() Aggregate Function by Logarithms Roy Harvey, another SQL guru, found a different solution, which only someone old enough to remember slide rules and that we can multiply by …

    Joe Celko’s Thinking in Sets: Auxiliary, Temporal, and Virtual …
    https://books.google.com/books?isbn=008055752X

      1. Without having tried too hard, I couldn’t get PostgreSQL to avoid the LN(..) call with a zero argument using a CASE expression. How did you do it?

        UPDATE: Never mind. Of course, a NULLIF() call or similar expression prior to passing the argument to LN() will suffice. I’ve updated the blog post. Thanks for your pointers!

        1. Here is a “cut & paste” from one of my books:

          29.06.02. The PRD() Aggregate Function by Logarithms

          Roy Harvey, another SQL guru who answered questions on CompuServe, found a different solution, which only someone old enough to remember slide rules and that we can multiply by adding logs. The nice part of this solution is that you can also use the DISTINCT option in the SUM() function.

          But there are a lot of warnings about this approach. The Standard allows only natural logarithms shown as LN(), but you will see LOG10() for the logarithm base ten and perhaps LOG(, ) for a general logarithm function. Since the logarithm of zero or less is undefined, the Standard requires an exception to be raised. But some older SQL might return a zero or a NULL. Likewise, the Standard also defines the exponential function as EXP() as its inverse.

          The expression for the product of a column from logarithm and exponential functions is:

          SELECT ((EXP (SUM (LN (CASE WHEN nbr = 0.00
                                      THEN CAST (NULL AS FLOAT)
                                      ELSE ABS(nbr) END))))
            * (CASE WHEN MIN (ABS (nbr)) = 0.00
                    THEN 0.00
                    ELSE 1.00 END)
            * (CASE WHEN MOD (SUM (CASE WHEN SIGN(nbr) = -1
                                         THEN 1
                                         ELSE 0 END), 2) = 1
                     THEN -1.00
                     ELSE 1.00 END) AS big_pi
           FROM NumberTable;
          

          The nice part of this is that you can also use the SUM (DISTINCT ) option to get the equivalent of PRD (DISTINCT ).

          You should watch the data type of the column involved and use either integer 0 and 1 or decimal 0.00 and 1.00 as is appropriate in the CASE statements. It is worth studying the three CASE expressions that make up the terms of the Prod calculation.

          The first CASE expression is to ensure that all zeros and negative numbers are converted to a non-negative or NULL for the SUM() function, just in case your SQL raises an exception.

          The second CASE expression will return zero as the answer if there was a zero in the nbr column of any selected row. The MIN(ABS(nbr)) is a handy trick for detecting the existence of a zero in a list of both positive and negative numbers with an aggregate function.

          The third CASE expression will return minus one if there was an odd number of negative numbers in the nbr column. The innermost CASE expression uses a SIGN() function which returns + 1 for a positive number, -1 for a negative number and 0 for a zero. The SUM() counts the -1 results then the MOD() functions determines if the count was odd or even.

          I present this version of the query first because this is how I developed the answer. We can do a much better job with a little algebra and logic:

          SELECT CASE MIN (SIGN (nbr))
                 WHEN 1 THEN EXP (SUM (LN (nbr))) -- all positive numbers
                 WHEN 0 THEN 0.00                 -- some zeros
                 WHEN -1                          -- some negative numbers
                 THEN (EXP (SUM (LN (ABS(nbr))))
                       * (CASE WHEN
                               MOD (SUM (ABS (SIGN(nbr)-1/ 2)), 2) = 1
                               THEN -1.00 ELSE 1.00 END))
                 ELSE CAST (NULL AS FLOAT) END AS big_pi
           FROM NumberTable;
          

          The idea is that there are three special cases – all positive numbers, one or more zeros, and some negative numbers in the set. You can find out what your situation is with a quick test on the SIGN() of the minimum value in the set.

          In the case where you have negative numbers, there are two sub-cases: (1) an even number of negatives or (2) an odd number of negatives. You then need to apply some High School algebra to determine the sign of the final result.

          Itzak Ben-Gan had problems in implementing this in an older version of SQL Server which is worth passing along in case your SQL product also has them. The query as written returns a domain error in SQL Server, even though it should not have the result expressions in the CASE expression been evaluated after the conditional flow had performed a short-circuit evaluation. Examining the execution plan of the above query, it looks like the optimizer evaluates all of the possible result expressions in a step prior to handling the flow of the CASE expression.

          This means that in the expression after WHEN 1 … the LN() function is also invoked in an intermediate phase for zeros and negative numbers, and in the expression after WHEN -1 … the LN(ABS()) is also invoked in an intermediate phase for zeroes. This explains the domain error.

          To handle this, I had to use the ABS() and NULLIF() functions in the positive numbers WHEN clause, and the NULLIF() function in the negative numbers WHEN clause:

          ...
             WHEN 1 THEN EXP(SUM(LN(ABS(NULLIF(result, 0.00)))))
          and
             ...
             WHEN -1
             THEN EXP(SUM(LN(ABS(NULLIF(result, 0.00)))))
                    * CASE ...
          

          If you are sure that you will have only positive values in the column being computed, then you can use

          PRD() = EXP(SUM(LN ()))
          

          As an aside, the book BYPASSES: A SIMPLE APPROACH TO COMPLEXITY by Z. A. Melzak (Wiley-Interscience, 1983, ISBN 0-471-86854-X), is a short mathematical book on the general principle of conjugacy. This is the method of using a transform and its inverse to reduce the complexity of a calculation.

  2. I’m having trouble figuring out how to calculate the daily compound interest for an initial amount, over various rates periods, producing a new total that includes the interest amounts from each rate period. The challenge is that for each subsequent rate period you have to calculate the interest on the amount plus the previous interest!!! So it’s not a simple running total.

    For example, using the following rates table.

    rate from date rate to date rate
    2013-07-15 2013-09-30 3
    2013-10-01 2013-12-31 4
    2014-01-01 2014-03-31 3

    Using an initial amount of $32,550.37, I have to traverse each rate period with an interest calculation, producing the final amount of $33,337.34.

    rate from rate to rate daysx amount interest
    2013-07-15 2013-09-30 .03 78 32,550.37 209.34
    2013-10-01 2013-12-31 .04 92 32,759.71 331.94
    2014-01-01 2014-03-31 .03 90 33,091.65 245.69
    Final Amount 33,337.34

    For example, the initial amount of $32,550.37 has interest of $209.34 at 3%. For the second rate period, I add that interest to the amount, which is $32,759.71 and then calculate the interest on $32,759.71 at 4%. Etc.

    I’m using Netezza which does not allow recursive SQL, so I have been trying to use windowed functions, but not with any success yet …

    DROP TABLE TRATES;
    CREATE TABLE TRATES (RATE_FROM_DATE DATE, RATE_TO_DATE DATE, RATE DECIMAL(10,2));
    INSERT INTO TRATES VALUES (‘2013-07-15′,’2013-09-30’,.03);
    INSERT INTO TRATES VALUES (‘2013-10-01′,’2013-12-31’,.04);
    INSERT INTO TRATES VALUES (‘2014-01-01′,’2014-03-31’,.03);

    SELECT TRATES.*
    , DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE)+1 AS DAYSX
    , (AMOUNT * POW(1+(RATE)/365,(DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE)+1)))) – AMOUNT
    AS INTEREST
    , FIRST_VALUE(AMOUNT) OVER(ORDER BY RATE_FROM_DATE)
    *(POW(1+(RATE/100)/365,(DAYS_BETWEEN(RATE_FROM_DATE, RATE_TO_DATE)+1)))
    AS NEW_AMOUNT
    FROM TRATES
    JOIN (SELECT 32550.37 AS AMOUNT) AS TPARMS ON 1=1
    ;
    Any help would be greatly appreciated.

  3. Hi Lukas,
    Do you know if there is an equivalent of EVAL generic function for SQL or at any dialect you worked with?

    My idea to solve it was to generate a multiplication string(by using any STRING_AGG function) and then execute it like in provided demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=76f6598b55efb103f979a52100831bd6
    This also allow to solve different functions as long as we are able to generate expression string(though the performance may be poor).

    Full story: https://stackoverflow.com/a/66110206/5070879

    1. Well, you’re going to use EXECUTE IMMEDIATE (or EXECUTE or EXECUTE STATEMENT or whatever your dialect calls it) inside of a stored function. Or, alternatively, since you’ve already made that XMLTABLE example, you might be able to pull it off using XQUERY or XPath.

      I’m obviously going to schedule a blog post about this 😁

Leave a Reply