CUME_DIST(), a Lesser-Known SQL Gem


When doing reporting or statistics with SQL, you better know your window functions. There are many of them, and few SQL developers know about them.

CUME_DIST() is one such function. We’ve recently re-discovered it on Stack Overflow. The following query yields two times the same result for fraction1 and fraction2:

SELECT 
  ename, 
  CUME_DIST() OVER (ORDER BY ename) fraction1,
  ROWNUM / (MAX(ROWNUM) OVER()) fraction2
FROM emp
ORDER BY ename

The above query then yields:

|  ENAME | FRACTION1 | FRACTION2 |
|--------|-----------|-----------|
|  ALLEN |      0.08 |      0.08 |
|  BLAKE |      0.17 |      0.17 |
|  CLARK |      0.25 |      0.25 |
|   FORD |      0.33 |      0.33 |
|  JAMES |      0.42 |      0.42 |
|  JONES |       0.5 |       0.5 |
|   KING |      0.58 |      0.58 |
| MARTIN |      0.67 |      0.67 |
| MILLER |      0.75 |      0.75 |
|  SMITH |      0.83 |      0.83 |
| TURNER |      0.92 |      0.92 |
|   WARD |         1 |         1 |

… as can be seen in this SQLFiddle. In plain English, the CUME_DIST() (or cumulative distribution) of a value within a group of values helps you see how far “advanced” a value is in the ordering of the whole result set – or of a partition thereof.

The second expression using ROWNUM informally explains this with an equivalent expression. The value is always strictly greater than zero and smaller or equal to 1:

0 < CUME_DIST() OVER(ORDER BY ename) <= 1

Note that Oracle (and the SQL standard) also support CUME_DIST() as an “ordered aggregate function”, “ordered set function” or “hypothetical set function”:

SELECT 
  ename, 
  CUME_DIST(ename)
    WITHIN GROUP (ORDER BY ename) fraction
FROM emp
GROUP BY ename
ORDER BY ename

The standard specifies the above as:

<hypothetical set function> ::=
    <rank function type> <left paren>
        <hypothetical set function value expression list> 
            <right paren>
        <within group specification>

<within group specification> ::=
    WITHIN GROUP <left paren> 
        ORDER BY <sort specification list> 
            <right paren>

jOOQ also supports the cumeDist() window function, and the upcoming jOOQ 3.4 will also support the ordered aggregate function.

… and you, you should definitely make this nice function a part of your SQL vocabulary.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s