SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT

The SQL:2003 standard ranking functions are awesome companions and useful tools every now and then. The ones that are supported in almost all databases are:

ROW_NUMBER(): This one generates a new row number for every row, regardless of duplicates within a partition.
RANK(): This one generates a new row number for every distinct row, leaving gaps between groups of duplicates within a partition.
DENSE_RANK(): This one generates a new row number for every distinct row, leaving no gaps between groups of duplicates within a partition.

As always, the above is much easier to understand by example. Let’s assume the following PostgreSQL schema containing a table with 8 records, some of which are duplicates:


CREATE TABLE t AS
SELECT 'a' v UNION ALL
SELECT 'a'   UNION ALL
SELECT 'a'   UNION ALL
SELECT 'b'   UNION ALL
SELECT 'c'   UNION ALL
SELECT 'c'   UNION ALL
SELECT 'd'   UNION ALL
SELECT 'e'

Now, let’s select each value, along with the three aforementioned ranking functions. And for kicks and giggles, we’ll use the SQL standard WINDOW clause! Yay, it saved us 15 characters of repetitive SQL code. Note that the WINDOW clause is hardly implemented apart by PostgreSQL and Sybase SQL Anywhere…


SELECT
  v,
  ROW_NUMBER() OVER (window) row_number,
  RANK()       OVER (window) rank,
  DENSE_RANK() OVER (window) dense_rank
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v

And the above results in:

+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |
+---+------------+------+------------+

^{(See also this SQLFiddle)}

How DENSE_RANK() can help when writing SELECT DISTINCT

No doubt, ROW_NUMBER() is the most useful ranking function among the above, specifically when you need to emulate LIMIT .. OFFSET clauses as in DB2, Oracle (11g or less), Sybase SQL Anywhere (prior to version 12), SQL Server (2008 and less). Read on here about how jOOQ emulates this SQL clause in various SQL dialects.

jOOQ is the best way to write SQL in Java

But using ROW_NUMBER() has a subtle problem when used along with DISTINCT or UNION. It prevents the database from being able to remove duplicates, because ROW_NUMBER will always produce distinct values within a partition. In the above example, duplicate values for T.V were added on purpose. How can we first remove the duplicates and only then enumerate row numbers? Clearly, we can no longer use ROW_NUMBER(). The following query:


SELECT DISTINCT
  v, 
  ROW_NUMBER() OVER (window) row_number
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v, row_number

… yields

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |
+---+------------+

^{(See also this SQLFiddle)} But we can use DENSE_RANK(), instead! With DENSE_RANK() the ranking is applied in a way that duplicate records will receive the same rank. And there are no gaps between ranks. Hence:


SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (window) row_number
FROM t
WINDOW window AS (ORDER BY v)
ORDER BY v, row_number

… which yields:

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| b |          2 |
| c |          3 |
| d |          4 |
| e |          5 |
+---+------------+

^{(See also this SQLFiddle)}

Thus, remember…

Thus, remember: ROW_NUMBER() is to SELECT what DENSE_RANK() is to SELECT DISTINCT

Caveats

In order for the above to be true, however, you must ensure that all expressions from the SELECT DISTINCT clause are used in the DENSE_RANK()‘s OVER(ORDER BY ...) clause. For example:


SELECT DISTINCT
  v1,
  v2,
  v3,
  DENSE_RANK() OVER (window) row_number
FROM t
WINDOW window AS (ORDER BY v1, v2, v3)

If any of v1, v2, v3 are other ranking functions or aggregate functions, or non-deterministic expressions, etc., the above trick won’t work. But it’s still a nice trick to keep up one’s sleeves for the odd corner-case query, where distinct rows need row numbers. And if you don’t want to think about this trick all the time, do note that we’ve baked this into jOOQ, because we think that you should worry about business logic, not about SQL standardiation:

15 thoughts on “SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT”

I have a input like

Movie name 	played_time
interstellar	11/15/2014 12:00:00
interstellar	11/14/2014 12:00:00
interstellar	11/13/2014 12:00:00
interstellar	11/12/2014 12:00:00
interstellar	11/11/2014 12:00:00
interstellar	11/10/2014 12:00:00
batsman		10/01/2014 12:00:00
batsman		11/02/2014 12:00:00
batsman		11/03/2014 12:00:00
batsman		11/04/2014 12:00:00
spiderman	09/04/2014 12:00:00
superman	10/25/2014 12:00:00	
superman	10/24/2014 12:00:00
superman	10/22/2014 12:00:00
superman	10/21/2014 12:00:00
superman	10/11/2014 12:00:00
titanic		11/05/2014 12:00:00

i need to pick the most top 3 played movie , i.e,

interstellar
superman
batsman

please help

lukaseder says:

December 2, 2014 at 07:44
That’s not really a query where you need window functions. It can be solved easily with SQL-92 aggregate functions, which run on any database:
```
SELECT movie_name
FROM my_table
GROUP BY movie_name
ORDER BY count(*) DESC
LIMIT 3 -- I'm assuming you're using MySQL or PostgreSQL here
```
Note that a great platform to ask these questions is Stack Overflow

Loading...
Reply
ravi says:

April 5, 2016 at 09:04

select top 3 count(movie_name) total, movie_name from movies
group by movie_name
order by count(movie_name) desc;

Loading...

Reply

with cteMovie as
(
select movie_name, played_time, row_number() over (partition by movie_name order by played_time desc ) as number from my_table
)
select * from cteMovie where number=1

lukaseder says:

January 28, 2015 at 10:57

Thank you for your contribution. Yes, that is how you can filter by such a row number. In what context did you think this fits the article at hand?

Loading...

Reply

How to use the “where” function for row_number

ercan says:

July 8, 2015 at 08:44

for example:
row-number > 3

Loading...

Reply
lukaseder says:

July 13, 2015 at 16:31
You’ll have to calculate the window function value in a derived table:
```
SELECT *
FROM (
    SELECT row_number() OVER ( ... ) AS rn
    FROM ...
) t
WHERE t.rn > 3
```
Loading...
Reply

can any one please post the query to find the 10th highest record in HIVE.

lukaseder says:

November 23, 2015 at 12:28

Sounds like an excellent question for Stack Overflow! https://stackoverflow.com

Loading...

Reply

We are sybase database, dont have DENSE_RANK, PARTITION BY functionality. How to achieve the DENSE_RANK() OVER (PARTITION BY with out using In-built functions.

lukaseder says:

July 3, 2017 at 12:30

You migrate off Sybase ;) Cheers

Loading...

Reply
lukaseder says:

July 3, 2017 at 12:39
On a more serious note: DENSE_RANK() can be emulated with an ordinary subquery.
```
DENSE_RANK() OVER (PARTITION BY x ORDER BY y) 
```
is the same as:
```
SELECT 
  o.*, (
    SELECT COUNT(DISTINCT y) 
    FROM t AS i 
    WHERE i.x = o.x
    AND i.y < o.y
  ) AS dense_rank
FROM t AS o
```
Chances are, that this query is quite slow, though.

Loading...
Reply
1. Bhushan says:
  
  July 25, 2017 at 07:05
  
  i want to generate rank order by 2 columns.Is there is any solution for that
  
  Loading...
  
  Reply
  1. lukaseder says:
    
    July 25, 2017 at 09:05
    
    Just RANK() OVER (ORDER BY col1, col2)
    
    Loading...
    
    Reply

SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT

How DENSE_RANK() can help when writing SELECT DISTINCT

Thus, remember…

Caveats

Like this:

Published by lukaseder

15 thoughts on “SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT”

Leave a Reply to lukasederCancel reply

How DENSE_RANK() can help when writing SELECT DISTINCT

Thus, remember…

Caveats

Like this:

Published by lukaseder

15 thoughts on “SQL Trick: row_number() is to SELECT what dense_rank() is to SELECT DISTINCT”

Leave a Reply to lukasederCancel reply

Discover more from Java, SQL and jOOQ.