Faster SQL Pagination with Keysets, Continued

A while ago, I have blogged about how to perform keyset pagination (some also call this the “seek method”). Keyset pagination is a very powerful technique to perform constant-time pagination also on very large result sets, where “classic” OFFSET pagination will inevitably get slow on large page numbers. Keyset pagination is most useful for lazy loading the “next page”, much more than for jumping to page 235 directly, for instance. A good example where this is useful are social media, such as Twitter or Facebook with their streams of content. When you reach the bottom of a page, you just want the “next couple of tweets”. Not the tweets at offset 3564. If your data being paginated on stays “reasonably unchanged” while paginating, you can even perform classical OFFSET pagination by pre-calculating (and caching) all page boundaries in one go. Imagine, you have this data (I’m using PostgreSQL for the example):

create table t(
  id int,
  value int
);

insert into t (id, value)
select id, random() * 100
from generate_series(1, 1000) x(id);

Our data has 1000 records with random values between 0 and 99. Now, let’s browse this data in ascending order, ordered by VALUE and then by ID. If we wanted to get to page 6 with page sizes of 5 with OFFSET pagination, we’d write:

select id, value
from t
order by value, id
limit 5
offset 25

This would then yield something like
|  ID | VALUE |
|-----|-------|
| 640 |     2 |
| 776 |     2 |
| 815 |     2 |
| 947 |     2 |
|  37 |     3 |
See also this SQLFiddle

OFFSET pagination emulation with cached page boundaries

The above can be emulated using keyset pagination if we know the page boundaries of every page. In other words, in order to jump to page 6 without an actual OFFSET, we’d have to know the value of the record immediately preceding {"ID":640,"VALUE":2}. Or better, let’s just figure out all the page boundaries with the following query:

select 
  t.id, 
  t.value,
  case row_number() 
       over(order by t.value, t.id) % 5 
    when 0 then 1 
    else 0 
  end page_boundary
from t
order by t.value, t.id

The above query yields
|   ID | VALUE | PAGE_BOUNDARY |
|------|-------|---------------|
|  ... |   ... |           ... |
|  474 |     2 |             0 |
|  533 |     2 |             1 | <-- Before page 6
|  640 |     2 |             0 |
|  776 |     2 |             0 |
|  815 |     2 |             0 |
|  947 |     2 |             0 |
|   37 |     3 |             1 | <-- Last on page 6
|  287 |     3 |             0 |
|  450 |     3 |             0 |
|  ... |   ... |           ... |
See also this SQL Fiddle As you can see, the record just before {"ID":640,"VALUE":2} is {"ID":533,"VALUE":2}, which is the page boundary that we need to jump to if we want to go to page 6. Page 7 then starts with the record just after {"ID":37,"VALUE":3}. The above query selects too much data, of course. We’re only interested in those records where PAGE_BOUNDARY = 1. Besides, why not calculate the page numbers already in the database with yet another call to ROW_NUMBER(). Let’s write:

select 
  x.value, 
  x.id,
  row_number() 
    over(order by x.value, x.id) + 1 page_number
from (
  select 
    t.id, 
    t.value,
    case row_number() 
         over(order by t.value, t.id) % 5 
      when 0 then 1 
      else 0 
    end page_boundary
  from t
  order by t.value, t.id
) x
where x.page_boundary = 1

This will then yield:
| VALUE |  ID | PAGE_NUMBER |
|-------|-----|-------------|
|     0 | 786 |           2 |
|     1 | 250 |           3 |
|     1 | 959 |           4 |
|     2 | 229 |           5 |
|     2 | 533 |           6 | <-- Before page 6
|     3 |  37 |           7 |
|     3 | 768 |           8 |
See also this SQLFiddle. We can now jump to page 6 with this simple query:

select id, value
from t
where (value, id) > (2, 533)
order by value, id
limit 5

… which will yield the same as the previous OFFSET query:
|  ID | VALUE |
|-----|-------|
| 640 |     2 |
| 776 |     2 |
| 815 |     2 |
| 947 |     2 |
|  37 |     3 |
See also this SQLFiddle If you’re planning on using the upcoming jOOQ 3.3, the same query can be achieved with the following SEEK syntax:

DSL.using(configuration)
   .select(T.ID, T.VALUE)
   .from(T)
   .orderBy(T.VALUE, T.ID)
   .seek(2, 533)
   .limit(5);

The advantage of this is that you don’t have to write out the SEEK predicate explicitly, which adds readability and typesafety, specifically if your ORDER BY clause is a little more complex

If window functions aren’t available

The above queries make use of window functions, which aren’t available in all databases, unfortunately. If you’re using MySQL, for instance, you will have to use a different mechanism to find the PAGE_BOUNDARY. One such example us using a scalar subquery:

select 
  t.id, 
  t.value,
  case (
      select count(*)
      from t t2
      where (t2.value, t2.id) <= (t.value, t.id)
    ) % 5 
    when 0 then 1 
    else 0 
  end page_boundary
from t
order by t.value, t.id

See also this SQLFiddle Such a scalar subquery might be quite costly if your database performs poor query optimisation. Your best bet would be to measure things and decide whether caching page boundaries to be able to apply keyset pagination is really faster than classic OFFSET paging.

Conclusion

As explained in our previous blog post about keyset pagination, this technique can bring great performance improvements as pagination can be achieved in constant time, leveraging existing indexes. It is most useful when the underlying data is very stable (no records added / removed while paginating), or when pagination “stability” is desired even if records are added / removed. Keyset pagination, or the “seek method”, should be part of every SQL developer’s tool set.

8 thoughts on “Faster SQL Pagination with Keysets, Continued

  1. Joe Celko should add this post to the next edition of the “SQL for Smarties” :)
    Great article!

  2. > Do you know him, personally

    Unfortunately, no. But his publications is really inspiring, and your article is on par with the most interesting tricks described there

  3. Thanks very much Lukas!

    So basically the keyset is (desired sort columns + ID column) ?

    Is it possible to leave out the ID entirely?

    And what if the primary key is composite?

    1. So basically the keyset is (desired sort columns + ID column) ?

      Pretty much, or more precisely:

      • The desired sort columns
      • any additional columns to make tuples in the keyset unique, if they aren’t already (e.g. the ID column)

      So it doesn’t really matter if you have composite primary keys. All you need is a unique, unambiguous sorting criteria. Note that this is not strictly related to keyset paging. You should always have an unique, unambiguous sorting criteria to get deterministic results.

  4. If you don’t use the unique key in your order by directly (through selection of the UI for example) then just add to the end of the order by clause (asc or desc depending on your use) so that you get deterministic results. Then just apply the other ideas for previous and next page…I can’t see why anyone would want to use rowset/rowno algorithms after using keysetting!! I know i am opening up a can of worms by saying that but I have always found out through experiences that your small tables that you didn’t think would get all that big >>all of a sudden get much bigger<< and then you end up going in and writing something like the keyset anyway…get it in there through an API that works with all SQL and then you have it at your fingertips…you can always abstract the API with interfaces depending on the SQL vendor (you can even abstract it out depending on what data language you are using :)

    1. Why write the API when you can use jOOQ? :-)

      Well, offset pagination is simply easier to implement, because in the meantime, almost all SQL vendors support it (DB2 still doesn’t, except when emulating it using window functions). So it might come to mind more quickly. I really wish SQL had a SEEK clause:

      ORDER BY a, b DESC, c
      SEEK :valueForA, :valueForB, :valueForC
      FETCH NEXT 10 ROWS ONLY
      

      Where the degree and data types of the SEEK clause columns must match those of the ORDER BY clause.

Leave a Reply to ccsaxtonCancel reply