Why Most Programmers Get Pagination Wrong


Pagination is one of those things that almost everyone gets wrong for two reasons:

  • User experience
  • Database performance

Here’s why.

What’s wrong with pagination?

Most applications blindly produce pagination like this:

pagination

This is how GMail implements pagination. With my current settings, it displays 100 E-Mails at a time and also shows how many E-Mails there are in total, namely 1094. Those aren’t the total number of E-Mails I’ve ever had, they’re the total number of E-Mails in my “blog-idea” label (I’m using GMail as a TODO list, and yes, this blog won’t run out of articles any time soon).

What’s wrong with this practice?

Bad user experience

As a user, in most cases, I don’t really care about the total number of objects that are in my result set. At least, I don’t care about the exact number. Does it make any difference if I have 1094 or 1093 E-Mails? What about if I had 1067? Or 1000? 1000 would be precise enough for what I’m concerned.

Also, as a user, in most cases, I don’t care that I’m on page 317 of my paginated screen that displays me rows 3170-3179 (assuming 10 rows per page). I really don’t. The page number is absolutely useless in terms of user experience.

Who got it right?

  • Facebook
  • Twitter
  • Reddit

And all the other websites that do timelines. Yes, I want to display only 10 rows at a time (or perhaps 100, but almost never all). So, pagination is important. But I don’t care about the fact that I’ve clicked 317 times on that “next page” button. If I ever browse that many pages (and I hardly ever do), then the only thing that matters is the next 10 rows. Just like when you play Civilization. You don’t care that you’re in turn 317. You just want to play one more turn:

4b665f86cbb10d44e2db6ae4c96fef4050f0ce42878015ab30cf681b84537a30[1]

Moreover, I never ever ever want to jump to page 317 right from the beginning. There’s absolutely no use case out there, where I search for something, and then I say, hey, I believe my search result will be item #3175 in the current sort order. Exactly. Instead, I will do any of these:

  • Refine the search
  • Sort the result

In both cases, I will get a result where the record that I’m looking for is much more likely to appear on page #1 or perhaps #2. Again, when was the last time you googled for SQL and then went to page #18375 to find that particular blog post that you were looking for? No. You searched for “Java 8 SQL” to find jOOQ, the best way to write SQL in Java 8. For instance.

How to implement a timeline with SQL

If your data source is a SQL database, you might have implemented pagination by using LIMIT .. OFFSET, or OFFSET .. FETCH or some ROWNUM / ROW_NUMBER() filtering (see the jOOQ manual for some syntax comparisons across RDBMS). OFFSET is the right tool to jump to page 317, but remember, no one really wants to jump to that page, and besides, OFFSET just skips a fixed number of rows. If there are new rows in the system between the time page number 316 is displayed to a user and when the user skips to page number 317, the rows will shift, because the offsets will shift. No one wants that either, when they click on “next”.

Instead, you should be using what we refer to as “keyset pagination” (as opposed to “offset pagination”). We’ve described this in past articles here:

The SQL syntax is a bit cumbersome as the pagination criteria becomes an ordinary predicate, but if you’re using jOOQ, you can use the simple synthetic SQL SEEK clause as such:

DSL.using(configuration)
   .select(PLAYERS.PLAYER_ID,
           PLAYERS.FIRST_NAME,
           PLAYERS.LAST_NAME,
           PLAYERS.SCORE)
   .from(PLAYERS)
   .where(PLAYERS.GAME_ID.eq(42))
   .orderBy(PLAYERS.SCORE.desc(),
            PLAYERS.PLAYER_ID.asc())
   .seek(949, 15)
   .limit(10)
   .fetch();

The above will fetch the next 10 players after the player with SCORE 949 and ID 15. The pagination really depends on the ORDER BY clause, which is why you have to provide as many values in the pagination as you provided columns in the ORDER BY clause.

Now, that we’ve fixed the user experience let’s also look at …

How OFFSET pagination is bad for performance

The previously linked articles about keyset pagination also mention the poor performance of OFFSET pagination. Which is kind of obvious as OFFSET has to skip a given number of rows after applying all predicates and sorting, etc. So the database has to do all the work and then throw away 3170 records (if you’re jumping to page 317 with 10 rows per page). What a waste.

The following diagram shows very nicely how OFFSET gets slower and slower for large offsets:

Reproduced from use-the-index-luke.com with permission by Markus Winand

That’s the obvious problem, but there’s another one. People always count the total number of rows to calculate the total number of possible pages. Why? To display nonsense like the following:

Page number:
1 2 3 ... 315 316 317 318 319 ... 50193 50194

Wow. OK so we’re on page number 317, which we don’t really care about in the first place, but we could just as well jump to page number 50194. This means that the database needed to run the query across all the rows just to be sure we get exactly 50194 pages in total.

Google something like page number pagination and observe the number of tutorials that show how you can implement the above nonsense. On Google Image search, you’ll find:

pagination-google

At the same time, the Google search itself reveals:

pagination-google-search

As you can see, Google estimates that there are probably at least 10 pages for your search and you can go “next”. Yes, you can skip some pages, but you cannot skip to a page number 50194, because, again:

  • No one wants that
  • It’s costly to predict, even for Google

In fact, Google search implements keyset pagination as well, just like Twitter, Facebook, Reddit. And they don’t display the total number of pages because counting that total can be very costly, depending on your database.

In particular, databases that do not support window functions will require you to run two separate queries:

  1. The actual query with a LIMIT clause
  2. An additional query replacing the SELECT column list with a simple COUNT(*)

Needless to say that this is not the best approach. If your database supports window functions (read about that miraculous SQL feature here on the jOOQ blog), you could produce the total row count in one go as such:

SELECT 
  rental_date, 
  inventory_id,
  COUNT(*) OVER()
FROM rental
WHERE customer_id = 1
ORDER BY rental_date
LIMIT 10

That COUNT(*) OVER() window function is like an ordinary aggregate function, except that it doesn’t group your results. It just counts all the rows of your result and produces that count in each row, prior to limiting the result to 10.

When run against the Sakila database, the above produces:

rental_date          inventory_id  count
2005-05-25 11:30:37          3021     32
2005-05-28 10:35:23          4020     32
2005-06-15 00:54:12          2785     32
2005-06-15 18:02:53          1021     32
2005-06-15 21:08:46          1407     32
2005-06-16 15:18:57           726     32
2005-06-18 08:41:48           197     32
2005-06-18 13:33:59          3497     32
2005-06-21 06:24:45          4566     32
2005-07-08 03:17:05          1443     32

So, we’re displaying the first page with 10 rows and we need to provide navigational links for a total of 4 pages because we have a total of 32 rows.

What happens when we benchmark this query on PostgreSQL? The first run doesn’t calculate this COUNT(*) OVER() value, whereas the second one does:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 10000;
  rec RECORD;
BEGIN
  v_ts := clock_timestamp();

  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id
      FROM rental
      WHERE customer_id = 1
      ORDER BY rental_date
      LIMIT 10
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  RAISE INFO 'Statement 1: %', (clock_timestamp() - v_ts); 
  v_ts := clock_timestamp();

  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id,
        COUNT(*) OVER()
      FROM rental
      WHERE customer_id = 1
      ORDER BY rental_date
      LIMIT 10
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;

  RAISE INFO 'Statement 2: %', (clock_timestamp() - v_ts); 
END$$;

The result clearly indicates that in PostgreSQL, there’s a significant overhead in calculating this value:

INFO:  Statement 1: 00:00:01.041823
INFO:  Statement 2: 00:00:03.57145

Oracle optimises things a bit better when you’re using ROWNUM to paginate:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 5000;
BEGIN
  v_ts := SYSTIMESTAMP;
     
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id
      FROM (
        SELECT 
          rental.*, 
          ROWNUM rn
        FROM rental
        WHERE customer_id = 1
        ORDER BY rental_date
      ) rental
      WHERE rn < 5
      ORDER BY rn
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
     
  dbms_output.put_line('Statement 1: ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
     
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT 
        rental_date, 
        inventory_id,
        COUNT(*) OVER()
      FROM (
        SELECT 
          rental.*,  
          ROWNUM rn
        FROM rental
        WHERE customer_id = 1
        ORDER BY rental_date
      ) rental
      WHERE rn < 5
      ORDER BY rn
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
     
  dbms_output.put_line('Statement 2: ' || (SYSTIMESTAMP - v_ts));
END;
/

Result:

Statement 1: X
Statement 2: X +/- 1%

So, the COUNT(*) seems to be calculated “for free”. Bonus question: Why is that?

Due to Oracle license restrictions, we cannot publish benchmark results here, comparing Oracle with PostgreSQL, sorry, but you can run the above code yourself against the Sakila database:
https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/Sakila

Conclusion

TL;DR: OFFSET pagination bad. Keyset pagination good.

no-offset-banner-468x60.white

If you want to paginate in your application, please make sure whether you really, really, really need:

  • Exact page number
  • High page numbers
  • The last page number
  • The total number of rows

Because if you don’t (and in 98% of all UIs, you really don’t), then you can drastically speed up your queries while providing your users a much better experience. If that’s not a win-win situation worth thinking about…?

And don’t forget, jOOQ ships with native keyset pagination support!

4 thoughts on “Why Most Programmers Get Pagination Wrong

  1. Nicely written advocacy for keyset pagination. But i don’t agree with the timeline is always the better kind of ux. Search and refine search is fine “if you know what you want to find”, but totally does not help you if you are browsing/exploring the search space.

    And if you do that, you usually want to know how large the space is (approximately), to decide if you need to refine search first or just click through the pages. It also helps if you happen to be stuck with SEO wastelands at the top of your search, because you only have bad, low quality search terms at your disposal. Often helps to skip the first few dozend pages in such cases, instead of clicking next a gazillion times (or even worse, continuous scrolling).

  2. I agree with some arguments like performance, bad UI (and not UX). But i disagree with most of your article.

    Bad UI (not UX): Yes, pagination is not pretty, but it’s not the purpose. Pagination is useful. Using a timeline with autoload or one button is prettier and easy to use. No questions, you just have one link to click. Easy. But removing it does not benefits user experience.

    When you say we should refine search or sort it if we don’t find something. Sure, we could, but it’s your idea, not the way every people want to go. Plus, depending of how it’s stored, what you want to search and how many data your database has, your search, even detailed, can return a huge number of results.

    Here is examples where pagination with page number help a lot against a timeline:
    – Every sites with a list of things (shop, movies, books etc). If i have no idea or an imprecise idea of what i want, i just want to see a lot of them. So site display x items on the page and i go to many of the pages until i find the one i want. In case of connection issue, if i want to go back later or if i want to share the page, i want to be able to go to this page which show me the product or the products i saw sooner.
    – It’s very frustrating to know there is something you see before, but you have to scroll during 5 minutes to get it back because there is no pagination and no search page.

    From performance point of view, i agree with you for server side. But huge timeline is very bad for browser. More you display “pages”, more your browser has markup to deal with.

    You can use timeline if you always have pages with few results, or at least if your search page and facets are exhaustive.
    There is no better solution for all cases. Better solution depends of a lot of things, especially how the site is built and how to help user to find what he wants.

    • How does an easier UI not benefit user experience? I find Google incredibly more easy to use than, say, the ordinary ERP system. Let me illustrate this:

      UI = UX

      Your examples are very good examples for keyset pagination, not offset pagination. Do the exercise and think through them. You say you want to go back to a particular page much later. With offset pagination, you’re going back to record 3700 of a stored search. If there are new products that were added in the meantime, record 3700 is no longer the one you’ve had in your previous search. If you’re using keyset pagination, then you get the exact same result.

      Also, do think about keyset pagination more thoroughly. Just because I used the timeline example doesn’t mean you can’t scroll back with keysets. You can, of course, just the same. The idea is to get the “next 10 rows” or the “previous 10 rows”, just not the 10 rows at offset 3700.

      Also, if you’re keen on optimising performance in the browser, heck, then throw away the rows that you’ve displayed before. Like Google does. That’s just an implementation detail.

      Cheers,
      Lukas

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s