jOOQ Newsletter November 29, 2013

subscribe to the newsletter here

New jOOQ Express Edition

We have been listening to you, our valuable customers, and will now start to offer a budget license for those wanting to leverage the free Oracle Express or SQL Server Express editions. Inspired by those powerful databases, we call this the jOOQ Express Edition.

This competitively priced license can be used by up to three workstations within the same legal entity, which makes it a perfect tool for Freelancers and small Startups where control over your budget is of the essence. Obviously, all of jOOQ’s supported Open Source databases can be used, too.

New Open Source Blog Category

Since we have been pursuing a dual-licensing model for jOOQ, we made lots of new acquaintances and experiences both in the entrepreneurial world in general, as well as in the Open Source world in particular. Dual-licensing software is a very adventurous endeavour with at least these stakeholders involved:

  • “Enterprise customers” who see our steps as steps forward towards even better quality and higher professionalism.
  • “Everyday customers” who regret losing the option of a free middleware product but who are willing to spend money on high quality software that helps them increase quality and developer effectiveness in their code.
  • “Idealistic customers” who prefer Open Source to be “Free”, as understood byRichard Stallman.

As you can see, every project has its set of stakeholders. This is also true for the jOOQ dual-licensing project. We try our best to provide all of our stakeholders an optimal product without losing our visions of making Open Source a viable business model.

An important part of being “open” is to share experience and vision, and we’re doing so on our blog. In the recent past, we have blogged quite a bit about Open Source, to an extent that we have now created a new category for that subject. Some of these articles may be more controversial:

Others will be simple but interesting experience reports. In that context, we’re reaching out to major Open Source players, such as Oracle, Red Hat and others. So, meet us soon on opensource.com, because as RedHat puts it: “Open Source is changing the world!”

Upcoming Events

The schedule for the awesome Java2Days conference in Sofia is now published! The same is true for the great abstract of the JUG-BB event in Berlin. Here is an overview of other, upcoming events:

Stay informed about 2014 events on www.jooq.org/news.

SQL Zone – SQL Antipatterns

We’ve advertised excellent books on SQL before. This time we want to make you aware of SQL Antipatterns by Bill Karwin. Bill has a very refreshing style of writing. One can really feel the pain he must have been suffering when discovering the code madness he has seen in his work as a consultant. We can feel with him, as we have created Code That Made Me Cry. This book is an excellent read for architects who need new input about how to explain bad ideas to their team, much as it is an excellent read for SQL developers who want to learn one or two things about how not to do things.

We list these books on our website at
http://www.jooq.org/learn

If you’ve written a book, a blog post or a tutorial about SQL and want us to review and/or advertise it, please contact us.

SQL Zone – Keyset Paging

We’ve written about the concept of keyset paging (also known as the “seek method”) before. Applying keyset paging is not so trivial, which is why the upcoming jOOQ 3.3’s native support for it will be even more awesome. One of our blog readers was eager to see more concrete examples for keyset paging, and we heard them. Read about how to pre-calculate all page boundaries for pages when applying keyset paging:
https://blog.jooq.org/2013/11/18/faster-sql-pagination-with-keysets-continued/

Faster SQL Pagination with Keysets, Continued

A while ago, I have blogged about how to perform keyset pagination (some also call this the “seek method”). Keyset pagination is a very powerful technique to perform constant-time pagination also on very large result sets, where “classic” OFFSET pagination will inevitably get slow on large page numbers.

Keyset pagination is most useful for lazy loading the “next page”, much more than for jumping to page 235 directly, for instance. A good example where this is useful are social media, such as Twitter or Facebook with their streams of content. When you reach the bottom of a page, you just want the “next couple of tweets”. Not the tweets at offset 3564.

If your data being paginated on stays “reasonably unchanged” while paginating, you can even perform classical OFFSET pagination by pre-calculating (and caching) all page boundaries in one go. Imagine, you have this data (I’m using PostgreSQL for the example):

create table t(
  id int,
  value int
);

insert into t (id, value)
select id, random() * 100
from generate_series(1, 1000) x(id);

Our data has 1000 records with random values between 0 and 99. Now, let’s browse this data in ascending order, ordered by VALUE and then by ID. If we wanted to get to page 6 with page sizes of 5 with OFFSET pagination, we’d write:

select id, value
from t
order by value, id
limit 5
offset 25

This would then yield something like

|  ID | VALUE |
|-----|-------|
| 640 |     2 |
| 776 |     2 |
| 815 |     2 |
| 947 |     2 |
|  37 |     3 |

See also this SQLFiddle

OFFSET pagination emulation with cached page boundaries

The above can be emulated using keyset pagination if we know the page boundaries of every page. In other words, in order to jump to page 6 without an actual OFFSET, we’d have to know the value of the record immediately preceding {"ID":640,"VALUE":2}. Or better, let’s just figure out all the page boundaries with the following query:

select 
  t.id, 
  t.value,
  case row_number() 
       over(order by t.value, t.id) % 5 
    when 0 then 1 
    else 0 
  end page_boundary
from t
order by t.value, t.id

The above query yields

|   ID | VALUE | PAGE_BOUNDARY |
|------|-------|---------------|
|  ... |   ... |           ... |
|  474 |     2 |             0 |
|  533 |     2 |             1 | <-- Before page 6
|  640 |     2 |             0 |
|  776 |     2 |             0 |
|  815 |     2 |             0 |
|  947 |     2 |             0 |
|   37 |     3 |             1 | <-- Last on page 6
|  287 |     3 |             0 |
|  450 |     3 |             0 |
|  ... |   ... |           ... |

See also this SQL Fiddle

As you can see, the record just before {"ID":640,"VALUE":2} is {"ID":533,"VALUE":2}, which is the page boundary that we need to jump to if we want to go to page 6. Page 7 then starts with the record just after {"ID":37,"VALUE":3}.

The above query selects too much data, of course. We’re only interested in those records where PAGE_BOUNDARY = 1. Besides, why not calculate the page numbers already in the database with yet another call to ROW_NUMBER(). Let’s write:

select 
  x.value, 
  x.id,
  row_number() 
    over(order by x.value, x.id) + 1 page_number
from (
  select 
    t.id, 
    t.value,
    case row_number() 
         over(order by t.value, t.id) % 5 
      when 0 then 1 
      else 0 
    end page_boundary
  from t
  order by t.value, t.id
) x
where x.page_boundary = 1

This will then yield:

| VALUE |  ID | PAGE_NUMBER |
|-------|-----|-------------|
|     0 | 786 |           2 |
|     1 | 250 |           3 |
|     1 | 959 |           4 |
|     2 | 229 |           5 |
|     2 | 533 |           6 | <-- Before page 6
|     3 |  37 |           7 |
|     3 | 768 |           8 |

See also this SQLFiddle.

We can now jump to page 6 with this simple query:

select id, value
from t
where (value, id) > (2, 533)
order by value, id
limit 5

… which will yield the same as the previous OFFSET query:

|  ID | VALUE |
|-----|-------|
| 640 |     2 |
| 776 |     2 |
| 815 |     2 |
| 947 |     2 |
|  37 |     3 |

See also this SQLFiddle

If you’re planning on using the upcoming jOOQ 3.3, the same query can be achieved with the following SEEK syntax:

DSL.using(configuration)
   .select(T.ID, T.VALUE)
   .from(T)
   .orderBy(T.VALUE, T.ID)
   .seek(2, 533)
   .limit(5);

The advantage of this is that you don’t have to write out the SEEK predicate explicitly, which adds readability and typesafety, specifically if your ORDER BY clause is a little more complex

If window functions aren’t available

The above queries make use of window functions, which aren’t available in all databases, unfortunately. If you’re using MySQL, for instance, you will have to use a different mechanism to find the PAGE_BOUNDARY. One such example us using a scalar subquery:

select 
  t.id, 
  t.value,
  case (
      select count(*)
      from t t2
      where (t2.value, t2.id) <= (t.value, t.id)
    ) % 5 
    when 0 then 1 
    else 0 
  end page_boundary
from t
order by t.value, t.id

See also this SQLFiddle

Such a scalar subquery might be quite costly if your database performs poor query optimisation. Your best bet would be to measure things and decide whether caching page boundaries to be able to apply keyset pagination is really faster than classic OFFSET paging.

Conclusion

As explained in our previous blog post about keyset pagination, this technique can bring great performance improvements as pagination can be achieved in constant time, leveraging existing indexes. It is most useful when the underlying data is very stable (no records added / removed while paginating), or when pagination “stability” is desired even if records are added / removed.

Keyset pagination, or the “seek method”, should be part of every SQL developer’s tool set.

jOOQ Newsletter October 28, 2013

subscribe to the newsletter here

jOOQ 3.2 Licensing Update

Three weeks ago, we have announced and released

jOOQ 3.2 under the terms of a new dual-licensing model offering jOOQ’s support for commercial databases under a commercial jOOQ license.

The jOOQ community has reacted intensively on the user group, most of the feedback having been very positive. The licensing change is received as a good sign for jOOQ’s long-term future, even if some aspects of it were rightfully challenged.

One important such aspect is the jOOQ license 1.0 (Oct 1) phrasing on the term “distribution”. This has been rectified in the new jOOQ license 1.1 (Oct 28) which explicitly grants our licensees the right to distribute and embed jOOQ with their end-user applications (see §6.2). This has been a misunderstanding on our part and was never intended to be a limitation to our customers. As this updated license grants new rights to our customers, it shall be in effect immediately also for existing jOOQ 3.2 customers.

Do also note that we have published an FAQ for licensing questions you may have.

jOOQ 3.3 Outlook

The upcoming release 3.3 (scheduled for early 2014) will include a bulk of exciting new features among which:

  • Support for MS Access, available in the jOOQ Professional Edition
  • Support for keyset paging (see SQL zone, below)
  • SQL transformation SPI example implementations showing how to perform row-level security and shared-schema multi-tenancy with jOOQ 3.2+
  • A new Open Source contribution platform, allowing the community to easily share SPI implementations and other sorts of plugins not maintained by the Data Geekery

And much more. See the GitHub 3.3.0 milestone for an overview of feature candidates, or participate in the user group to discuss your ideas!

Upcoming Events

Some of you may have noticed with regret that the MEDIT Symposium in beautiful Sicilly has been cancelled. It would have been a great place to learn about the latest activities in the Open Source world. But we are more than happy to announce to you that these events are confirmed. Meet Lukas and jOOQ and lots of other good stuff at:

Stay informed about 2014 events on www.jooq.org/news.

SQL Zone – keyset paging

In recent newsletters, we’ve been talking about NoSQL. But let’s get back to SQL andhow to stay in control of your SQL. As mentioned before, jOOQ 3.3 will provide native support for keyset paging, which is also referred to by some as the “seek method”. Our recent blog post explains how this works, and why it allows you to perform constant-time paging even in very large data sets. This is particularly useful when implementing Twitter- or Facebook-like lazy loading of more and more data!

SQL Zone – useful SQL tricks

We also would like to advertise two SQL tricks that come in handy every once in a while:

SQL Zone – MySQL parser / jOOQ code generator

Our October 10 newsletter announced a new cooperation between GUDU Soft and Data Geekery: SQL2jOOQ. Developments around a MySQL -> jOOQ code transformation have progressed quite a bit. This promising tool will soon help you migrate your large legacy application with tons of hand-written, string-based SQL to jOOQ.