Deep Stack Traces Can be a Sign of Good Code Quality

The term “leaky abstractions” has been around for a while. Coining it is most often attributed to Joel Spolsky, who wrote this often-cited article about it. I’ve now stumbled upon another interpretation of a leaky abstraction, measured by the depth of a stack trace:

Leaky Abstractions as understood by Geek and Poke (Licensed CC-BY)

So, long stack traces are bad according to Geek & Poke. I’ve seen this argument before on Igor Polevoy’s blog (he’s the creator of ActiveJDBC, a Java implementation of the popular Ruby ActiveRecord query interface). Much like Joel Spolsky’s argument was often used to criticise ORMs, Igor’s argument was also used to compare ActiveJDBC with Hibernate. Quoting him:

One might say: so what, why do I care about the size of dependencies, depth of stack trace, etc. I think a good developer should care about these things. The thicker the framework, the more complex it is, the more memory it allocates, the more things can go wrong.

I completely agree that a framework with a certain amount of complexity tends to have longer stack traces. So if we run these axioms through our mental Prolog processors:

  • if Hibernate is a leaky abstraction, and
  • if Hibernate is complex, and
  • if complexity leads to long stack traces, then
  • leaky abstractions and long stack traces correlate

I wouldn’t go as far as claiming there’s a formal, causal connection. But a correlation seems logical.

But these things aren’t necessarily bad. In fact, long stack traces can be a good sign in terms of software quality. They can mean that the internals of a piece of software show a high amount of cohesion and a high degree of DRY-ness, which in turn means that there is little risk of subtle bugs deep down in your framework. Remember that high cohesion and high DRY-ness lead to a large portion of the code being extremely relevant within the whole framework, which in turn means that any low-level bug will pretty much blow up the whole framework, as it will lead to everything going wrong. If you do test-driven development, you’ll be rewarded by noticing immediately that your silly mistake causes 90% of your test cases to fail.

A real-world example

Let’s use jOOQ as an example to illustrate this, as we’re already comparing Hibernate and ActiveJDBC. Some of the longest stack traces in a database access abstraction can be achieved by putting a breakpoint at the interface of that abstraction with JDBC. For instance, when fetching data from a JDBC ResultSet.

Utils.getFromResultSet(ExecuteContext, Class<T>, int) line: 1945
Utils.getFromResultSet(ExecuteContext, Field<U>, int) line: 1938
CursorImpl$CursorIterator$CursorRecordInitialiser.setValue(AbstractRecord, Field<T>, int) line: 1464
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(AbstractRecord) line: 1447
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(Record) line: 1
RecordDelegate<R>.operate(RecordOperation<R,E>) line: 119
CursorImpl$CursorIterator.fetchOne() line: 1413
CursorImpl$CursorIterator.next() line: 1389
CursorImpl$CursorIterator.next() line: 1
CursorImpl<R>.fetch(int) line: 202
CursorImpl<R>.fetch() line: 176
SelectQueryImpl<R>(AbstractResultQuery<R>).execute(ExecuteContext, ExecuteListener) line: 274
SelectQueryImpl<R>(AbstractQuery).execute() line: 322
T_2698Record(UpdatableRecordImpl<R>).refresh(Field<?>...) line: 438
T_2698Record(UpdatableRecordImpl<R>).refresh() line: 428
H2Test.testH2T2698InsertRecordWithDefault() line: 931

Compared to ActiveJDBC’s stack traces, that’s quite a bit more, but still less than Hibernate’s (which uses lots of reflection and instrumentation). And it involves rather cryptic inner classes with quite a bit of method overloading. How to interpret that? Let’s go through this, bottom-up (or top-down in the stack trace).

CursorRecordInitialiser

The CursorRecordInitialiser is an inner class that encapsulates the initialisation of a Record by a Cursor, and it ensures that the relevant parts of the ExecuteListener SPI are covered in a single place. It is the gateway to JDBC’s various ResultSet methods. It is a generic internal RecordOperation implementation that is called by…

RecordDelegate

… a RecordDelegate. While the class name is pretty meaningless, its purpose is to shield and wrap all direct record operations in a way that a central implementation of the RecordListener SPI can be achieved. This SPI can be implemented by client code to listen to active record lifecycle events. The price for keeping the implementation of this SPI DRY is a couple of elements on the stack trace, as such callbacks are the standard way to implement closures in the Java language. But keeping this logic DRY guarantees that no matter how a Record is initialised, the SPI will always be invoked. There are (almost) no forgotten corner-cases.
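A minimal sketch of that pattern (with hypothetical names borrowed from the stack trace, not jOOQ’s actual implementation) might look like this:

// Hypothetical sketch of the delegate / callback pattern, not jOOQ's actual code
interface RecordListener {
    void start(Object record);
    void end(Object record);
}

interface RecordOperation<R> {
    void operate(R record);
}

class RecordDelegate<R> {

    private final RecordListener listener; // the SPI implemented by client code
    private final R record;

    RecordDelegate(RecordListener listener, R record) {
        this.listener = listener;
        this.record = record;
    }

    // Every record operation is funnelled through this single method,
    // so the SPI callbacks fire no matter how a Record is initialised
    void operate(RecordOperation<R> operation) {
        listener.start(record);
        try {
            operation.operate(record);
        }
        finally {
            listener.end(record);
        }
    }
}

The two extra frames in the trace above (RecordDelegate.operate() and the RecordOperation implementation’s operate()) are exactly the price of this guarantee.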

But we were initialising a Record in…

CursorImpl

… a CursorImpl, an implementation of a Cursor. This might appear odd, as jOOQ Cursors are used for “lazy fetching”, i.e. for fetching Records one-by-one from JDBC.

On the other hand, the SELECT query in this stack trace simply refreshes a single UpdatableRecord, jOOQ’s equivalent of an active record. Yet all the lazy fetching logic is executed just as if we were fetching a large, complex data set. This is again to keep things DRY when fetching data. Of course, around 6 levels of stack trace could have been saved by simply reading the single record, as we know there can only be one. But again, any subtle bug in the cursor will likely show up in some test case, even in a remote one like the test case for refreshing records.

Some may claim that all of this is wasting memory and CPU cycles. The opposite is more likely to be true. Modern JVM implementations are so good at managing and garbage-collecting short-lived objects and method calls that the slight additional complexity imposes almost no additional work on your runtime environment.

TL;DR: Long stack traces may indicate high cohesion and DRY-ness

The claim that a long stack trace is a bad thing is not necessarily correct. A long stack trace is what happens when complex frameworks are well implemented. Complexity will inevitably lead to “leaky abstractions”. But only well-designed complexity will lead to long stack traces.

Conversely, short stack traces can mean two things:

  • Lack of complexity: The framework is simple, with few features. This matches Igor’s claim for ActiveJDBC, as he is advertising ActiveJDBC as a “simple framework”.
  • Lack of cohesion and DRY-ness: The framework is poorly written, and probably has poor test coverage and lots of bugs.

Tree data structures

As a final note, it’s worth mentioning that another case where long stack traces are inevitable is when tree structures / composite pattern structures are traversed using visitors. Anyone who has ever debugged XPath or XSLT will know how deep these traces are.
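A minimal sketch of why that is: each level of the tree adds one accept() call to the stack, so a document nested 50 levels deep contributes at least 50 frames to any trace taken inside a visitor:

import java.util.ArrayList;
import java.util.List;

interface Visitor {
    void visit(Node node);
}

// Sketch of a composite: traversal recurses once per tree level
class Node {
    final List<Node> children = new ArrayList<>();

    void accept(Visitor visitor) {
        visitor.visit(this);        // one frame per level...
        for (Node child : children)
            child.accept(visitor);  // ...recursing into each subtree
    }
}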

#CTMMC, the New Hashtag for Code That Made Me Cry

Our recent article about Code That Made Me Cry was really well received and had quite a few readers, both on our blog and on our syndication partners.

In each of us programmers is a little geek with a little geek humour. This is reflected by funny comics like the famous one about the only valid measurement of code quality: WTF/minute.

There are other platforms ridiculing bad code, too.

But one thing is missing: a hashtag (and acronym) on Twitter. It should be called #CTMMC, for Code That Made Me Cry.

To promote this hashtag, we have created a website, where we will list our collective programmer pain. Behold:

http://www.ctmmc.net

On Friday Dec 13th 2013, Things *WILL* go Wrong

We’re writing for @JavaAdvent on Friday, Dec 13th 2013. Superstitious? We are, and we’ll give some fun and scary insights! Stay tuned and follow @JavaAdvent to be ready for an interesting, geeky holiday season!

See also posts from 2012.

How to Create a Range From 1 to 10 in SQL

How do you create a range from 1 to 10 in SQL? Have you ever thought about it? This is such an easy problem to solve in any imperative language, it’s ridiculous. Take Java (or C, whatever) for instance:

for (int i = 1; i <= 10; i++)
  System.out.println(i);

This was easy, right? Things look even leaner when using functional programming. Take Scala, for instance:

(1 to 10) foreach { t => println(t) }

We could fill about 25 pages about various ways to do the above in Scala, agreeing on how awesome Scala is (or what hipsters we are).

But how to create a range in SQL?

… And we’ll exclude using stored procedures, because that would be no fun. In SQL, the data sources we operate on are tables. If we want a range from 1 to 10, we’d probably need a table containing exactly those ten values. Here are a couple of good, bad, and ugly options for doing precisely that in SQL. OK, they’re mostly bad and ugly.

By creating a table

The dumbest way to do this would be to create an actual table just for that purpose:

CREATE TABLE "1 to 10" AS
SELECT 1 value FROM DUAL UNION ALL
SELECT 2       FROM DUAL UNION ALL
SELECT 3       FROM DUAL UNION ALL
SELECT 4       FROM DUAL UNION ALL
SELECT 5       FROM DUAL UNION ALL
SELECT 6       FROM DUAL UNION ALL
SELECT 7       FROM DUAL UNION ALL
SELECT 8       FROM DUAL UNION ALL
SELECT 9       FROM DUAL UNION ALL
SELECT 10      FROM DUAL

See also this SQLFiddle

This table can then be used in any type of select. Now that’s pretty dumb but straightforward, right? I mean, how many actual records are you going to put in there?
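Once it exists, the helper table is used like any other table, e.g.:

SELECT value FROM "1 to 10" ORDER BY value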

By using a VALUES() table constructor

This solution isn’t that much better. You can create a derived table and manually add the values from 1 to 10 to that derived table using the VALUES() table constructor. In SQL Server, you could write:

SELECT V
FROM (
  VALUES (1), (2), (3), (4), (5), 
         (6), (7), (8), (9), (10)
) [1 to 10](V)

See also this SQLFiddle
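The square bracket identifier quoting is SQL Server specific. In PostgreSQL, for instance, the same derived table would read (a sketch):

SELECT v
FROM (
  VALUES (1), (2), (3), (4), (5),
         (6), (7), (8), (9), (10)
) AS "1 to 10"(v)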

By creating enough self-joins of a sufficient number of values

Another “dumb”, yet a bit more generic solution would be to create only a small number of constant values in a table, view or CTE (e.g. two) and then self-join that table enough times to reach the desired range length (e.g. four times). The four self-joined copies of the values 0 and 1 act as binary digits: 1 + T1.V + 2 * T2.V + 4 * T3.V + 8 * T4.V spans the 16 distinct values 1 to 16, which the WHERE clause then filters down to 10. The following example will produce values from 1 to 10, “easily”:

WITH T(V) AS (
  SELECT 0 FROM DUAL UNION ALL
  SELECT 1 FROM DUAL
)
SELECT V FROM (
  SELECT 1        + 
             T1.V + 
         2 * T2.V + 
         4 * T3.V + 
         8 * T4.V V
  FROM T T1, T T2, T T3, T T4
)
WHERE V <= 10
ORDER BY V

See also this SQLFiddle

By using grouping sets

Another way to generate large tables is by using grouping sets, or more specifically the CUBE() function. This works in much the same way as the previous example of self-joining a table with two records: CUBE(1, 2, 3, 4) produces one row per grouping set, i.e. 2^4 = 16 rows, of which ROWNUM retains the first ten:

SELECT ROWNUM FROM (
  SELECT 1
  FROM DUAL
  GROUP BY CUBE(1, 2, 3, 4)
)
WHERE ROWNUM <= 10

See also this SQLFiddle

By just taking random records from a “large enough” table

In Oracle, you could probably use ALL_OBJECTS. If you’re only counting to 10, you’ll certainly get enough results from that table:

SELECT ROWNUM FROM ALL_OBJECTS
WHERE ROWNUM <= 10

See also this SQLFiddle

What’s so “awesome” about this solution is that you can cross join that table several times to be sure to get enough values:

SELECT ROWNUM 
FROM ALL_OBJECTS, ALL_OBJECTS,
     ALL_OBJECTS, ALL_OBJECTS
WHERE ROWNUM <= 10

OK. Just kidding. Don’t actually do that. Or if you do, don’t blame me if your production system runs low on memory.

By using the awesome PostgreSQL GENERATE_SERIES() function

Incredibly, this isn’t part of the SQL standard, and hardly any database other than PostgreSQL implements it. PostgreSQL’s GENERATE_SERIES() function is much like Scala’s range notation: (1 to 10)

SELECT * FROM GENERATE_SERIES(1, 10)

See also this SQLFiddle
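Note that GENERATE_SERIES() also accepts an optional third argument for the step size, e.g. to produce only the odd numbers of the range:

SELECT * FROM GENERATE_SERIES(1, 10, 2)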

By using CONNECT BY

If you’re using Oracle, then there’s a really easy way to create such a table using the CONNECT BY clause, which is almost as convenient as PostgreSQL’s GENERATE_SERIES() function:

SELECT LEVEL FROM DUAL
CONNECT BY LEVEL <= 10

See also this SQLFiddle

By using a recursive CTE

Recursive common table expressions are cool, yet utterly unreadable. The equivalent of the above Oracle CONNECT BY clause, written as a recursive CTE, would look like this:

WITH "1 to 10"(V) AS (
  SELECT 1 FROM DUAL
  UNION ALL
  SELECT V + 1 FROM "1 to 10"
  WHERE V < 10
)
SELECT * FROM "1 to 10"

See also this SQLFiddle
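The above syntax runs on Oracle. Standard SQL and e.g. PostgreSQL require the RECURSIVE keyword to be spelled out (and have no DUAL):

WITH RECURSIVE "1 to 10"(V) AS (
  SELECT 1
  UNION ALL
  SELECT V + 1 FROM "1 to 10"
  WHERE V < 10
)
SELECT * FROM "1 to 10"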

By using Oracle’s MODEL clause

A decent “best of” comparison of how to do things in SQL wouldn’t be complete without at least one example using Oracle’s MODEL clause (see this awesome use-case for Oracle’s spreadsheet feature). Use this clause only if you want to make your co-workers really angry when they maintain your SQL code.

Bow before this beauty!

SELECT V
FROM (
  SELECT 1 V FROM DUAL
) T
MODEL DIMENSION BY (ROWNUM R)
      MEASURES (V)
      RULES ITERATE (10) (
        V[ITERATION_NUMBER] = CV(R) + 1
      )
ORDER BY 1

See also this SQLFiddle

Conclusion

There aren’t actually many nice solutions to do such a simple thing in SQL. Clearly, PostgreSQL’s GENERATE_SERIES() table function is the most beautiful solution. Oracle’s CONNECT BY clause comes close. For all other databases, some trickery has to be applied in one way or another.

Unfortunately.

Faster SQL Pagination with Keysets, Continued

A while ago, I blogged about how to perform keyset pagination (some also call this the “seek method”). Keyset pagination is a very powerful technique for performing constant-time pagination even on very large result sets, where “classic” OFFSET pagination inevitably gets slow on large page numbers.

Keyset pagination is most useful for lazily loading the “next page”, much more than for jumping directly to page 235, for instance. A good example of where this is useful are social media streams, such as Twitter’s or Facebook’s. When you reach the bottom of a page, you just want the “next couple of tweets”, not the tweets at offset 3564.

If the data being paginated stays “reasonably unchanged” while paginating, you can even perform classic OFFSET pagination by pre-calculating (and caching) all page boundaries in one go. Imagine you have this data (I’m using PostgreSQL for the example):

create table t(
  id int,
  value int
);

insert into t (id, value)
select id, random() * 100
from generate_series(1, 1000) x(id);

Our data has 1000 records with random values between 0 and 99. Now, let’s browse this data in ascending order, ordered by VALUE and then by ID. If we wanted to get to page 6 with a page size of 5 using OFFSET pagination, we’d write:

select id, value
from t
order by value, id
limit 5
offset 25

This would then yield something like

|  ID | VALUE |
|-----|-------|
| 640 |     2 |
| 776 |     2 |
| 815 |     2 |
| 947 |     2 |
|  37 |     3 |

See also this SQLFiddle

OFFSET pagination emulation with cached page boundaries

The above can be emulated using keyset pagination if we know the page boundaries of every page. In other words, in order to jump to page 6 without an actual OFFSET, we’d have to know the value of the record immediately preceding {"ID":640,"VALUE":2}. Or better, let’s just figure out all the page boundaries with the following query:

select 
  t.id, 
  t.value,
  case row_number() 
       over(order by t.value, t.id) % 5 
    when 0 then 1 
    else 0 
  end page_boundary
from t
order by t.value, t.id

The above query yields

|   ID | VALUE | PAGE_BOUNDARY |
|------|-------|---------------|
|  ... |   ... |           ... |
|  474 |     2 |             0 |
|  533 |     2 |             1 | <-- Before page 6
|  640 |     2 |             0 |
|  776 |     2 |             0 |
|  815 |     2 |             0 |
|  947 |     2 |             0 |
|   37 |     3 |             1 | <-- Last on page 6
|  287 |     3 |             0 |
|  450 |     3 |             0 |
|  ... |   ... |           ... |

See also this SQLFiddle

As you can see, the record just before {"ID":640,"VALUE":2} is {"ID":533,"VALUE":2}, which is the page boundary that we need to jump to if we want to go to page 6. Page 7 then starts with the record just after {"ID":37,"VALUE":3}.

The above query selects too much data, of course. We’re only interested in those records where PAGE_BOUNDARY = 1. Besides, why not calculate the page numbers in the database as well, with yet another call to ROW_NUMBER()? Let’s write:

select 
  x.value, 
  x.id,
  row_number() 
    over(order by x.value, x.id) + 1 page_number
from (
  select 
    t.id, 
    t.value,
    case row_number() 
         over(order by t.value, t.id) % 5 
      when 0 then 1 
      else 0 
    end page_boundary
  from t
  order by t.value, t.id
) x
where x.page_boundary = 1

This will then yield:

| VALUE |  ID | PAGE_NUMBER |
|-------|-----|-------------|
|     0 | 786 |           2 |
|     1 | 250 |           3 |
|     1 | 959 |           4 |
|     2 | 229 |           5 |
|     2 | 533 |           6 | <-- Before page 6
|     3 |  37 |           7 |
|     3 | 768 |           8 |

See also this SQLFiddle.

We can now jump to page 6 with this simple query:

select id, value
from t
where (value, id) > (2, 533)
order by value, id
limit 5

… which will yield the same as the previous OFFSET query:

|  ID | VALUE |
|-----|-------|
| 640 |     2 |
| 776 |     2 |
| 815 |     2 |
| 947 |     2 |
|  37 |     3 |

See also this SQLFiddle

If you’re planning on using the upcoming jOOQ 3.3, the same query can be achieved with the following SEEK syntax:

DSL.using(configuration)
   .select(T.ID, T.VALUE)
   .from(T)
   .orderBy(T.VALUE, T.ID)
   .seek(2, 533)
   .limit(5);

The advantage of this is that you don’t have to write out the SEEK predicate explicitly, which adds readability and type safety, especially if your ORDER BY clause is a little more complex.
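As soon as the sort directions are mixed, for instance, the predicate can no longer be expressed as a single row value comparison. A sketch of what you would have to write by hand for ORDER BY value DESC, id ASC (reusing the hypothetical page boundary from above):

select id, value
from t
where value < 2
   or (value = 2 and id > 533)
order by value desc, id
limit 5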

If window functions aren’t available

The above queries make use of window functions, which unfortunately aren’t available in all databases. If you’re using MySQL, for instance, you will have to use a different mechanism to find the PAGE_BOUNDARY. One such mechanism is a scalar subquery:

select 
  t.id, 
  t.value,
  case (
      select count(*)
      from t t2
      where (t2.value, t2.id) <= (t.value, t.id)
    ) % 5 
    when 0 then 1 
    else 0 
  end page_boundary
from t
order by t.value, t.id

See also this SQLFiddle

Such a scalar subquery might be quite costly if your database’s query optimiser handles it poorly. Your best bet is to measure and decide whether caching page boundaries in order to apply keyset pagination is really faster than classic OFFSET paging.
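In PostgreSQL, for instance, you could compare the plans and timings of both approaches directly (a minimal sketch, reusing the table and page boundary from above):

-- Classic OFFSET pagination
explain analyze
select id, value
from t
order by value, id
limit 5 offset 25;

-- Keyset pagination, able to leverage an index on (value, id)
explain analyze
select id, value
from t
where (value, id) > (2, 533)
order by value, id
limit 5;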

Conclusion

As explained in our previous blog post about keyset pagination, this technique can bring great performance improvements as pagination can be achieved in constant time, leveraging existing indexes. It is most useful when the underlying data is very stable (no records added / removed while paginating), or when pagination “stability” is desired even if records are added / removed.

Keyset pagination, or the “seek method”, should be part of every SQL developer’s tool set.

Where’s the Self-Confidence when Advertising Java 8, Oracle?

I have often wondered why the team around Brian Goetz has been aiming for a “decent compromise” so insistently from the beginning, both from a marketing AND a technical point of view, instead of advertising Java 8 more boldly. At Devoxx Belgium 2013, Brian Goetz seems to have seriously undersold his accomplishments, according to this interesting article. Having extensively followed the lambda-dev mailing list, I can only stress how little the creators of Java 8 loved their new defender methods feature, for instance.

Java 8 is what we have all been waiting for, for so long! After all, the introduction of lambda expressions and defender methods (equally impactful, if not as often advertised!) is one of the most significant improvements to the Java language since its very beginning.
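To illustrate both features in one deliberately trivial, self-invented example: a defender (default) method adds an implementation to an interface without breaking existing implementors, and the remaining single abstract method makes the interface implementable by a lambda expression:

public class DefenderMethods {

    interface Greeter {
        String name();

        // A "defender" (default) method: the implementation
        // lives in the interface itself
        default String greet() {
            return "Hello, " + name();
        }
    }

    public static void main(String[] args) {
        // A lambda expression implements the single abstract method
        Greeter greeter = () -> "Java 8";
        System.out.println(greeter.greet()); // Hello, Java 8
    }
}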

Given the tremendous success of LINQ in .NET, I have recently contemplated whether Java 8, lambda expressions and the Streams API might actually be an equally interesting approach to adding features to an ecosystem, compared to the “scariness” of comprehensions and monads as understood by LINQ: https://blog.jooq.org/2013/11/02/does-java-8-still-need-linq-or-is-it-better-than-linq/

While my article above certainly wasn’t well received by the .NET community (and even Erik Meijer himself smirked at it), it did get quite a bit of love from the Java community. In other words, the Java community is ready for Java 8 goodness. Let’s hope Oracle will start advertising it as the cool new thing that it is.

The Spam Manifesto: Reactive Spamming, Big Spam, Spam Reduce

Dear Spammers,

When blogging and interacting with our users on forums, dealing with your spam is part of our everyday work. Our WordPress blog luckily uses Akismet, a comment spam prevention tool, which removes lots and lots of spam, such that our moderation efforts stay very low. Google Groups, on the other hand, uses Google’s own awesome spam filter, which already works impeccably in GMail. But unfortunately, we still have to perform quite a bit of moderation, as there are an incredible number of Chinese spammers.

Who are these spammers? Who spends so much time developing robots for stuff that doesn’t really work, given that most mail / blog vendors have awesome counter-measures? You’re just wasting your time without any actual conversion!

To make things worse, blog software renders many links as “nofollow”, so even if a spam comment slips through, you won’t get the conversion. You spammers are obviously in the tech stone age, perhaps writing your spam software in COBOL or Delphi?

Let us help you get up to date with recent developments. I’ve had a brief discussion with Petri Kainulainen on the subject, and we think that there should be a new

Spam manifesto

We would like to introduce Reactive Spamming. We believe that Hipster Spammers could take the lead and greatly benefit from fancy tech such as Clojure, Akka and Hadoop. Imagine implementing SpamReduce or MapSpam algorithms on BigSpam. Gartner will certainly feature you on their tech radar. We’ll create SpamConf, a great conference where brilliant minds of our time can discuss the latest in spamming, and the Groovy folks will present their new SpamDSL, to help you express your algorithms in FluentSpam.

And now, for some background info about Spam.