Deep Stack Traces Can be a Sign for Good Code Quality

The term “leaky abstractions” has been around for a while. Coining it is most often attributed to Joel Spolsky, who wrote this often-cited article about it. I’ve now stumbled upon another interpretation of a leaky abstraction, measured by the depth of a stack trace:

So, long stack traces are bad according to Geek & Poke. I’ve seen this argument before on Igor Polevoy’s blog (he’s the creator of ActiveJDBC, a Java implementation of the popular Ruby ActiveRecord query interface). Much like Joel Spolsky’s argumentation was often used to criticise ORMs, Igor’s argument was also used to compare ActiveJDBC with Hibernate. I’m citing:

One might say: so what, why do I care about the size of dependencies, depth of stack trace, etc. I think a good developer should care about these things. The thicker the framework, the more complex it is, the more memory it allocates, the more things can go wrong.

I completely agree that a framework with a certain amount of complexity tends to have longer stack traces. So if we run these axioms through our mental Prolog processors:

  • if Hibernate is a leaky abstraction, and
  • if Hibernate is complex, and
  • if complexity leads to long stack traces, then
  • leaky abstractions and long stack traces correlate

I wouldn’t go as far as claiming there’s a formal, causal connection. But a correlation seems logical.

But these things aren’t necessarily bad. In fact, long stack traces can be a good sign in terms of software quality. It can mean that the internals of a piece of software show a high amount of cohesion, a high degree of DRY-ness, which again means that there is little risk for subtle bugs deep down in your framework. Remember that a high cohesion and high DRY-ness lead to a large portion of the code being extremely relevant within the whole framework, which again means that any low-level bug will pretty much blow up the whole framework as it will lead to everything going wrong. If you do test-driven development, you’ll be rewarded by noticing immediately that your silly mistake fails 90% of your test cases.

A real-world example

Let’s use jOOQ as an example to illustrate this, as we’re already comparing Hibernate and ActiveJDBC. Some of the longest stack traces in a database access abstraction can be achieved by putting a breakpoint at the interface of that abstraction with JDBC. For instance, when fetching data from a JDBC ResultSet.

Utils.getFromResultSet(ExecuteContext, Class<T>, int) line: 1945
Utils.getFromResultSet(ExecuteContext, Field<U>, int) line: 1938
CursorImpl$CursorIterator$CursorRecordInitialiser.setValue(AbstractRecord, Field<T>, int) line: 1464
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(AbstractRecord) line: 1447
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(Record) line: 1
RecordDelegate<R>.operate(RecordOperation<R,E>) line: 119
CursorImpl$CursorIterator.fetchOne() line: 1413
CursorImpl$CursorIterator.next() line: 1389
CursorImpl$CursorIterator.next() line: 1
CursorImpl<R>.fetch(int) line: 202
CursorImpl<R>.fetch() line: 176
SelectQueryImpl<R>(AbstractResultQuery<R>).execute(ExecuteContext, ExecuteListener) line: 274
SelectQueryImpl<R>(AbstractQuery).execute() line: 322
T_2698Record(UpdatableRecordImpl<R>).refresh(Field<?>...) line: 438
T_2698Record(UpdatableRecordImpl<R>).refresh() line: 428
H2Test.testH2T2698InsertRecordWithDefault() line: 931

Compared to ActiveJDBC’s stack traces, that’s quite a bit more, but still less compared to Hibernate (which uses lots of reflection and instrumentation). And it involves rather cryptic inner classes with quite a bit of method overloading. How to interpret that? Let’s go through this, bottom-up (or top-down in the stack trace)

CursorRecordInitialiser

The CursorRecordInitialiser is an inner class that encapsules the initialisation of a Record by a Cursor, and it ensures that relevant parts of the ExecuteListener SPI are covered at a single place. It is the gateway to JDBC’s various ResultSet methods. It is a generic internal RecordOperation implementation that is called by…

RecordDelegate

… a RecordDelegate. While the class name is pretty meaningless, its purpose is to shield and wrap all direct record operations in a way that a central implementation of the RecordListener SPI can be achieved. This SPI can be implemented by client code to listen to active record lifecycle events. The price for keeping the implementation of this SPI DRY is a couple of elements on the stack trace, as such callbacks are the standard way to implement closures in the Java language. But keeping this logic DRY guarantees that no matter how a Record is initialised, the SPI will always be invoked. There are (almost) no forgotten corner-cases.

But we were initialising a Record in…

CursorImpl

… a CursorImpl, an implementation of a Cursor. This might appear odd, as jOOQ Cursors are used for “lazy fetching”, i.e. for fetching Records one-by-one from JDBC.

On the other hand, the SELECT query from this stack trace simply refreshes a single UpdatableRecord, jOOQ’s equivalent of an active record. Yet, still, all the lazy fetching logic is executed just as if we were fetching a large, complex data set. This is again to keep things DRY when fetching data. Of course, around 6 levels of stack trace could have been saved by simply reading the single record as we know there can be only one. But again, any subtle bug in the cursor will likely show up in some test case, even in a remote one like the test case for refreshing records.

Some may claim that all of this is wasting memory and CPU cycles. The opposite is more likely to be true. Modern JVM implementations are so good with managing and garbage-collecting short-lived objects and method calls, the slight additional complexity imposes almost no additional work to your runtime environment.

TL;DR: Long stack traces may indicate high cohesion and DRY-ness

The claim that a long stack trace is a bad thing is not necessarily correct. A long stack trace is what happens, when complex frameworks are well implemented. Complexity will inevitably lead to “leaky abstractions”. But only well-designed complexity will lead to long stack traces.

Conversely, short stack traces can mean two things:

  • Lack of complexity: The framework is simple, with few features. This matches Igor’s claim for ActiveJDBC, as he is advertising ActiveJDBC as a “simple framework”.
  • Lack of cohesion and DRY-ness: The framework is poorly written, and probably has poor test coverage and lots of bugs.

Tree data structures

As a final note, it’s worth mentioning that another case where long stack traces are inevitable is when tree structures / composite pattern structures are traversed using visitors. Anyone who has ever debugged XPath or XSLT will know how deep these traces are.

Database Abstraction and SQL Injection

I have subscribed to various user groups of jOOQ’s competing database abstraction tools. One of which is ActiveJDBC, a Java implementation of Active Record design pattern. Its maintainer Igor Polevoy recently claimed that:

SQL injection is a web application problem, and not directly related to an ORM. ActiveJDBC will process any SQL that is passed to it.

(See the discussion here: https://groups.google.com/d/topic/activejdbc-group/5D2jhWuW4Sg/discussion)

Is that really true? Should the database abstraction layer delegate SQL injection prevention to the client application?

SQL Injection Background

SQL injection is a problem that most of us developers have had to deal with one time or another in their professional lives. Wikipedia explains the problem nicely. Given the following piece of Java code (or any other language):

statement = "SELECT * FROM users WHERE name = '" + userName + "';"

Imagine that “userName” is a variable taken from an HTTP request. Blindly pasting an HTTP request parameter gives way to simple attacks as these:

-- attacker sends this code in the userName field:
userName = "a';DROP TABLE users; SELECT * FROM userinfo WHERE 't' = 't"

-- resulting in the following statement:
statement = "SELECT * FROM users WHERE name = 'a';"
          + "DROP TABLE users;" +
          + "SELECT * FROM userinfo WHERE 't' = 't';"

This doesn’t happen to you?

Maybe not. But the problem is seen quite often on Stack Overflow. More than 2000 results when searching for “SQL injection”: http://stackoverflow.com/search?q=sql+injection. So even if you know how to prevent it, someone on your team may not. OK, but …

it’s not that bad if one out of 500 statements is badly written by some programmer that was oblivious of this threat?

Think again. Have you ever heard of a tool called sqlmap? This tool will find any problematic page within your application within a couple of seconds/minutes/hours, depending on the severity of your injection problem. Not only that, once it has found problematic pages, it will be able to extract ALL kinds of data from your database. I mean ALL kinds of data. A selection of sqlmap features:

  • Support to enumerate users, password hashes, privileges, roles, databases, tables and columns.
  • Support to search for specific database names, specific tables across all databases or specific columns across all databases’ tables. This is useful, for instance, to identify tables containing custom application credentials where relevant columns’ names contain string like name and pass.
  • Support to download and upload any file from the database server underlying file system when the database software is MySQL, PostgreSQL or Microsoft SQL Server.
  • Support to execute arbitrary commands and retrieve their standard output on the database server underlying operating system when the database software is MySQL, PostgreSQL or Microsoft SQL Server.

Yes! If you suffer from SQL-injection-unsafe code, an attacker can seize your server under some circumstances!! In our company, we’ve tried sqlmap in a sandbox environment against some open source blog software with known vulnerabilities. We’ve managed to seize the server in no time, without writing a single line of SQL

Database Abstraction and SQL Injection

OK, now that I have your attention, let’s think again about what Igor Polevoy said:

SQL injection is a web application problem, and not directly related to an ORM. ActiveJDBC will process any SQL that is passed to it.

Yes, he may be right. Given that ActiveJDBC is a slim wrapper for JDBC, allowing to do nice CRUD simplifications, such as these (taken from their website):

List<Employee> people =
Employee.where("department = ? and hire_date > ? ", "IT", hireDate)
        .offset(21)
        .limit(10)
        .orderBy("hire_date asc");

Did you spot the risk for SQL injection? Right. Even if it uses bind values for underlying PreparedStatements, this tool is as unsafe as JDBC. You can avoid SQL injection, if you’re careful. Or you can start concatenating strings all over. But you have to be aware of that! How does jOOQ handle situations like these? The jOOQ manual explains how bind values are handled explicitly or implicitly. Here are some examples:

// Implicitly creating a bind value for "Poe"
create.select()
      .from(T_AUTHOR)
      .where(LAST_NAME.equal("Poe"));

// Explicitly creating a (named) bind value for "Poe"
create.select()
      .from(T_AUTHOR)
      .where(LAST_NAME.equal(param("lastName", "Poe")));

// Explicitly inlining "Poe" in the generated SQL string
create.select()
      .from(T_AUTHOR)
      .where(LAST_NAME.equal(inline("Poe")));

The above examples will yield

SELECT * FROM T_AUTHOR WHERE LAST_NAME = ?
SELECT * FROM T_AUTHOR WHERE LAST_NAME = ?
SELECT * FROM T_AUTHOR WHERE LAST_NAME = 'Poe'

In the case where “Poe” is inlined, escaping is handled by jOOQ, to prevent syntax errors and SQL injection. But jOOQ also supports injecting SQL strings directly in generated SQL. For instance:

// Inject plain SQL into jOOQ
create.select()
      .from(T_AUTHOR)
      .where("LAST_NAME = 'Poe'");

In this case, SQL injection can occur just as with JDBC

Conclusion

In essence, Igor is right. It is the (client) application developer’s responsibility to be aware of SQL injection issues created by their code. But if a database abstraction framework built on top of JDBC can avoid SQL injection as much as possible in its API, all the better. From a SQL injection perspective, database abstraction frameworks can be divided into three categories:

  • Simple utilities. These include Spring’s JdbcTemplate, or Apache’s DbUtils. They really just enhance the JDBC API with convenience (less exception handling, less verbosity, simpler variable binding, simpler data fetching). Of course, these tools won’t prevent SQL injection
  • Full SQL abstraction. These include jOOQ, JaQu, JPA’s CriteriaQuery and others. Their normal operation mode will always render bind values in generated SQL. This prevents SQL injection in most cases.
  • The others. Many other frameworks (including ActiveJDBC and Hibernate) are mainly based on (SQL or HQL) string operations. While they abstract many SQL-related things, they do not prevent SQL injection at all.

So, when you choose any SQL abstraction tool in your Java application, beware of the severity of SQL injection. And beware of the fact, whether your tool helps you prevent it or not!

Igor’s response

Note that Igor has posted this interesting response to this post here:

http://igorpolevoy.blogspot.ch/2012/07/defend-against-sql-injection-using.html