Stop Unit Testing Database Code

Writing tests that use an actual database is hard.

Period.

Now that this has been established, let’s have a look at a blog post by Marco Behler, in which he elaborates on various options for testing database code with respect to transactionality. Testing database transactions is even harder than just testing database code. Marco lists a couple of options for tweaking these tests to make them “easier” to write.

One of the options is:

3. Set the flush-mode to FlushMode.ALWAYS for your tests

(note, this is about testing Hibernate code).

Marco puts this option in parentheses because he’s not 100% convinced that it’s a good idea to test behaviour that differs from production. Here’s our take on that:

Stop Unit Testing Database Code

By the time you start thinking about tweaking your test setup to achieve “simpler” transaction behaviour (or worse, use mocks for your database), you’re pretty much doomed. You start creating an alternative system that heavily deviates from your production system. This essentially means:

  • that results from tests against your test system have (almost) no meaning
  • that your tests will not cover some of the most complex aspects of your production system
  • that you will start spending way too much time on tweaking tests rather than implementing useful business logic

Instead, focus on writing integration tests that test your business logic on a very high level, i.e. on a “service layer” level, if your architecture has such a thing. If you’re using EJB, this would probably be on a session bean level. If you’re using REST or SOAP, this would be on a REST or SOAP service level. If you’re using HTTP (the retro name for REST), it would be on an HTTP service level.

Here are the advantages of this approach:

  • You might not get 100% coverage, but you will get the 80% coverage for those parts that really matter (with 20% of the effort). Your UI (or external system) doesn’t call the database in the quirkiest of forms either. It calls your services. Why would you test anything other than your services?
  • Your tests are very easy to maintain. You have a public API to your UI (or external system). It’s formal and easy to understand. It’s a black box with well-defined input and output parameters, which makes it easy to read / write tests

Databases are stateful. Obviously

What you have to do is let go of this idea that your database will ever participate in a “unit” (as in “unit” test). Units are pretty stateless, and thus it is very easy to write mutually independent unit tests for functional algorithms, for instance.

Databases couldn’t be any less stateless. The whole idea of a database is to manage state. And that’s very complicated and completely opposite to what any unit test can ever model. Many of the most meaningful state transitions span several database interactions, or even transactions, or maybe even services. For instance, it may be important that the CREATE_USER service invocation be immediately followed by an invocation of CHANGE_PASSWORD. You can only integration-test that on a service layer. Don’t believe it? What if CREATE_USER depends on an external LDAP system? Or complex security logic in stored procedures? Your integration test’s got that covered.

Takeaway

Writing tests that use an actual database is hard.

Yes. That won’t change. But your perception may. Don’t try to tweak things around this fact. Create a well-known test database. Reset it between tests. And write integration tests on a very high level. The 20/80 cost/benefit ratio will leave you no better choice.

Stay tuned for another post on this blog about how we integration-test the jOOQ API against 16 actual RDBMS.

MyBatis’ Alternative Transaction Management

On the jOOQ user group, we’re often being asked how to perform transaction management with jOOQ. And we have an easy answer ready: You don’t do that with jOOQ. You choose your favourite transaction management API, be it:

And the above list is far from being exhaustive. Transaction management is something very delicate, and it certainly should not be imposed by a library whose main purpose is not transaction management, because any such library / framework will provide you with at most a very leaky abstraction of its transaction model. In other words, if you just slightly want to deviate from “the standard” model (e.g. as imposed by Hibernate), you will suffer greatly, as soon as you want to run 2-3 queries outside of Hibernate – e.g. batch or reporting statements through jOOQ.
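
To illustrate what "transaction management outside of the query library" can mean at its very simplest, here is a minimal sketch based on plain JDBC. The names TxSketch, SqlWork, inTransaction and recordingConnection are hypothetical and exist only for this example. The only real API used is java.sql.Connection (getAutoCommit, setAutoCommit, commit, rollback) and java.lang.reflect.Proxy. The recording stand-in connection is there purely so that the sketch runs without a JDBC driver; in a real application, the Connection would come from your DataSource.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class TxSketch {

    /** Hypothetical callback type: a unit of work that runs on one connection. */
    interface SqlWork<T> {
        T run(Connection connection) throws SQLException;
    }

    /** Runs the work in one JDBC transaction: commit on success, rollback on failure. */
    static <T> T inTransaction(Connection connection, SqlWork<T> work) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            T result = work.run(connection);
            connection.commit();
            return result;
        } catch (SQLException | RuntimeException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }

    /** Stand-in connection that only records commit/rollback calls (assumption: no real database). */
    static Connection recordingConnection(List<String> log) {
        return (Connection) Proxy.newProxyInstance(
            TxSketch.class.getClassLoader(),
            new Class<?>[] { Connection.class },
            (proxy, method, methodArgs) -> {
                String name = method.getName();
                if (name.equals("commit") || name.equals("rollback")) {
                    log.add(name);
                }
                // Only getAutoCommit() needs a non-null return value in this sketch.
                return method.getReturnType() == boolean.class ? Boolean.TRUE : null;
            });
    }

    /** Runs one successful and one failing unit of work and returns the recorded calls. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Connection connection = recordingConnection(log);
        try {
            inTransaction(connection, c -> "ok");
            inTransaction(connection, c -> { throw new SQLException("simulated failure"); });
        } catch (SQLException expected) {
            // The second unit of work fails on purpose.
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [commit, rollback]
    }
}
```

Any query API that accepts the same Connection, be it jOOQ, Hibernate's native SQL or raw JDBC, can take part in the unit of work passed to inTransaction. That is the point of keeping transaction demarcation out of the query library.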

MyBatis’ Alternative Transaction Management

MyBatis is a SQL templating engine that provides a couple of features on top of alternative templating engines, such as Velocity, or StringTemplate. One of these features built on top of templating is precisely transaction management, as can be seen in the docs.

From what we can read in the docs, it looks as though MyBatis’ transaction managers can be overridden by Spring, for instance. However, it is not easy to see how this is done. In fact, MyBatis also solves connection pooling (for which there are very viable alternatives, such as c3p0 and DBCP) and mapping (which could be solved more easily with custom transformers, such as those offered by Spring’s JdbcTemplate or jOOQ’s RecordMapper).

Like many frameworks, MyBatis tries to solve problems outside its core scope, which is SQL templating. While this may be a good thing as you only rely on a single dependency, it is also quite a lock-in, in case you have a more complex model. In the case of transaction management, we believe that this was not a good idea on MyBatis’ part.

Thoughts from MyBatis users?

jOOQ Newsletter December 13, 2013


A jOOQ Runtime Only Distribution

Several of our customers have made us aware of the fact that they don’t really need the jOOQ code generator, only the SQL builder / query DSL API and possibly the SQL execution functionality. In the next weeks, we’ll be working towards a new “jOOQ Runtime Only Distribution” that will ship at a lower price. Concretely, we’ll be giving those customers a 25% discount on the regular distributions for any of jOOQ Express, jOOQ Professional, and jOOQ Enterprise.

Obviously, we recommend using jOOQ’s code generator to profit from the full feature scope that we offer.

License Amendments

We have updated our commercial license to improve your legal relationship with Data Geekery. Essentially, these things have been amended:

  • Definitions: We have added the missing definition of what is a “Minor Defect”
  • 6.2 Distribution Right: The distribution right section now grants you a perpetual license to distribute the software, instead of a time-limited one. This will allow you to continue to distribute, embed and use jOOQ in your End-user Application even if you terminate your Developer Workstation License Agreement with us. You will, however, still need to license jOOQ in order to maintain your End-user Application.
  • 7.1.1 Remedial Services: The remedial services section now formally defines how Minor Defects are remedied

As this updated license grants new rights to our customers, it shall be in effect immediately also for existing jOOQ 3.2 customers.

Upcoming Events

We have recently presented jOOQ at the Java2Days conference in Sofia, Bulgaria as well as at the Java User Group Berlin-Brandenburg. Both talks enjoyed high attendance, with lots of SQL developers challenging us with interesting questions. Of course, jOOQ responds to most questions already, as Manuel Bernhardt, another speaker, has observed:

If you want us to talk about jOOQ or SQL near you, do not hesitate to contact us.

Here is an overview of other, upcoming events:

Stay informed about 2014 events on www.jooq.org/news. If you’re looking for the slides, they’re available for free under a CC BY-SA 3.0 license on SlideShare.

Using jOOQ with Hibernate

While jOOQ can be seen as a popular alternative to Hibernate, there is no stopping you from using both frameworks in the same application. While Hibernate heavily improves everyday CRUD operations, jOOQ heavily improves writing SQL in Java. The two goals can be orthogonal and thus it may make perfect sense to combine the two.

Vlad Mihalcea, an enthusiastic jOOQ user, is writing up a series of blog posts on his blog about how to use jOOQ with Hibernate:

Gavin King himself chimed in on Google+ and confirmed once again that it was never his intention for Hibernate to be used for everything.

If you want to share your own experience of your jOOQ / Spring, jOOQ / Hibernate, jOOQ / Anything integration, let us know!

SQL Zone – “Lightning Fast SQL with Proper Indexing”

Understanding SQL and the history of SQL is of the essence when you want to get the best out of your relational database. This has recently been nicely explained by Markus Winand at the Oredev conference 2013. Luckily, his presentation has been recorded for free review.

More about the history of SQL can be seen in this interesting article about Codd’s Relational Vision – Has NoSQL Come Full Circle? It proves what we have been evangelising for a while, ourselves. NoSQL is not a new technology. It is more of a return to pre-Codd times when people did not have a powerful relational model to abstract their storage implementation away from their application. This article claims that Codd’s greatest achievement was the fact that he surpassed the deficiencies of early databases:

  • Access dependencies
  • Order dependencies
  • Index dependencies

Read the full article for a very interesting historic insight into the relational model, and why NoSQL might not be a good answer for most general problems.

How to Integrate jOOQ with Hibernate

While jOOQ can be an alternative to Hibernate, it doesn’t have to entirely replace Hibernate. Many users have reported positive experiences when combining jOOQ with Hibernate, letting Hibernate do the tedious CRUD work, and jOOQ the complex querying and reporting through its sophisticated, yet intuitive query DSL.

Vlad Mihalcea, who has been blogging interesting stuff about SQL and transaction models recently, has now published a very nice tutorial on how to use JPA-annotated entities as a basis for source code generation in jOOQ, through the following steps:

  • Write the annotated entities
  • Generate HSQLDB DDL from those entities
  • Execute the DDL in an HSQLDB instance
  • Run the jOOQ code generator to reverse-engineer the schema

With the above, he’s ready to query entities through JPA/Hibernate or jOOQ in no time.

Read the full blog post here:
http://vladmihalcea.wordpress.com/2013/12/06/jooq-facts-from-jpa-annotations-to-jooq-table-mappings/

Deep Stack Traces Can be a Sign for Good Code Quality

The term “leaky abstractions” has been around for a while. Coining it is most often attributed to Joel Spolsky, who wrote this often-cited article about it. I’ve now stumbled upon another interpretation of a leaky abstraction, measured by the depth of a stack trace:

Leaky Abstractions as understood by Geek and Poke (Licensed CC-BY)

So, long stack traces are bad according to Geek & Poke. I’ve seen this argument before on Igor Polevoy’s blog (he’s the creator of ActiveJDBC, a Java implementation of the popular Ruby ActiveRecord query interface). Much like Joel Spolsky’s argumentation was often used to criticise ORMs, Igor’s argument was also used to compare ActiveJDBC with Hibernate. I’m citing:

One might say: so what, why do I care about the size of dependencies, depth of stack trace, etc. I think a good developer should care about these things. The thicker the framework, the more complex it is, the more memory it allocates, the more things can go wrong.

I completely agree that a framework with a certain amount of complexity tends to have longer stack traces. So if we run these axioms through our mental Prolog processors:

  • if Hibernate is a leaky abstraction, and
  • if Hibernate is complex, and
  • if complexity leads to long stack traces, then
  • leaky abstractions and long stack traces correlate

I wouldn’t go as far as claiming there’s a formal, causal connection. But a correlation seems logical.

But these things aren’t necessarily bad. In fact, long stack traces can be a good sign in terms of software quality. They can mean that the internals of a piece of software show a high amount of cohesion and a high degree of DRY-ness, which again means that there is little risk of subtle bugs deep down in your framework. Remember that high cohesion and high DRY-ness lead to a large portion of the code being extremely relevant within the whole framework, which again means that any low-level bug will pretty much blow up the whole framework as it will lead to everything going wrong. If you do test-driven development, you’ll be rewarded by noticing immediately that your silly mistake fails 90% of your test cases.

A real-world example

Let’s use jOOQ as an example to illustrate this, as we’re already comparing Hibernate and ActiveJDBC. Some of the longest stack traces in a database access abstraction can be achieved by putting a breakpoint at the interface of that abstraction with JDBC. For instance, when fetching data from a JDBC ResultSet.

Utils.getFromResultSet(ExecuteContext, Class<T>, int) line: 1945
Utils.getFromResultSet(ExecuteContext, Field<U>, int) line: 1938
CursorImpl$CursorIterator$CursorRecordInitialiser.setValue(AbstractRecord, Field<T>, int) line: 1464
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(AbstractRecord) line: 1447
CursorImpl$CursorIterator$CursorRecordInitialiser.operate(Record) line: 1
RecordDelegate<R>.operate(RecordOperation<R,E>) line: 119
CursorImpl$CursorIterator.fetchOne() line: 1413
CursorImpl$CursorIterator.next() line: 1389
CursorImpl$CursorIterator.next() line: 1
CursorImpl<R>.fetch(int) line: 202
CursorImpl<R>.fetch() line: 176
SelectQueryImpl<R>(AbstractResultQuery<R>).execute(ExecuteContext, ExecuteListener) line: 274
SelectQueryImpl<R>(AbstractQuery).execute() line: 322
T_2698Record(UpdatableRecordImpl<R>).refresh(Field<?>...) line: 438
T_2698Record(UpdatableRecordImpl<R>).refresh() line: 428
H2Test.testH2T2698InsertRecordWithDefault() line: 931

Compared to ActiveJDBC’s stack traces, that’s quite a bit more, but still less than Hibernate’s (which uses lots of reflection and instrumentation). And it involves rather cryptic inner classes with quite a bit of method overloading. How to interpret that? Let’s go through this, bottom-up (or top-down in the stack trace).

CursorRecordInitialiser

The CursorRecordInitialiser is an inner class that encapsulates the initialisation of a Record by a Cursor, and it ensures that relevant parts of the ExecuteListener SPI are covered in a single place. It is the gateway to JDBC’s various ResultSet methods. It is a generic internal RecordOperation implementation that is called by…

RecordDelegate

… a RecordDelegate. While the class name is pretty meaningless, its purpose is to shield and wrap all direct record operations in a way that a central implementation of the RecordListener SPI can be achieved. This SPI can be implemented by client code to listen to active record lifecycle events. The price for keeping the implementation of this SPI DRY is a couple of elements on the stack trace, as such callbacks are the standard way to implement closures in the Java language. But keeping this logic DRY guarantees that no matter how a Record is initialised, the SPI will always be invoked. There are (almost) no forgotten corner-cases.
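
The idea of routing every record operation through one central delegate can be sketched in a few lines of plain Java. This is not jOOQ's actual implementation: the names DelegateSketch, Listener, Rec and Delegate are made up for this example, and the real RecordListener SPI has a different shape. The sketch only shows why the callback costs a few stack frames and what you get in return: the listener is invoked in exactly one place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class DelegateSketch {

    /** Hypothetical listener SPI that client code could implement. */
    interface Listener {
        void start(String operation);
        void end(String operation);
    }

    /** Hypothetical record type: just a mutable value holder. */
    static final class Rec {
        String value;
    }

    /** The single place through which every record operation runs. */
    static final class Delegate {
        private final Listener listener;

        Delegate(Listener listener) {
            this.listener = listener;
        }

        Rec operate(String operation, Rec target, Consumer<Rec> body) {
            listener.start(operation);
            try {
                body.accept(target); // the callback adds frames to the stack trace
            } finally {
                listener.end(operation); // invoked no matter how the body ends
            }
            return target;
        }
    }

    /** Runs two different operations and returns the events seen, in order. */
    static List<String> demo() {
        List<String> events = new ArrayList<>();
        Listener listener = new Listener() {
            public void start(String operation) { events.add("start " + operation); }
            public void end(String operation) { events.add("end " + operation); }
        };
        Delegate delegate = new Delegate(listener);
        delegate.operate("load", new Rec(), r -> { r.value = "from cursor"; events.add("body load"); });
        delegate.operate("refresh", new Rec(), r -> { r.value = "refreshed"; events.add("body refresh"); });
        return events;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Whichever operation is added later, it cannot forget to notify the listener, because it has no other way to touch a record than through operate.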

But we were initialising a Record in…

CursorImpl

… a CursorImpl, an implementation of a Cursor. This might appear odd, as jOOQ Cursors are used for “lazy fetching”, i.e. for fetching Records one-by-one from JDBC.

On the other hand, the SELECT query from this stack trace simply refreshes a single UpdatableRecord, jOOQ’s equivalent of an active record. Yet, still, all the lazy fetching logic is executed just as if we were fetching a large, complex data set. This is again to keep things DRY when fetching data. Of course, around 6 levels of stack trace could have been saved by simply reading the single record as we know there can be only one. But again, any subtle bug in the cursor will likely show up in some test case, even in a remote one like the test case for refreshing records.
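
A small sketch may make this trade-off more concrete. Again, this is not jOOQ's code: CursorSketch, Cursor and refresh are hypothetical names, and a plain java.util.Iterator stands in for the JDBC ResultSet. The single-row refresh deliberately goes through the same lazy fetching path as a large result would.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class CursorSketch {

    /** Hypothetical lazy cursor: pulls rows one by one from an underlying source. */
    static final class Cursor<T> implements Iterator<T> {
        private final Iterator<T> source;

        Cursor(Iterator<T> source) {
            this.source = source;
        }

        public boolean hasNext() {
            return source.hasNext();
        }

        public T next() {
            return source.next();
        }

        /** Fetches at most maxRows rows; the other fetch methods build on this one. */
        List<T> fetch(int maxRows) {
            List<T> result = new ArrayList<>();
            while (result.size() < maxRows && hasNext()) {
                result.add(next());
            }
            return result;
        }

        /** Fetches all remaining rows by delegating to fetch(int). */
        List<T> fetch() {
            return fetch(Integer.MAX_VALUE);
        }
    }

    /** Single-row refresh that reuses the general cursor path instead of a shortcut. */
    static <T> T refresh(Iterator<T> source) {
        List<T> rows = new Cursor<>(source).fetch();
        if (rows.size() != 1) {
            throw new IllegalStateException("expected exactly one row, got " + rows.size());
        }
        return rows.get(0);
    }

    public static void main(String[] args) {
        System.out.println(refresh(List.of("the only row").iterator()));
        System.out.println(new Cursor<>(List.of(1, 2, 3).iterator()).fetch(2));
    }
}
```

The call chain refresh, fetch(), fetch(int), next() is longer than a direct read would be, but a bug in fetch(int) now breaks the refresh use case too, so it is unlikely to go unnoticed.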

Some may claim that all of this is wasting memory and CPU cycles. The opposite is more likely to be true. Modern JVM implementations are so good at managing and garbage-collecting short-lived objects and method calls that the slight additional complexity imposes almost no additional work on your runtime environment.

TL;DR: Long stack traces may indicate high cohesion and DRY-ness

The claim that a long stack trace is a bad thing is not necessarily correct. A long stack trace is what happens when complex frameworks are well implemented. Complexity will inevitably lead to “leaky abstractions”. But only well-designed complexity will lead to long stack traces.

Conversely, short stack traces can mean two things:

  • Lack of complexity: The framework is simple, with few features. This matches Igor’s claim for ActiveJDBC, as he is advertising ActiveJDBC as a “simple framework”.
  • Lack of cohesion and DRY-ness: The framework is poorly written, and probably has poor test coverage and lots of bugs.

Tree data structures

As a final note, it’s worth mentioning that another case where long stack traces are inevitable is when tree structures / composite pattern structures are traversed using visitors. Anyone who has ever debugged XPath or XSLT will know how deep these traces are.
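
The effect is easy to reproduce with a toy composite and a visitor. The following sketch uses made-up names throughout (VisitorDepthSketch, Node, Leaf, Branch, StackDepthVisitor) and measures the stack with the standard Thread.getStackTrace() method. In this particular setup, every additional tree level costs two frames: one for accept and one for visitBranch.

```java
import java.util.List;

final class VisitorDepthSketch {

    interface Node {
        void accept(Visitor visitor);
    }

    interface Visitor {
        void visitLeaf(Leaf leaf);
        void visitBranch(Branch branch);
    }

    static final class Leaf implements Node {
        public void accept(Visitor visitor) {
            visitor.visitLeaf(this);
        }
    }

    static final class Branch implements Node {
        final List<Node> children;

        Branch(List<Node> children) {
            this.children = children;
        }

        public void accept(Visitor visitor) {
            visitor.visitBranch(this);
        }
    }

    /** Records the deepest call stack observed while visiting leaves. */
    static final class StackDepthVisitor implements Visitor {
        int maxFrames;

        public void visitLeaf(Leaf leaf) {
            maxFrames = Math.max(maxFrames, Thread.currentThread().getStackTrace().length);
        }

        public void visitBranch(Branch branch) {
            for (Node child : branch.children) {
                child.accept(this); // recursion: accept + visit for each tree level
            }
        }
    }

    /** Builds a chain of nested branches that ends in a single leaf. */
    static Node nested(int levels) {
        Node node = new Leaf();
        for (int i = 0; i < levels; i++) {
            node = new Branch(List.of(node));
        }
        return node;
    }

    /** Visits the tree and returns the largest number of stack frames seen at a leaf. */
    static int maxFrames(Node root) {
        StackDepthVisitor visitor = new StackDepthVisitor();
        root.accept(visitor);
        return visitor.maxFrames;
    }

    public static void main(String[] args) {
        System.out.println("1 level:   " + maxFrames(nested(1)) + " frames");
        System.out.println("10 levels: " + maxFrames(nested(10)) + " frames");
    }
}
```

The absolute numbers depend on where the traversal is started, which is why the sketch prints them without claiming a specific value. Only the growth per level is a property of the visitor itself.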

Why Staying in Control of Your SQL is so Important

Lots of blog posts and research papers are written about the topics of scaling up and scaling out. This interesting blog post, for instance, sheds some light on the two strategies with respect to physical maintenance costs, such as cooling and electricity consumption. Certainly non-negligible aspects for very large systems.

But before solving problems at a very big scale, consider much simpler SQL tuning mechanisms. Very often, when you have a bottleneck in your application, it is at the database layer. This fact is used by many NoSQL evangelists to promote their products, claiming that scaling out is much easier with NoSQL databases. This might be true, but ask yourself: Do you need a system that works under heavy load? Or is your bottleneck a performance bottleneck?

In other words: Do you have a 5’000’000-concurrent-users problem? Or do you have a request-takes-more-than-3-seconds problem? Because if you suffer from the latter, you probably do not need to scale out nor up. Your “traditional” architecture is probably quite fine, but your database / SQL queries aren’t. The popular use-the-index-luke.com website features an interesting article about the two top performance problems caused by ORM tools. They are:

  • The infamous N+1 selects problem
  • The hardly-known Index-Only Scan

Both problems result from the fact that popular ORMs generate SQL code in a way that can hardly be influenced by developers. People often claim that tools like Hibernate generate better SQL than the average developer. This is true to the extent that the average developer might never care to actually learn to write better SQL. Which in turn leads to the above problems.
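
For readers who have not met the N+1 selects problem yet, here is a toy illustration of the access pattern. It does not use Hibernate or a real database: CountingDb is a hypothetical in-memory stand-in that merely counts how many "queries" it serves, and the SQL in the comments, along with the author and book tables and the sample data, is assumed for the sake of the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NPlusOneSketch {

    /** In-memory stand-in for a database that counts the "queries" it serves. */
    static final class CountingDb {
        // author name -> book titles (hypothetical sample data)
        final Map<String, List<String>> booksByAuthor = new LinkedHashMap<>();
        int queries;

        // Stands in for: SELECT name FROM author
        List<String> selectAuthors() {
            queries++;
            return new ArrayList<>(booksByAuthor.keySet());
        }

        // Stands in for: SELECT title FROM book WHERE author = ?
        List<String> selectBooksOf(String author) {
            queries++;
            return booksByAuthor.get(author);
        }

        // Stands in for: SELECT a.name, b.title FROM author a LEFT JOIN book b ON b.author = a.name
        Map<String, List<String>> selectAuthorsWithBooks() {
            queries++;
            return new LinkedHashMap<>(booksByAuthor);
        }
    }

    static CountingDb sampleDb() {
        CountingDb db = new CountingDb();
        db.booksByAuthor.put("Orwell", List.of("1984", "Animal Farm"));
        db.booksByAuthor.put("Coelho", List.of("O Alquimista"));
        db.booksByAuthor.put("Codd", List.of());
        return db;
    }

    /** Lazy-loading style: one query for the parents, then one more per parent. */
    static int queriesWithLazyLoading() {
        CountingDb db = sampleDb();
        for (String author : db.selectAuthors()) {
            db.selectBooksOf(author);
        }
        return db.queries;
    }

    /** Hand-written join: everything in a single round trip. */
    static int queriesWithJoin() {
        CountingDb db = sampleDb();
        db.selectAuthorsWithBooks();
        return db.queries;
    }

    public static void main(String[] args) {
        System.out.println("lazy loading: " + queriesWithLazyLoading() + " queries"); // 1 + 3 authors = 4
        System.out.println("single join:  " + queriesWithJoin() + " query");          // 1
    }
}
```

With three authors the difference is harmless. With 10'000 parent rows, the lazy variant issues 10'001 round trips, and that is the point at which a request starts taking more than 3 seconds.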

Hibernate is very good at generating 70% of your application’s boring CRUD SQL. At the same time, Hibernate never claimed to be a replacement for SQL, as you will have to resort to native SQL 30% of the time. Should you then use Hibernate’s native SQL API? Or an alternative like jOOQ?

The important thing is to get back in control of your SQL when performance matters. Know your indexes. Know your database metadata. And use a tool that allows you to write precisely the SQL statement you want. Learning better SQL will help you save lots of money on operations costs as you might not need to scale out or up.

People Managing to Correctly Spell “Moron” in a Blog Comment

The notorious ORM pro / con discussion heavily amuses me. I always find it very funny when people have passionate discussions about which solution is better, rather than discussing which solution is better suited to the problem at hand. In the case of ORMs vs. plain SQL, obviously, no solution is simply better as both techniques have their merits. When comparing ORMs with jOOQ, I think that this page summarises it pretty well:
http://www.hibernate-alternative.com

Now, this article and most specifically, one answer is hilarious:
http://java.dzone.com/articles/defense-hand-coded-sql

While the article’s author is already asking for trouble, check out this one particular answer. I love it when people manage to correctly spell “moron”:

People who handwrite SQL are invariably morons.

Here’s what you miss out when using a good ORM with generated mappings:

– Automatic first and second level caching

– Guaranteed consistency between code and database structure. Change the database? Regenerate pojo’s -> compile errors until code adheres to database structure.

– True vendor independence. Yes, I’m switching between six different db’s in our products with zero issues.

– I work with objects, not relation sets. That kinda makes sense in an oop language.

– Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

– Any decent ORM understands and injects vendor specific query hints better than you.

Also, get a clue.

Here’s my adequate reply to the above:

OK, now this was amusing :-)

– Automatic first and second level caching

This, obviously, is utterly impossible outside the world of ORMs.

– Guaranteed consistency between code and database structure. Change the database? Regenerate pojo’s -> compile errors until code adheres to database structure.

True. No one has ever written a code generator before it was added to Hibernate.

– I work with objects, not relation sets. That kinda makes sense in an oop language

… which your DBA will probably always agree with. Remember to remind your manager why he bought that 1M$ Oracle license, when you run N+1 selects for fetching your OOP objects.

– Build-in query languages in decent ORMs are much more productive and, again, vendor independent.

Of course, there is always a black / white answer to “productivity”- questions. Like, how productively you can express a SQL:2003 MERGE statement with HQL. Or, how productively you can calculate a running total involving window functions, or maybe, recursive SQL with HQL.

– Any decent ORM understands and injects vendor specific query hints better than you.

That is indeed an amazing theory, which I was utterly unaware of.

The eternal debate between ORM lovers and haters. Mankind has always been this stupid. Like the AC vs. DC discussion between Nikola Tesla and Thomas Edison.

And, Eclipse will totally win over IntelliJ! ;-)