jOOQ Tuesdays: Brett Wooldridge Shows What it Takes to Write the Fastest Java Connection Pool


Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.


Today, I’m very excited to feature Brett Wooldridge, creator of HikariCP, the fastest connection pool available for Java.

Brett, you’ve created one of the most popular connection pools for Java: HikariCP. What made your library so popular?

I’ll provide some backstory on HikariCP before I answer that, but I’ll tease the answer by saying “marketing”.

A few years ago I was creating a product prototype for the company I work for, and I needed a connection pool. Like most developers, I just wanted to drop in a pool and move on, so I took to the web to find the most popular and actively maintained library. Unfortunately, while load testing the prototype we started encountering deadlocks, as well as exceptions indicating that connection state was bleeding over between threads.

Because the pool was open source, I thought I’d just pull down the code, find and fix the problems, and contribute back. But when I opened the code, I found thousands more lines of code than I was expecting. Added to the mix were many locks, some nested, some acquired in one method and released in some distant place. There was simply no way to reason about where potential deadlocks lurked, even if we found and fixed the ones we encountered.

I picked up another pool and inspected its code. The lock semantics were clearer, but the volume of code was still more than 2x what I expected, especially given that it was delegating the core pooling logic to a separate library.

In addition, all of the pools I studied violated JDBC contracts in multiple ways. Inasmuch as it is possible, a pool should return a Connection that is indistinguishable from one received in the absence of the pool. But these pools didn’t automatically close Statements when a connection was “closed” (returned), clear warnings, or roll back uncommitted transactions. Nor did they reset properties altered by the user, such as auto-commit or transaction isolation level, and more. The result was that the next consumer got a “dirty” connection.

I thought, “Really? This is the state of connection pools in the ecosystem after 20 years of Java?” Out of necessity and frustration, I created HikariCP.

To be fair, since I started HikariCP some pools have made some of these “correctness” behaviors configurable, but none of them enable them by default, and I suspect most users are running with the safety off. When these behaviors are enabled, at least two popular pools fail to complete our benchmark, dying with OutOfMemory exceptions. Conversely, HikariCP doesn’t support an unsafe mode of operation.
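To make the “dirty connection” problem concrete, here is a minimal sketch of the kind of cleanup a pool could run when a connection is returned. It is an illustration only, not HikariCP’s actual code: the class `ConnectionResetSketch`, the method `resetForReuse`, and the idea of passing in a list of tracked statements are all assumptions made for the example. Only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper, not part of HikariCP or any real pool.
final class ConnectionResetSketch {
    private ConnectionResetSketch() {}

    /**
     * Returns a borrowed connection to a clean state before the next consumer gets it.
     * openStatements is assumed to be the list of statements the pool tracked for this borrower.
     */
    static void resetForReuse(Connection connection,
                              List<Statement> openStatements,
                              boolean defaultAutoCommit,
                              int defaultIsolation) throws SQLException {
        // Close statements the borrower left open.
        for (Statement statement : openStatements) {
            statement.close();
        }
        openStatements.clear();

        // Roll back uncommitted work BEFORE touching auto-commit:
        // switching auto-commit on in mid-transaction would commit it.
        if (!connection.getAutoCommit()) {
            connection.rollback();
        }

        // Drop warnings and restore the properties the borrower may have changed.
        connection.clearWarnings();
        connection.setAutoCommit(defaultAutoCommit);
        connection.setTransactionIsolation(defaultIsolation);
    }
}
```

The ordering is the point of the sketch: the rollback happens while the connection is still in the borrower’s transaction mode, so uncommitted work cannot leak into the database or into the next consumer’s session.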

Returning to your question, as noted above there were many established pools available, so how did HikariCP become popular? “Correctness” and reliability are a tough sell, so I focused on promoting performance, and started with a simple tweet. One follower led to another. Some users tweeted about big performance gains and improved reliability, and at some point in 2015 the Wix engineering team wrote a blog post about switching to HikariCP.

In essence, simple word of mouth has led to HikariCP’s rising popularity, with an initial “marketing” push based on performance.  I do hope that over time more users will give equal weight to correctness and reliability, without which performance is meaningless, and for my part I plan to write more about those aspects of HikariCP.  

You quoted Edsger Dijkstra: “Simplicity is prerequisite for reliability.” That reminds me of Antoine de Saint-Exupéry’s “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” How do you manage to keep things simple when this world only ever gets more complicated?

Resisting complexity through feature-creep can be challenging. I get a lot of requests for this or that feature, and while each may be simple in and of itself, taken in totality they would significantly increase complexity and code size. Of course, that is not to say that I don’t add features.

For example, initial versions of HikariCP only supported a fixed size pool.  HikariCP was designed for systems with fairly constant load, and in that environment pools tend to stay at their maximum size, so I saw little need to complicate the code to support dynamic sizing.  Can you imagine a server at Google falling idle for several minutes?  Additionally, I feel like the more axes of configuration there are, the more difficult it is for users to optimally configure a pool.  However, eventually there were enough users who needed dynamic sizing, and its absence was a barrier to adoption, so support was added.  Principally, I did not want lack of dynamic sizing support to deprive users of the reliability and correctness benefits of HikariCP.

Still, I probably reject the vast majority of feature requests. As the custodian of HikariCP, I believe that keeping it simple and true to that core philosophy is in the best interest of the community. I always try to minimize the “surface area”, both in terms of code and configuration. The larger the surface area of an API, the more difficult it is to comprehend. Our brains have a limit for the amount of contextual information that can be held “in view” at one time; this is true in a lot of contexts. For example, when reading code, methods larger than a certain size, or conditionals of more than a certain number of terms, are difficult to follow or reason about. Generally, for users of HikariCP, the “surface area” is manifest in the number of configuration parameters. While I can hardly say that “Perfection [has been] achieved”, I do feel like there is not much left to take away without cutting into functionality.

Few libraries go to the byte code level to optimise their code. While this helps in benchmarks, did it also help your users in production? What were the biggest caveats you found while micro-optimising?

Definitely.  Maybe some developers are dismissive of the potential gains, because in their minds they think, “What does it matter if connection acquisition takes 100ns or 100μs, the query is going to take 10ms anyway?”  However, pools intercept dozens of methods, and the “close()” path is typically slower than acquisition, so it’s not that simple.  I often get reports from users providing confirmation of real world performance improvements.  It’s anecdotal but one user initially commented in a bug report, “We’re testing HikariCP at the client and have had great initial success – an application loading 1 million records over multiple HTTP threads and putting them in the DB had its run time cut by 70% after moving from Tomcat CP to HikariCP!”  The follow-up comment on the bug was, “This was a bug in our side, using some unrelated non-threadsafe code.  No issue.  After fixing the bug, the code runs about 2x faster using HikariCP than Tomcat CP.”  That’s pretty good; and yet some reports surprise even me.

Regarding optimisation, and as long as we’re quoting famous thinkers, I would be remiss if I didn’t cite Knuth: “We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil.” I think the key word here is “premature”. It is definitely better to write the code as it naturally comes and then, based on detailed profiling and benchmarking, perform “peephole optimisations” (to hijack a term from compiler theory). At the same time, I would estimate that half of the performance gains in HikariCP have come as the result of algorithmic changes, rather than low-level optimisations.

Regarding caveats to micro-optimising, it would be hard to convey how much I have learned, and am still learning.  I’d like to give a shout-out to Aleksey Shipilëv for his excellent JMH micro-benchmark framework.  Aleksey has become somewhat of a JVM performance oracle (no pun intended, he used to work for Oracle).  The JVM performs an amazing array of optimisations, and if one is not careful then what appears to be a clever optimisation in the code simply confuses the JIT’s pattern-based optimiser and the result is slower rather than faster.

In order to effectively optimise on the JVM you sometimes end up reading the JIT source code, and you must become familiar with concepts such as dead code elimination, loop invariant hoisting, constant propagation, virtual call inlining, and many more.  Even with a good grip on these concepts I am sometimes surprised by the JVM in my attempts at optimisation.  In addition to the JIT, you really must understand the Java Memory Model (JMM) and how it maps onto CPU architectures like x86.

Lastly, after the design of algorithms, contention for shared state is the source of most bottlenecks (see the aforementioned JMM), so recently the biggest gains (for example, in v2.6.0) have come from tricks that simply avoid it; the fastest code is code that is never executed.

If there is a main takeaway, it is “trust the benchmarks”: your assumptions and intuitions are wrong more often than you imagine.
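The remark about contention for shared state can be illustrated with a small, generic sketch. It is not taken from HikariCP: the class `PoolStatsSketch` and its methods are hypothetical, and the technique shown is simply the JDK’s `LongAdder`, which spreads updates across several internal cells so that threads rarely compete for the same memory location, and only combines them when the total is read.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical statistics holder; the names are illustrative only.
final class PoolStatsSketch {
    // LongAdder stripes increments across cells instead of having every
    // thread retry a compare-and-set on one shared counter.
    private final LongAdder acquisitions = new LongAdder();

    void recordAcquisition() {
        acquisitions.increment();
    }

    long totalAcquisitions() {
        return acquisitions.sum();
    }

    /** Starts the given number of threads, lets each record its increments, and returns the total. */
    static long countConcurrently(int threads, int incrementsPerThread) {
        PoolStatsSketch stats = new PoolStatsSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    stats.recordAcquisition();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() makes the worker's increments visible to sum()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return stats.totalAcquisitions();
    }
}
```

Whether this is actually faster than a single `AtomicLong` for a given workload is exactly the kind of question that should be settled with a benchmark rather than by intuition.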

Your fellow jOOQ Tuesdays interviewee Vlad Mihalcea talked to us about queueing theory. How does this compare to what you wrote about connection pool sizing?

I have great respect for Vlad; I think we’re both members of the Mutual Admiration Society. His FlexyPool is trying to solve a difficult problem, namely how to automatically tune optimal pool settings for varying loads. Ultimately, the upper-bound is constrained by the database’s optimal concurrent query capacity, which is where my write-up on pool sizing comes into play. However, there is a large amount of configuration space in-between a minimally sized pool and that upper-bound, which is where FlexyPool is trying to add value, by ensuring that the pool is “right sized”, dynamically, for the load it is servicing.

I say it is a difficult problem, because connection pools on modern multi-core servers likely present as an M/G/k queue in queueing theory; arrivals have a Markovian distribution, service times have a General distribution, and there are k servers (where “server” is defined as an abstract single-threaded processor). Quoting Wikipedia, “Most performance metrics for this queueing system are not known and remain an open problem.” Modeling connection pools as an M/M/c queue might provide a decent approximation for the purposes of predicting queue lengths, but service times are not likely to have a Markovian distribution. Of course, there are also non-Markovian stochastic models in queueing theory that could be applied. Complicating everything is the fact that queued waiters (threads) can abandon the queue before service, for example when a timeout is reached. That adds an additional twist when trying to predict queue lengths and wait times. Hats off to Vlad for taking on this problem!

Anyway, what I wrote about setting the upper-bound on pool sizing translates to pinning the k (or c) value in those respective Markovian queueing theory models.
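For readers who want to experiment with the M/M/c approximation mentioned above, the following sketch computes the standard Erlang C quantities for a pool of `servers` connections. It rests on the simplifying assumptions already flagged in the answer: exponential service times, and no waiter ever abandoning the queue. The class and method names (`ErlangCSketch`, `probabilityOfWait`, `meanWait`) are made up for this example.

```java
// Hypothetical helper illustrating the M/M/c (Erlang C) approximation.
final class ErlangCSketch {
    private ErlangCSketch() {}

    /**
     * Probability that an arriving request finds all servers busy and must queue.
     * arrivalRate (lambda) and serviceRate (mu, per server) use the same time unit.
     */
    static double probabilityOfWait(double arrivalRate, double serviceRate, int servers) {
        if (arrivalRate < 0 || serviceRate <= 0 || servers < 1) {
            throw new IllegalArgumentException("need lambda >= 0, mu > 0 and servers >= 1");
        }
        double offeredLoad = arrivalRate / serviceRate; // a = lambda / mu
        if (offeredLoad >= servers) {
            throw new IllegalArgumentException("unstable queue: lambda must be below servers * mu");
        }
        // Erlang B via the usual recurrence, starting from zero servers.
        double blocking = 1.0;
        for (int k = 1; k <= servers; k++) {
            blocking = offeredLoad * blocking / (k + offeredLoad * blocking);
        }
        // Convert Erlang B to Erlang C.
        double utilisation = offeredLoad / servers;
        return blocking / (1.0 - utilisation * (1.0 - blocking));
    }

    /** Mean time a request spends waiting in the queue, excluding service time. */
    static double meanWait(double arrivalRate, double serviceRate, int servers) {
        return probabilityOfWait(arrivalRate, serviceRate, servers)
                / (servers * serviceRate - arrivalRate);
    }
}
```

For example, with one request arriving per time unit, each connection serving one request per time unit, and two connections, `probabilityOfWait(1.0, 1.0, 2)` evaluates to one third. Fixing the `servers` argument corresponds to pinning the c value, while the caveats about general service times and timeouts mean the numbers should be treated as rough guidance at best.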

You chose a Japanese word in your product: 光 (Hikari, “Light”). What’s your connection to Japan?

I’ve lived and worked in Tokyo since 2008, though I think my Japanese is far behind where it should be given my time here.  I chalk that up to preferring time at the keyboard to language study.

As you mentioned, Hikari (pronounced Hi-ka-lee) translates to “Light” (as in sunlight).  In English, it is a double entendre in the context of HikariCP; though in Japanese it would not be.  “Light” in the sense of “the speed of…”, and “light” in the sense of being light in terms of code weight.
