Applying Queueing Theory to Dynamic Connection Pool Sizing with FlexyPool


I’m very happy to have another interesting blog post by Vlad Mihalcea on the jOOQ blog, this time about his Open Source library flexypool. Read his previous jOOQ Tuesdays post on Hibernate here.

Vlad is a Hibernate developer advocate and he’s the author of the popular book High Performance Java Persistence, and he knows 1-2 things about connection pooling.

vladmihalcea

Introduction

Back in 2014, I was working as a software architect, and our team was building a real-estate platform which was composed of multiple nodes, as depicted in the following diagram:

databaseintegrationpoint

This is a classic enterprise architecture layout. The database is replicated to provide better throughout and availability in case of node failures. There are front-end nodes that deliver the website content. There are also many back-end nodes as well, like email schedulers or data import batch processors.

All these nodes require database connectivity, either to a Master node, for read-write transactions or to the Slave nodes, for read-only transactions.

Because acquiring database connections is an expensive process, each system node uses its own connection pool. By reusing physical database connections, the connection acquisition is very fast, therefore reducing the overall transaction response time.

Not only that a connection pool can reduce transaction response time, but it can level up traffic spikes as well. Without a connection pool, during a traffic spike, a front-end nodes might acquire all database connections, leaving the back-end processors with no database connectivity.

The connection pool, having a maximum number of database connections, allows the connections to queue whenever a traffic spike is happening. Therefore, during a traffic spike, the transaction response time will increase due to the queuing mechanism, but this is way better than taking down the whole system.

For these two reasons, the connection pool is a very good choice in many enterprise systems.

Based on the underlying hardware resources, a relational database can only offer a limited number of connections. For this reason, we must be very careful when choosing the pool size for each particular system node.

Connection pool sizing

I was the lucky person to get the task of figuring out how many connections should we allocate for each system node in our real-estate platform. Since I graduated Electronics and Telecommunications, I remembered that we learned about a similar problem when having to provision telecommunications networks. Agner Krarup Erlang invented Queuing theory for solving this problem, and I was curious if we could also find the right pool size by applying Erlang queuing models.

I was not the only one trying to apply the Queuing theory principles to software systems. Percona has a very interesting study: Forecasting MySQL Scalability with the actual service time in a system that is affected by a myriad of variables.

In the end, I realized that the best way to tackle this problem is to constant measuring and adjustments. For this reason, I needed a tool to capture database connection metrics, as well as a way to adjust a given connection pool while the enterprise system is running.

And, that’s how FlexyPool was born.

Basically, FlexyPool is a DataSource Proxy that stands in front of the actual JDBC DataSource or other proxies (e.g. statement logging).

datasourceproxyarchitecture

FlexyPool supports a great variety of stand-alone connection pools:

And it collects the following metrics:

  • concurrent connections histogram
  • concurrent connection requests histogram
  • data source connection acquiring time histogram
  • connection lease time histogram
  • maximum pool size histogram
  • total connection acquiring time histogram
  • overflow pool size histogram
  • retries attempts histogram

For instance, the concurrent connection count metric gives you an insight into how many connections are required by a certain application under a given traffic load:

concurrentconnectioncount

The connection acquisition metric tells you how much time it takes to obtain a database connection from the pool:

connectionacquire

The connection lease time allows you to spot long-running transactions, which are undesirable in high-performance OLTP applications:

connectionlease

For the stand-alone connection pools, FlexyPool can increment the pool size beyond the maximum capacity, as it offers an overflow buffer. The benefit of this overflow buffer is that it allows you to increase the pool size only when the incoming traffic causes a certain connection acquisition timeout.

Although FlexyPool can also monitor Java EE connection pools, it cannot increase the pool size in Java EE environments since the DataSource is an application server managed resource.

Conclusion

Because enterprise systems evolve, so does the underlying data access patterns. For this reasons, monitoring the underlying database connection usage is a very important metric, which needs to be monitored on a regular basis. FlexyPool builds on top of CodaHale and Dropwizard Metrics, so you can easily integrate it with well-known Application Performance Monitoring tools, such as Graphite or Grafana.

FlexyPool is open-source, and it uses an Apache license 2.0. You can find it the project repository on GitHub, and all the released dependencies are available on Maven Central, so it’s very easy to integrate it in your own project.

FkexyPool is powering many enterprise systems, like Etuovi, Mitch&Mates, and ScentBird. If you decide to use it in your current enterprise system, and you are willing to provide a testimonial, you can win a free copy of my High-Performance Java Persistence book.

A Hidden jOOQ Gem: Foreach Loop Over ResultQuery


A recent question on Stack Overflow about jOOQ caught my attention. The question essentially asked:

Why do both of these loops work?

// With fetch()
for (MyTableRecord rec : DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)
    .fetch()) { // fetch() here

    doThingsWithRecord(rec);
}

// Without fetch()
for (MyTableRecord rec : DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)) { // No fetch() here

    doThingsWithRecord(rec);
}

And indeed, just like in PL/SQL, you can use any jOOQ ResultQuery as a Java 5 Iterable, because that’s what it is. An Iterable<R> where R extends Record.

The semantics is simple. When Iterable.iterator() is invoked, the query is executed and the Result.iterator() is returned. So, the result is materialised in the client memory just like if I called fetch(). Unsurprisingly, this is the implementation of AbstractResultQuery.iterator():

@Override
public final Iterator<R> iterator() {
    return fetch().iterator();
}

No magic. But it’s great that this works like PL/SQL:

FOR rec IN (SELECT * FROM my_table ORDER BY my_table.column)
LOOP
  doThingsWithRecord(rec);
END LOOP;

Note, unfortunately, there’s no easy way to manage resources through Iterable, i.e. there’s no AutoCloseableIterable returning an AutoCloseableIterator, which could be used in an auto-closing try-with-resources style loop. This is why the entire result set needs to be fetched at the beginning of the loop. For lazy fetching, you can still use ResultQuery.fetchLazy()

try (Cursor<MyTableRecord> cursor = DSL
    .using(configuration)
    .selectFrom(MY_TABLE)
    .orderBy(MY_TABLE.COLUMN)
    .fetchLazy()) {

    for (MyTableRecord rec : cursor)
        doThingsWithRecord(rec);
}

Happy coding!

Using jOOλ to Combine Several Java 8 Collectors into One


With Java 8 being mainstream now, people start using Streams for everything, even in cases where that’s a bit exaggerated (a.k.a. completely nuts, if you were expecting a hyperbole here). For instance, take mykong’s article here, showing how to collect a Map’s entry set stream into a list of keys and a list of values:

http://www.mkyong.com/java8/java-8-convert-map-to-list

The code posted on mykong.com does it in two steps:

package com.mkyong.example;

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ConvertMapToList {
    public static void main(String[] args) {
        Map<Integer, String> map = new HashMap<>();
        map.put(10, "apple");
        map.put(20, "orange");
        map.put(30, "banana");
        map.put(40, "watermelon");
        map.put(50, "dragonfruit");

        System.out.println("\n1. Export Map Key to List...");

        List<Integer> result = map.entrySet().stream()
                .map(x -> x.getKey())
                .collect(Collectors.toList());

        result.forEach(System.out::println);

        System.out.println("\n2. Export Map Value to List...");

        List<String> result2 = map.entrySet().stream()
                .map(x -> x.getValue())
                .collect(Collectors.toList());

        result2.forEach(System.out::println);
    }
}

This is probably not what you should do in your own code. First off, if you’re OK with iterating the map twice, the simplest way to collect a map’s keys and values would be this:

List<Integer> result1 = new ArrayList<>(map.keySet());
List<String> result2 = new ArrayList<>(map.values());

There’s absolutely no need to resort to Java 8 streams for this particular example. The above is about as simple and speedy as it gets.

Don’t shoehorn Java 8 Streams into every problem

But if you really want to use streams, then I would personally prefer a solution where you do it in one go. There’s no need to iterate the Map twice in this particular case. For instance, you could do it by using jOOλ’s Tuple.collectors() method, a method that combines two collectors into a new collector that returns a tuple of the individual collections.

Code speaks for itself more clearly than the above description. Mykong.com’s code could be replaced by this:

Tuple2<List<Integer>, List<String>> result = 
map.entrySet()
    .stream()
    .collect(Tuple.collectors(
        Collectors.mapping(Entry::getKey, Collectors.toList()),
        Collectors.mapping(Entry::getValue, Collectors.toList())
    ));

The only jOOλ code put in place here is the call to Tuple.collectors(), which combines the standard JDK collectors that apply mapping on the Map entries before collecting keys and values into lists.

When printing the above result, you’ll get:

([50, 20, 40, 10, 30], [dragonfruit, orange, watermelon, apple, banana])

i.e. a tuple containing the two resulting lists.

Even simpler, don’t use the Java 8 Stream API, use jOOλ’s Seq (for sequential stream) and write this shorty instead:

Tuple2<List<Integer>, List<String>> result = 
Seq.seq(map)
   .collect(
        Collectors.mapping(Tuple2::v1, Collectors.toList()),
        Collectors.mapping(Tuple2::v2, Collectors.toList())
   );

Where Collectable.collect(Collector, Collector) provides awesome syntax sugar over the previous example

Convinced? Get jOOλ here: https://github.com/jOOQ/jOOL

All Libraries Should Follow a Zero-Dependency Policy


This hilarious article with a click-bait title caught my attention, recently:

View story at Medium.com

A hilarious (although not so true or serious) rant about the current state of JavaScript development in the node ecosystem.

Dependency hell isn’t new

Dependency hell is a term that made it into wikipedia. It defines it as such:

Dependency hell is a colloquial term for the frustration of some software users who have installed software packages which have dependencies on specific versions of other software packages.

The big problem in dependency hell is the fact that small libraries pull in additional depedencies on which they rely in order to avoid too much code duplication. For instance, check out who is using Guava 18.0:
https://mvnrepository.com/artifact/com.google.guava/guava/18.0/usages

You’ll find libraries like:

  • com.fasterxml.jackson.core » jackson-databind
  • org.springframework » spring-context-support
  • org.reflections » reflections
  • org.joda » joda-convert
  • … 2000 more

Now, let’s assume you’re still using Guava 15.0 in your application. You want to upgrade Spring. Will your application still work? Is that upgrade binary compatible? Will Spring guarantee this to you, or are you on your own? Now what if Spring also uses Joda Time, which in turn also uses Guava? Does this even work? Conversely, can Guava ever depend on Joda Time or will a circular Maven Central dependency cause all singularities to collapse into one huge black hole?

Truth is: You don’t need the dependency

… and by you, I don’t mean the end user who writes complex enterprise applications with Guava (or whatever). You need it. But YOU, dear library developer. You certainly don’t need any dependency.

An example from jOOQ. Being a SQL string manipulation library, we pulled in a dependency on Apache Commons Lang because:

  • They have some nice StringUtils, which we like to use
  • Their license is also ASL 2.0, which is compatible to jOOQ’s license

But instead of hard wiring a jOOQ 3.x to commons-lang 2.x dependency, we opted for internalising one of their classes and only parts of it, repackaging it as org.jooq.tools.StringUtils. Essentially, we needed things like:

  • abbreviate()
  • isEmpty()
  • isBlank()
  • leftPad() (hello node developers)

… and some more. That certainly doesn’t justify pulling in the entire dependency, does it? Because while it wouldn’t matter to us, it would matter to our thousands of users, who might prefer to use an older or newer version of commons-lang. And that’s just the beginning. What if commons-lang had transitive dependencies? Etc.

Please, library developers, avoid dependencies

So, please, dear library developers. Please avoid adding dependencies to your libraries. The only things you should depend on are:

  • The JDK
  • Some API governed by a JSR (e.g. JPA)

That’s it. We can all start writing better software and stop downloading the whole internet if YOU the library developers start being reasonable and stop being lazy.

Exceptions to the above rules:

  • Framework and platform vendors (e.g. Spring, Java EE) are excluded. They define the whole platform, i.e. they impose a set of well-documented dependencies. If you’re using a framework / platform in your application, then you have to abide to the platform’s rules

That’s all. Small libraries like jOOQ must not have any dependency.

The Java JIT Compiler is Darn Good at Optimization


“Challenge accepted” said Tagir Valeev when I recently asked the readers of the jOOQ blog to show if the Java JIT (Just-In-Time compilation) can optimise away a for loop.

Tagir is the author of StreamEx, very useful Java 8 Stream extension library that adds additional parallelism features on top of standard streams. He’s a speaker at conferences, and has contributed a dozen of patches into OpenJDK Stream API (including bug fixes, performance optimizations and new features). He’s interested in static code analysis and works on a new Java bytecode analyzer.

I’m very happy to publish Tagir’s guest post here on the jOOQ blog.

tagir-valeev

The Java JIT Compiler

In recent article Lukas wondered whether JIT could optimize a code like this to remove an unnecessary iteration:

// ... than this, where we "know" the list
// only contains one value
for (Object object : Collections.singletonList("abc")) {
    doSomethingWith(object);
}

Here’s my answer: JIT can do even better. Let’s consider this simple method which calculates total length of all the strings of supplied list:

static int testIterator(List<String> list) {
    int sum = 0;
    for (String s : list) {
        sum += s.length();
    }
    return sum;
}

As you might know this code is equivalent to the following:

static int testIterator(List<String> list) {
    int sum = 0;
    Iterator<String> it = list.iterator();
    while(it.hasNext()) {
        String s = it.next();
        sum += s.length();
    }
    return sum;
}

Of course in general case the list could be anything, so when creating an iterator, calling hasNext and next methods JIT must emit honest virtual calls which is not very fast. However what will happen if you always supply the singletonList here? Let’s create some simple test:

public class Test {
    static int res = 0;

    public static void main(String[] args) {
        for (int i = 0; i < 100000; i++) {
            res += testIterator(Collections.singletonList("x"));
        }
        System.out.println(res);
    }
}

We are calling our testIterator in a loop so it’s called enough times to be JIT-compiled with C2 JIT compiler. As you might know, in HotSpot JVM there are two JIT-compilers, namely C1 (client) compiler and C2 (server) compiler. In 64-bit Java 8 they work together. First method is compiled with C1 and special instructions are added to gather some statistics (which is called profiling). Among it there is type statistics. JVM will carefully check which exact types our list variable has. And in our case it will discover that in 100% of cases it’s singleton list and nothing else. When method is called quite often, it gets recompiled by better C2 compiler which can use this information. Thus when C2 compiles it can assume that in future singleton list will also appear quite often.

You may ask JIT compiler to output the assembly generated for methods. To do this you should install hsdis on your system. After that you may use convenient tools like JITWatch or write a JMH benchmark and use -perfasm option. Here we will not use third-party tools and simply launch the JVM with the following command line options:

$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintAssembly Test >output.txt

This will generate quite huge output which may scare the children. The assembly generated by C2 compiler for ourtestIterator method looks like this (on Intel x64 platform):

  # {method} {0x0000000055120518} 
  # 'testIterator' '(Ljava/util/List;)I' in 'Test'
  # parm0:    rdx:rdx   = 'java/util/List'
  #           [sp+0x20]  (sp of caller)
  0x00000000028e7560: mov    %eax,-0x6000(%rsp)
  0x00000000028e7567: push   %rbp

  ;*synchronization entry
  ; - Test::testIterator@-1 (line 15)
  0x00000000028e7568: sub    $0x10,%rsp         
                                                
  ; implicit exception: dispatches to 0x00000000028e75bd
  0x00000000028e756c: mov    0x8(%rdx),%r10d    

  ;   {metadata('java/util/Collections$SingletonList')}
  0x00000000028e7570: cmp    $0x14d66a20,%r10d  

  ;*synchronization entry
  ; - java.util.Collections::singletonIterator@-1
  ; - java.util.Collections$SingletonList::iterator@4
  ; - Test::testIterator@3 (line 16)
  0x00000000028e7577: jne    0x00000000028e75a0 

  ;*getfield element
  ; - java.util.Collections$SingletonList::iterator@1
  ; - Test::testIterator@3 (line 16)
  0x00000000028e7579: mov    0x10(%rdx),%ebp    

  ; implicit exception: dispatches to 0x00000000028e75c9
  0x00000000028e757c: mov    0x8(%rbp),%r11d    

  ;   {metadata('java/lang/String')}
  0x00000000028e7580: cmp    $0x14d216d0,%r11d  
  0x00000000028e7587: jne    0x00000000028e75b1

  ;*checkcast
  ; - Test::testIterator@24 (line 16)
  0x00000000028e7589: mov    %rbp,%r10          
                                                
  ;*getfield value
  ; - java.lang.String::length@1
  ; - Test::testIterator@30 (line 17)
  0x00000000028e758c: mov    0xc(%r10),%r10d    

  ;*synchronization entry
  ; - Test::testIterator@-1 (line 15)
  ; implicit exception: dispatches to 0x00000000028e75d5
  0x00000000028e7590: mov    0xc(%r10),%eax     
                                                
                                               
  0x00000000028e7594: add    $0x10,%rsp
  0x00000000028e7598: pop    %rbp

  # 0x0000000000130000
  0x00000000028e7599: test   %eax,-0x27b759f(%rip)        
         
  ;   {poll_return}                                       
  0x00000000028e759f: retq   
  ... // slow paths follow

What you can notice is that it’s surpisingly short. I’ll took the liberty to annotate what happens here:

// Standard stack frame: every method has such prolog
mov    %eax,-0x6000(%rsp)
push   %rbp
sub    $0x10,%rsp         
// Load class identificator from list argument (which is stored in rdx 
// register) like list.getClass() This also does implicit null-check: if 
// null is supplied, CPU will trigger a hardware exception. The exception
// will be caught by JVM and translated into NullPointerException
mov    0x8(%rdx),%r10d
// Compare list.getClass() with class ID of Collections$SingletonList class 
// which is constant and known to JIT
cmp    $0x14d66a20,%r10d
// If list is not singleton list, jump out to the slow path
jne    0x00000000028e75a0
// Read Collections$SingletonList.element private field into rbp register
mov    0x10(%rdx),%ebp
// Read its class identificator and check whether it's actually String
mov    0x8(%rbp),%r11d
cmp    $0x14d216d0,%r11d
// Jump out to the exceptional path if not (this will create and throw
// ClassCastException)
jne    0x00000000028e75b1
// Read private field String.value into r10 which is char[] array containing
//  String content
mov    %rbp,%r10
mov    0xc(%r10),%r10d
// Read the array length field into eax register (by default method returns
// its value via eax/rax)
mov    0xc(%r10),%eax
// Standard method epilog
add    $0x10,%rsp
pop    %rbp
// Safe-point check (so JVM can take the control if necessary, for example,
// to perform garbage collection)
test   %eax,-0x27b759f(%rip)
// Return
retq   

If it’s still hard to understand, let’s rewrite it via pseudo-code:

if (list.class != Collections$SingletonList) {
  goto SLOW_PATH;
}
str = ((Collections$SingletonList)list).element;
if (str.class != String) {
  goto EXCEPTIONAL_PATH;
}
return ((String)str).value.length;

So for the hot path we have no iterator allocated and no loop, just several dereferences and two quick checks (which are always false, so CPU branch predictor will predict them nicely). Iterator object is evaporated completely, though originally it has additional bookkeeping like tracking whether it was already called and throwing NoSuchElementException in this case. JIT-compiler statically proved that these parts of code are unnecessary and removed them. The sum variable is also evaporated. Nevertheless the method is correct: if it happens in future that it will be called with something different from singleton list, it will handle this situation on the SLOW_PATH (which is of course much longer). Other cases like list == null or list element is not String are also handled.

What will occur if your program pattern changes? Imagine that at some point you are no longer using singleton lists and pass different list implementations here. When JIT discovers that SLOW_PATH is hit too often, it will recompile the method to remove special handling of singleton list. This is different from pre-compiled applications: JIT can change your code following the behavioral changes of your program.

“What Java ORM do You Prefer, and Why?” – SQL of Course!


Catchy headline, yes. But check out this Stack Overflow question by user Mike:

(I’m duplicating it here on the blog, as it might be deleted soon)

It’s a pretty open ended question. I’ll be starting out a new project and am looking at different ORMs to integrate with database access.

Do you have any favorites? Are there any you would advise staying clear of?

And the top voted answer (164 points by David Crawshaw is: “Just use SQL”:

I have stopped using ORMs.

The reason is not any great flaw in the concept. Hibernate works well. Instead, I have found that queries have low overhead and I can fit lots of complex logic into large SQL queries, and shift a lot of my processing into the database.

So consider just using the JDBC package.

The second answer (66 points by user simon) is, again: “Just use SQL”:

None, because having an ORM takes too much control away with small benefits. The time savings gained are easily blown away when you have to debug abnormalities resulting from the use of the ORM. Furthermore, ORMs discourage developers from learning SQL and how relational databases work and using this for their benefit.

The third answer (51 points by myself) is saying, once more: “Use SQL” (and use it with jOOQ).

The best way to write SQL in Java

Only the fourth answer (46 points by Abdullah Jibaly) mentiones Hibernate, the most popular ORM in the Java ecosystem.

The truth is, as we’ve shown numerous times on this blog: Hibernate/JPA/ORMs are good tools to get rid of boring (and complex) CRUD. But that’s just boilerplate logic with little value to your business logic. The interesting stuff – the queries, the batch and bulk processing, the analytics, the reporting, they’re all best done with SQL. Here are some additional articles:

Stay tuned as we’re entering an era of programming where object orientation fades, and functional / declarative programming makes data processing extremely easy and lean again.

How Functional Programming will (Finally) do Away With the GoF Patterns


A recent article about various ways to implement structural pattern matching in Java has triggered my interest:
http://blog.higher-order.com/blog/2009/08/21/structural-pattern-matching-in-java

The article mentions a Scala example where a tree data structure can be traversed very easily and neatly using Scala’s match keyword, along with using algebraic data types (more specifically, a sum type):

def depth(t: Tree): Int = t match {
  case Empty => 0
  case Leaf(n) => 1
  case Node(l, r) => 1 + max(depth(l), depth(r))
}

Even if you’re not used to the syntax, it is relatively easy to understand what it does:

  • There’s a function depth that calculates the (maximum) depth of a tree structure
  • It does so by checking if the input argument is empty, a leaf node, or any other node
  • If it is any other node, then it adds 1 to the maximum of the remaining tree, recursively

The elegant thing here is that the Scala type system helps the author of the above code get this right from a formal point of view, by offering formal type checking. The closest we can do in Java as illustrated by the article is this

public static int depth(Tree t) {
  if (t instanceof Empty)
    return 0;
  if (t instanceof Leaf)
    return 1;
  if (t instanceof Node)
    return 1 + max(depth(((Node) t).left), depth(((Node) t).right));
  throw new RuntimeException("Inexhaustive pattern match on Tree.");
}

But these instanceof checks do smell kind of fishy…

For more details, read the full article here, highly recommended:
http://blog.higher-order.com/blog/2009/08/21/structural-pattern-matching-in-java

How does this compare to the GoF design patterns?

In our object-orientation-brainwashed Java ecosystem (which inherited the OO brainwash from C++), the above instanceof logic would most likely be refactored into an implementation using the visitor pattern from the GoF design patterns book. This refactoring would be done by The Team Architect™ himself, as they are supervising the object oriented quality of your software. The 7 lines of code using instanceof would quickly bloat up to roughly 200 lines of weird interfaces, abstract classes, and cryptic accept() and visit() methods. When in fact, the functional programming approach was so much leaner, even in its imperfect Java instanceof form!

A lot of the GoF design patterns stem from a time when EVERYTHING needed to be an object. Object orientation was the new holy grail, and people even wanted to push objects down into databases. Object databases were invented (luckily, they’re all dead) and the SQL standard was enhanced with ORDBMS features (only really implemented in Oracle, PostgreSQL, and Informix, and maybe some other minor DBs), most of which – also luckily – were never widely adopted.

Since Java 8, finally, we’re starting to recover from the damage that was made in early days of object orientation in the 90s, and we can move back to a more data-centric, functional, immutable programming model where data processing languages like SQL are appreciated rather than avoided, and Java will see more and more of these patterns, hopefully.

If you’re not convinced by the above visitor pattern vs pattern matching example, do read this very interesting series of articles by Mario Fusco:

You will see that with functional programming, many patterns lose their meaning as you’re just starting to pass around functions, making code very simple and easy to understand.

As a wrap up, as Mario presented the content at Voxxed Days Ticino:

Happy functional programming!