Squeezing Another 10% Speed Increase out of jOOQ using JMC and JMH

In this post, we’re going to discuss a couple of recent efforts to squeeze roughly 10% in terms of speed out of jOOQ by iterating on hotspots that were detected using JMC (Java Mission Control) and then validated using JMH (Java Microbenchmark Harness). This post shows how to apply micro optimisations to algorithms where the smallest improvement can have a significant effect.

While JMH is probably without competition, JMC could easily be replaced by JProfiler, YourKit, or even your own manual jstack sampling. I’ll just use JMC because it ships with the JDK and is free to use for development as of JDK 8 and 9 (if you’re unsure whether you’re “developing”, better ask Oracle). Rumours have it that JMC might be contributed to the OpenJDK in the near future.

Micro optimisations

Micro optimisations are a cool technique for squeezing a very small improvement out of a local algorithm (e.g. a loop), an improvement that has a significant effect on the entire application / library, because the local algorithm is called many times. This is absolutely the case in jOOQ, which is essentially a library that always runs 4 nested loops:

  1. S: A “loop” over all possible SQL statements
  2. E: A “loop” over all executions of such a statement
  3. R: A loop over all rows in the result
  4. C: A loop over all columns in a row

Such four-level nested loops result in what we could call polynomial complexity of our algorithms. Even if we cannot call the complexity O(N⁴) (as the four “N” are not all the same), it is certainly O(S x E x R x C) (I’ll call this “S-E-R-C loops” further down). Even to the untrained eye, it becomes evident that anything that happens in the inner-most “C-loop” can have devastating effects. We had better not open any files here that could instead be opened outside, e.g. in the “S-loop”.
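To make the shape of these loops concrete, here’s a minimal sketch in plain JDBC (illustrative only, not actual jOOQ code; the statements and executions inputs are assumptions):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class SercLoops {
    static void run(Connection connection, List<String> statements, int executions)
    throws SQLException {
        for (String sql : statements) {                        // S: all statements
            for (int e = 0; e < executions; e++) {             // E: all executions
                try (Statement stmt = connection.createStatement();
                     ResultSet rs = stmt.executeQuery(sql)) {
                    int columns = rs.getMetaData().getColumnCount();

                    while (rs.next())                          // R: all rows
                        for (int c = 1; c <= columns; c++)     // C: all columns
                            rs.getObject(c);                   // per-column work
                }
            }
        }
    }
}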

In a previous blog post, we’ve discussed common techniques of optimising such situations. In this blog post, we’ll look into a couple of concrete examples.

How to discover flaws in these loops?

We’re looking for the problems that affect all users: the kind of problem that, once fixed, will improve jOOQ’s performance for everyone by e.g. 10%. This is similar to what the JIT does by performing things like stack allocation and inlining, which don’t drastically improve things locally, but do so globally, and for everyone. Here’s an interesting guest post by Tagir Valeev on JIT optimisation, and how good it is.

Getting a large “S-loop”

The first option is to run profiling sessions on benchmarks. We could, for example, run the entire “S-E-R-C loops” in a JMC profiling session, where the “S-loop” is a loop over all our statements, or in other words, over all our integration tests. Unfortunately, with this approach, our “E-loop” (in the case of jOOQ’s integration tests) is a single execution per statement. We’d have to run the integration tests many, many times in order to get meaningful results.

Also, while the jOOQ integration tests run thousands of distinct queries, most queries are still rather simple, each one focusing on an individual SQL feature (e.g. lateral join). In an end-user application, queries might use less specific features, but are much more complex, i.e. they have a lot of ordinary joins.

This technique is useful to find problems that appear in all queries, deep down inside of jOOQ – e.g. at the JDBC interface. But we cannot use this approach to test individual features.

Getting a large “E-loop”

Another option is to write a single test that runs a few statements (small “S-loop”) many times in an explicit loop (large “E-loop”). This has the advantage that a specific bottleneck can be found with a high confidence, but the drawback is: It’s specific. For instance, if we find a small bottleneck in the string concatenation function, well, that is certainly worth fixing, but doesn’t affect most users.

This approach is useful to test individual features. It can also be useful for finding issues that affect all queries, but with a lower confidence than the previous case, where the “S-loop” is maximised.

Getting large “R-loops” and “C-loops”

Creating large result sets is easy and should definitely be part of such benchmarks, because in the case of a large result set, any flaw will multiply drastically, so fixing these things is worthwhile. However, these problems only affect actual result sets, not the query building process or the execution process. Sure, most statements are probably queries, not insertions / updates, etc. But this needs to be kept in mind.

Optimising for problems in large “E-loops”

All of the above scenarios are different optimisation sessions and deserve their own blog posts. In this post, I’m describing what has been discovered and fixed when running a single query 3 million times on an H2 database. The H2 database is chosen here because it can run in memory, in the same process, and thus has the least extra overhead compared to jOOQ – so jOOQ’s overhead contributions become significant in a profiling session / benchmark. In fact, it can be shown that in such a benchmark, jOOQ (or Hibernate, etc.) appears to perform quite poorly compared to a JDBC-only solution, as many have shown before.

This is an important moment to remind ourselves:

Benchmarks do not reflect real-world use cases! You will never run the exact same query 3 million times on a production system, and your production system doesn’t run on H2.

A benchmark profits from so much caching and buffering that a real workload would never perform as fast as the benchmark does.

Always be careful not to draw any wrong conclusions from a benchmark!

This needs to be said, so take every benchmark you find on the web with a grain of salt. This includes our own!

The query being profiled is:

ctx.select(
      AUTHOR.FIRST_NAME,
      AUTHOR.LAST_NAME,
      BOOK.ID,
      BOOK.TITLE)
   .from(BOOK)
   .join(AUTHOR).on(BOOK.AUTHOR_ID.eq(AUTHOR.ID))
   .where(BOOK.ID.eq(1))
   .and(BOOK.TITLE.isNull().or(BOOK.TITLE.ne(randomValue)));

The trivial query returns a ridiculous 4 rows and 4 columns, so the “R-loop” and “C-loops” are negligible. This benchmark is really testing the overhead of jOOQ query execution in a case where the database does not contribute much to the execution time. Again, in a real world scenario, you will get much more overhead from your database.
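For reference, a JMH harness around this scenario might look roughly as follows. This is a sketch only: the actual ExecutionBenchmark sources aren’t shown in this post, and the schema setup and fetch call are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;

import org.jooq.DSLContext;
import org.jooq.impl.DSL;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class ExecutionBenchmark {

    Connection connection;
    DSLContext ctx;

    @Setup
    public void setup() throws Exception {
        // In-memory H2, so the database contributes as little overhead as possible
        connection = DriverManager.getConnection("jdbc:h2:mem:bench");
        ctx = DSL.using(connection);

        ctx.execute("CREATE TABLE author (id INT PRIMARY KEY, first_name VARCHAR(50), last_name VARCHAR(50))");
        ctx.execute("CREATE TABLE book (id INT PRIMARY KEY, author_id INT, title VARCHAR(100))");
        ctx.execute("INSERT INTO author VALUES (1, 'George', 'Orwell')");
        ctx.execute("INSERT INTO book VALUES (1, 1, '1984')");
    }

    @Benchmark
    public Object testExecution() {
        // The profiled query from above, in plain SQL
        return ctx.fetch(
            "SELECT a.first_name, a.last_name, b.id, b.title " +
            "FROM book b JOIN author a ON b.author_id = a.id " +
            "WHERE b.id = 1");
    }
}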

In the following sections, I’ll show a few minor bottlenecks that could be found when drilling down into such execution scenarios. As I’ve switched between JMC versions, the screenshots will not always look the same, I’m afraid.

1. Instance allocation of constant values

A very silly mistake was easily discovered right away:

The mistake didn’t contribute a whole lot of overhead, only 1.1% of the sampled time, but it made me curious. In version 3.10 of jOOQ, the SelectQueryImpl’s Limit class, which encodes the jOOQ OFFSET / LIMIT behaviour, kept allocating this DSL.val() thingy, which is a bind variable. Sure, limits do work with bind variables, but this happened when SelectQueryImpl was initialised, not when the LIMIT clause was added by the jOOQ API user.

As can be seen in the sources, the following logic was there:

private static final Field<Integer> ZERO              = zero();
private static final Field<Integer> ONE               = one();
private Field<Integer>              numberOfRowsOrMax = 
    DSL.inline(Integer.MAX_VALUE);

While the “special limits” ZERO and ONE were static members, the numberOfRowsOrMax value wasn’t. That’s the instantiation we were measuring in JMC. The member is not a constant, but the default value is. It is always initialised with Integer.MAX_VALUE wrapped in a DSL.inline() call. The solution is really simple:

private static final Param<Integer> MAX               = 
    DSL.inline(Integer.MAX_VALUE);
private Field<Integer>              numberOfRowsOrMax = MAX;

This is obviously better! Not only does it avoid the allocation of the bind variable, it also avoids the boxing of Integer.MAX_VALUE (which can also be seen in the sampling screenshot).

Note that a similar optimisation is available in the JDK’s ArrayList. When you look at the sources, you’ll see:

/**
 * Shared empty array instance used for empty instances.
 */
private static final Object[] EMPTY_ELEMENTDATA = {};

When you initialise an ArrayList without initial capacity, it will reference this shared instance, instead of creating a new, empty (or even non-empty) array. This delays the allocation of such an array until we actually add things to the ArrayList, just in case it stays empty.

jOOQ’s LIMIT is the same. Most queries might not have a LIMIT, so better not allocate that MAX_VALUE afresh!

This is done once per “E-loop” iteration

One issue down: https://github.com/jOOQ/jOOQ/issues/6635

2. Copying lists in internals

This is really a micro optimisation that you probably shouldn’t do in ordinary business logic. But it might be worthwhile in infrastructure logic, e.g. when you’re also in an “S-E-R-C loop”:

jOOQ (unfortunately) occasionally copies data around between arrays, e.g. wrapping Strings in jOOQ wrapper types, transforming numbers to strings, etc. These loops aren’t bad per se, but remember, we’re inside some level of the “S-E-R-C loop”, so these copying operations might be run hundreds of millions of times when we run a statement 3 million times.

The above loop didn’t contribute a lot of overhead, and possibly the cloned object was stack allocated or the clone call eliminated by the JIT. But maybe it wasn’t. The QualifiedName class cloned its argument prior to returning it, to make sure that no accidental modifications will have any side effect:

private static final String[] nonEmpty(String[] qualifiedName) {
    String[] result;
    ...
    if (nulls > 0) {
        result = new String[qualifiedName.length - nulls];
        ...
    }
    else {
        result = qualifiedName.clone();
    }
    return result;
}

So, the implementation of the method guaranteed a new array as a result.

After a bit of analysis, it could be seen that there is only a single consumer of this method, and the array doesn’t escape that consumer. So, it’s safe to remove the clone call. Probably, the utility was refactored from a more general-purpose method into this local usage.
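A hedged sketch of the fixed method, with the elided parts filled in with plausible logic (this is not the verbatim jOOQ implementation):

private static String[] nonEmpty(String[] qualifiedName) {
    int nulls = 0;

    for (String s : qualifiedName)
        if (s == null || s.length() == 0)
            nulls++;

    if (nulls > 0) {
        String[] result = new String[qualifiedName.length - nulls];

        for (int i = 0, j = 0; i < qualifiedName.length; i++)
            if (qualifiedName[i] != null && qualifiedName[i].length() > 0)
                result[j++] = qualifiedName[i];

        return result;
    }

    // No clone() anymore: the single caller treats the result as read-only
    return qualifiedName;
}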

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6640

3. Running checks in loops

This one is too silly to be true:

There’s a costly overhead in the CombinedCondition constructor (the <init> method). Notice how the samples drop from 0.47% to 0.32% between the constructor and the next method init(); that’s the time spent inside the constructor.

A tiny amount of time, but this time is spent every time someone combines two conditions / predicates with AND and OR. Every time. We can probably save this time. The problem is this:

CombinedCondition(Operator operator, Collection<? extends Condition> conditions) {
    ...
    for (Condition condition : conditions)
        if (condition == null)
            throw new IllegalArgumentException("The argument 'conditions' must not contain null");

    ...
    init(operator, conditions);
}

There’s a loop over the arguments to give some meaningful error messages. That’s a bit too defensive, I suspect. How about we simply live with the NPE when it arises, as it should be rather unexpected? (For context: jOOQ hardly ever checks parameters like this, so removing the check is also more consistent.)

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6666 (nice number)

4. Lazy initialisation of lists

The nature of the JDBC API forces us to work with ThreadLocal variables, very unfortunately, as it is not possible to pass arguments from parent SQLData objects to children, especially when we combine nesting of Oracle TABLE/VARRAY and OBJECT types.

In this analysis, we’re combining the profiler’s CPU sampling with its memory sampling:

In the CPU sampling view above, we can see some overhead in the DefaultExecuteContext, which is instantiated once per “E-loop” iteration. Again, not a huge overhead, but let’s look at what this constructor does. It contributes to the overall allocations of ArrayList:

When we select the type in JMC, the other view will then display all the stack traces where ArrayList instances were allocated, among which, again, our dear DefaultExecuteContext constructor:

Where are those ArrayLists allocated? Right here:

BLOBS.set(new ArrayList<Blob>());
CLOBS.set(new ArrayList<Clob>());
SQLXMLS.set(new ArrayList<SQLXML>());
ARRAYS.set(new ArrayList<Array>());

Every time we start executing a query, we initialise a list for each one of these types. All of our variable binding logic will then register any possibly allocated BLOB or CLOB, etc., such that we can clean these up at the end of the execution (a JDBC 4.0 feature that not everyone knows of!):

static final void register(Blob blob) {
    BLOBS.get().add(blob);
}
    
static final void clean() {
    List<Blob> blobs = BLOBS.get();

    if (blobs != null) {
        for (Blob blob : blobs)
            JDBCUtils.safeFree(blob);

        BLOBS.remove();
    }
    ...
}

Don’t forget to call Blob.free() et al., if you’re working with JDBC directly!

But the truth is, in most cases, we don’t really need these things. We need them only in Oracle, and only if we’re using TABLE / VARRAY or OBJECT types, due to some JDBC restrictions. Why punish all the users of other databases with this overhead? Instead of a sophisticated refactoring, which risks introducing regressions (https://github.com/jOOQ/jOOQ/issues/4205), we can simply initialise these lists lazily. We leave the clean() method as it is, remove the initialisation in the constructor, and replace the register() logic by this:

static final void register(Blob blob) {
    List<Blob> list = BLOBS.get();

    if (list == null) {
        list = new ArrayList<Blob>();
        BLOBS.set(list);
    }

    list.add(blob);
}

That was easy. And significant. Check out the new allocation measurements:

Note that every allocation, apart from the overhead of allocating things, also incurs additional overhead when the object is garbage collected. That’s a bit trickier to measure and correlate. In general, fewer allocations are almost always a good thing, except if the allocation is super short-lived, in which case stack allocation can happen, or the logic can even be eliminated by the JIT.
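Side note: ThreadLocal.withInitial(ArrayList::new) would not have helped here. clean() calls BLOBS.get(), and with an initial value supplier, that very call would allocate the list we’re trying to avoid, on every single execution, even when nothing was ever registered (besides, jOOQ still supports Java 6, which has no withInitial()).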

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6669

5. Using String.replace()

This is mostly a JDK 8 problem; JDK 9 fixed string replacement by no longer relying on regular expressions internally. In JDK 8, however (and jOOQ still supports Java 6, so this is relevant), string replacement works through regular expressions, as can be seen here:

The Pattern implementation allocates quite a few int[] instances, even if that’s probably not strictly needed for non-regex patterns like those of String.replace():

I’ve already analysed this in a previous blog post, which can be seen here:

https://blog.jooq.org/2017/10/11/benchmarking-jdk-string-replace-vs-apache-commons-stringutils-replace/
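The workaround is a plain, regex-free replacement loop in the spirit of Apache Commons Lang’s StringUtils.replace(). A minimal sketch (assuming non-null arguments and a non-empty search string):

static String replace(String text, String search, String replacement) {
    int start = 0;
    int end = text.indexOf(search, start);

    // Nothing to replace: return the original string without allocating
    if (end == -1)
        return text;

    StringBuilder sb = new StringBuilder(text.length() + replacement.length());

    while (end != -1) {
        sb.append(text, start, end).append(replacement);
        start = end + search.length();
        end = text.indexOf(search, start);
    }

    return sb.append(text, start, text.length()).toString();
}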

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6672

6. Registering an SPI that is going to be inactive

This one was a bit more tricky to solve, as it relies on a deeper analysis. Unfortunately, I have no profiling screenshots available anymore, but it is easy to explain with code. There’s an internal ExecuteListeners utility, which abstracts over the ExecuteListener SPIs. Users can register such a listener and listen to query rendering, variable binding, query execution, and other lifecycle events. By default, users do not register any such ExecuteListener, but there is always one internal ExecuteListener:

private static ExecuteListener[] listeners(ExecuteContext ctx) {
    List<ExecuteListener> result = new ArrayList<ExecuteListener>();

    for (ExecuteListenerProvider provider : ctx.configuration()
                                               .executeListenerProviders())
        if (provider != null)
            result.add(provider.provide());

    if (!FALSE.equals(ctx.settings().isExecuteLogging()))
        result.add(new LoggerListener());

    return result.toArray(EMPTY_EXECUTE_LISTENER);
}

The LoggerListener is added by default, unless users turn off that feature. Which means:

  • We’ll pretty much always get this ArrayList
  • We’ll pretty much always loop over this list
  • We’ll pretty much always call this LoggerListener

But what does it do? It logs stuff on DEBUG and TRACE level. For instance:

@Override
public void executeEnd(ExecuteContext ctx) {
    if (ctx.rows() >= 0)
        if (log.isDebugEnabled())
            log.debug("Affected row(s)", ctx.rows());
}

That’s what it does by definition. It’s a debug logger. So, the improved logic for initialising this thing is the following:

private static final ExecuteListener[] listeners(ExecuteContext ctx) {
    List<ExecuteListener> result = null;

    for (ExecuteListenerProvider provider : ctx.configuration()
                                               .executeListenerProviders())
        if (provider != null)
            (result = init(result)).add(provider.provide());

    if (!FALSE.equals(ctx.settings().isExecuteLogging())) {
        if (LOGGER_LISTENER_LOGGER.isDebugEnabled())
            (result = init(result)).add(new LoggerListener());
    }

    return result == null ? null : result.toArray(EMPTY_EXECUTE_LISTENER);
}

We’re no longer allocating the ArrayList (that might be premature; the JIT might have eliminated that allocation anyway, but OK), and we’re only adding the LoggerListener if DEBUG or TRACE logging is enabled for it, i.e. if it would do any work at all.
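The init() helper isn’t shown above; presumably, it just creates the list on demand, along these lines (a sketch, not the actual jOOQ code):

private static List<ExecuteListener> init(List<ExecuteListener> result) {
    return result != null ? result : new ArrayList<ExecuteListener>();
}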

That’s just a couple of CPU cycles we can save on every execution. Again, I don’t have the profiling measurements anymore, but trust me. It helped.

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6747

7. Eager allocation where lazy allocation works

Sometimes, we need two different representations of the same information. The “raw” representation, and a more useful, pre-processed representation for some purposes. This was done, for instance, in QualifiedField:

private final Name          name;
private final Table<Record> table;

QualifiedField(Name name, DataType<T> type) {
    super(name, type);

    this.name = name;
    this.table = name.qualified()
        ? DSL.table(name.qualifier())
        : null;
}

@Override
public final void accept(Context<?> ctx) {
    ctx.visit(name);
}

@Override
public final Table<Record> getTable() {
    return table;
}

As can be seen, the name is really the beef of this class. It’s a qualified name that renders itself onto the SQL string. The Table representation is useful when navigating the meta model, but this is hardly ever done by jOOQ’s internals and/or user-facing code.

However, this eager initialisation is costly:

Quite a few UnqualifiedName[] arrays are allocated by the call to Name.qualifier(). We can easily make that table reference non-final and calculate it lazily:

private final Name              name;
private Table<Record>           table;

QualifiedField(Name name, DataType<T> type) {
    super(name, type);

    this.name = name;
}

@Override
public final Table<Record> getTable() {
    if (table == null)
        table = name.qualified() ? DSL.table(name.qualifier()) : null;

    return table;
}

Because name is final, we could call table “effectively final” (in a different sense than the Java language’s). We won’t have any thread safety issues, because these particular types are immutable inside of jOOQ: in the worst case, two racing threads compute the table value twice, but both computations yield equivalent results.

This is done several times per “E-loop” iteration

One more issue down: https://github.com/jOOQ/jOOQ/issues/6755

Results

Now, thus far, we’ve “improved” many low-hanging fruit based on a profiler session (that was run, ahem, from outside of Eclipse on a rather busy machine). This wasn’t very scientific: just tracking down “bottlenecks” which triggered my interest by having numbers high enough to even notice. This is called “micro optimisation”, and it is only worth the trouble if you’re in an “S-E-R-C loop”, meaning that the code you’re optimising is executed many, many times. For me, developing jOOQ, this is almost always the case, because jOOQ is a library used by a lot of people who all profit from these optimisations. In many other cases, this might be called “premature optimisation”.

But once we’ve optimised, we shouldn’t stop. I’ve done a couple of individual JMH benchmarks for many of the above problems, to see if they were really an improvement. But sometimes, in a JMH benchmark, something that doesn’t look like an improvement will still be an improvement in the bigger picture. The JVM doesn’t inline all methods 100 levels deep. If your algorithm is complex, a micro optimisation may still have an effect in the bigger picture that an isolated JMH benchmark cannot show.

Unfortunately, this isn’t a very exact science, but with enough intuition, you’ll find the right spots to optimise.

In my case, I verified progress over two patch releases: 3.10.0 -> 3.10.1 -> 3.10.2 (not yet released) by running a JMH benchmark over the entire query execution (including H2’s part). The results of applying roughly 15 of the above and similar optimisations (~2 days’ worth of effort) are:

JDK 9 (9+181)

jOOQ 3.10.0 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  101891.108 ± 7283.832  ops/s

jOOQ 3.10.2 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  110982.940 ± 2374.504  ops/s

JDK 8 (1.8.0_145)

jOOQ 3.10.0 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  110178.873 ± 2134.894  ops/s

jOOQ 3.10.2 Open Source Edition

Benchmark                          Mode   Cnt       Score      Error  Units
ExecutionBenchmark.testExecution   thrpt   21  118795.922 ± 2661.653  ops/s

As can be seen, in both JDK versions, we’ve gotten roughly a 10% speed increase. Interestingly, JDK 8 also seemed to be roughly 10% faster than JDK 9 in this benchmark, although this may be due to a variety of things that I haven’t considered yet, and which are out of scope for this discussion.

Conclusion

This iterative approach to tackling performance is definitely worth it for library authors:

  • run a representative benchmark (repeat a task millions of times)
  • profile it
  • track down “bottlenecks”
  • if they’re easy to fix without regression risk, do it
  • repeat
  • after a while, verify with JMH

Individual improvements are quite hard to measure, or measure correctly. But when you do 10-15 of them, they start adding up and become significant. 10% can make a difference.

Looking forward to your comments, alternative techniques, alternative tools, etc.!

If you liked this article, you will also like Top 10 Easy Performance Optimisations in Java

A Basic Programming Pattern: Filter First, Map Later

In recent days, I’ve seen a bit too much of this:

someCollection
    .stream()
    .map(e -> someFunction(e))
    .collect(Collectors.toList())
    .subList(0, 2);

Something is very wrong with the above example. Can you see it? No? Let me rename those variables for you.

hugeCollection
    .stream()
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList())
    .subList(0, 2);

Better now? Exactly. The above algorithm is O(N) when it could be O(1):

hugeCollection
    .stream()
    .limit(2)
    .map(e -> superExpensiveMapping(e))
    .collect(Collectors.toList());

(Let’s assume the lack of explicit ordering is irrelevant)

I’m working mostly with SQL and helping companies tune their SQL (check out our training, btw) and as such, I’m always very eager to reduce the algorithmic complexity of queries. To some people, this seems like wizardry, but honestly, most of the time, it just boils down to adding a well placed index.

Why?

Because an index reduces the algorithmic complexity of a query algorithm from O(N) (scanning the entire table for matches) to O(log N) (scanning a B-tree index for the same matches).

Sidenote: Reducing algorithmic complexity is almost never premature optimisation. As your data set grows, bad complexity will always bite you!

The same is true for the above examples. Why would you ever traverse and transform an entire list of a huge number of elements (N) with an expensive operation (superExpensiveMapping), when you only need to do this for the first two values?
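The same reasoning applies to filter() in the narrower sense of the pattern’s name: push any cheap predicate before the expensive mapping (a sketch; isRelevant() stands in for whatever cheap test you have):

hugeCollection
    .stream()
    .filter(e -> e.isRelevant())          // cheap test first
    .map(e -> superExpensiveMapping(e))   // expensive transformation later
    .collect(Collectors.toList());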

Conclusion

SQL is a declarative language where the query optimiser gets this right automatically for you: It will (almost) always filter the data first (WHERE clause), and only then transform it (JOIN, GROUP BY, SELECT, etc.).

Just as in SQL, when we hand-write our queries using Streams (and also in imperative programming), do always:

Filter First, Map Later

Your production system will thank you.

Note, another interesting case with flatMap() is documented here.

How to Find Redundant Indexes in SQL

The following two indexes are redundant in most SQL databases:

CREATE INDEX i_actor_1 ON actor (last_name);
CREATE INDEX i_actor_2 ON actor (last_name, first_name);

It is usually safe to drop the first index, because all queries that query the LAST_NAME column only can still profit from the second index I_ACTOR_2. The reason being that LAST_NAME is the first column of the composite index I_ACTOR_2 (it would be a different story, if it weren’t the first column).
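For example, both of the following queries can profit from I_ACTOR_2, because LAST_NAME is its leading column (illustrative queries against the above ACTOR table):

-- Can use I_ACTOR_1 or I_ACTOR_2
SELECT * FROM actor WHERE last_name = 'MILLER';

-- Can use I_ACTOR_2 only
SELECT * FROM actor WHERE last_name = 'MILLER' AND first_name = 'ANN';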

Note: It is usually safe to drop the first index, because the benefits probably outweigh the cost:

Benefits of dropping

  • Less disk space used
  • Faster insertions, updates, and deletes, because one less index needs to be maintained

Costs of dropping

  • Querying via the composite index can be slightly slower, as can be seen in the below benchmark

Let’s see the costs of dropping the index below for Oracle, PostgreSQL, and SQL Server in this particular case (beware as always when interpreting benchmarks, they heavily depend on a lot of context, especially data size!)

Oracle

Preparation:

CREATE TABLE t (
  a NUMBER(10) NOT NULL,
  b NUMBER(10) NOT NULL
);

INSERT INTO t (a, b)
SELECT level, level
FROM dual
CONNECT BY level <= 100000;

CREATE INDEX i1 ON t(a);
CREATE INDEX i2 ON t(a, b);

EXEC dbms_stats.gather_table_stats('TEST', 'T');

Benchmark:

SET SERVEROUTPUT ON
CREATE TABLE results (
  run     NUMBER(2),
  stmt    NUMBER(2),
  elapsed NUMBER
);

DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 2000;
BEGIN

  -- Repeat benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT /*+INDEX(t i1)*/ * FROM t WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
  
    INSERT INTO results VALUES (r, 1, 
      SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT /*+INDEX(t i2)*/ * FROM t WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    INSERT INTO results VALUES (r, 2, 
      SYSDATE + ((SYSTIMESTAMP - v_ts) * 86400) - SYSDATE);
  END LOOP;
  
  FOR rec IN (
    SELECT 
      run, stmt, 
      CAST(elapsed / MIN(elapsed) OVER() AS NUMBER(10, 5)) ratio 
    FROM results
  )
  LOOP
    dbms_output.put_line('Run ' || rec.run || 
      ', Statement ' || rec.stmt || 
      ' : ' || rec.ratio);
  END LOOP;
END;
/

DROP TABLE results;

The result being:

Run 1, Statement 1 : 1.4797
Run 1, Statement 2 : 1.45545

Run 2, Statement 1 : 1.1997
Run 2, Statement 2 : 1.01121

Run 3, Statement 1 : 1.13606
Run 3, Statement 2 : 1

Run 4, Statement 1 : 1.13455
Run 4, Statement 2 : 1.00242

Run 5, Statement 1 : 1.13303
Run 5, Statement 2 : 1.00606

Some notes on benchmarks here.

The fastest query execution in the above result yields 1; the other executions are multiples of 1. As you can see, there is roughly a 10% difference between the two statements in this case. The benefits of dropping the redundant index (faster insertions) should usually outweigh the cost (potentially slower queries), but think twice in a read-heavy / write-rarely database, where the query cost dominates.

PostgreSQL

A similar difference can be seen in a PostgreSQL benchmark. No hints can be used to choose indexes, so we’re simply creating two tables:

CREATE TABLE t1 (
  a INT NOT NULL,
  b INT NOT NULL
);
CREATE TABLE t2 (
  a INT NOT NULL,
  b INT NOT NULL
);

INSERT INTO t1 (a, b)
SELECT s, s
FROM generate_series(1, 100000) AS s(s);

INSERT INTO t2 (a, b)
SELECT s, s
FROM generate_series(1, 100000) AS s(s);

CREATE INDEX i1 ON t1(a);
CREATE INDEX i2 ON t2(a, b);

ANALYZE t1;
ANALYZE t2;

Benchmark:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 10000;
  rec RECORD;
BEGIN

  -- Repeat benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT * FROM t1 WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, 
      (clock_timestamp() - v_ts); 
    v_ts := clock_timestamp();

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT * FROM t2 WHERE a = 1
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, 
      (clock_timestamp() - v_ts); 
    RAISE INFO '';
  END LOOP;
END$$;

Result:

INFO:  Run 1, Statement 1: 00:00:00.071891
INFO:  Run 1, Statement 2: 00:00:00.080833

INFO:  Run 2, Statement 1: 00:00:00.076329
INFO:  Run 2, Statement 2: 00:00:00.079772

INFO:  Run 3, Statement 1: 00:00:00.073137
INFO:  Run 3, Statement 2: 00:00:00.079483

INFO:  Run 4, Statement 1: 00:00:00.073456
INFO:  Run 4, Statement 2: 00:00:00.081508

INFO:  Run 5, Statement 1: 00:00:00.077148
INFO:  Run 5, Statement 2: 00:00:00.083535

SQL Server

Preparation:

CREATE TABLE t (
  a INT NOT NULL,
  b INT NOT NULL
);

WITH s(s) AS (
  SELECT 1
  UNION ALL
  SELECT s + 1 FROM s WHERE s < 100
)
INSERT INTO t
SELECT TOP 100000 
  row_number() over(ORDER BY (SELECT 1)), 
  row_number() over(ORDER BY (select 1)) 
FROM s AS s1, s AS s2, s AS s3;

CREATE INDEX i1 ON t(a);
CREATE INDEX i2 ON t(a, b);

UPDATE STATISTICS t;

Benchmark:

DECLARE @ts DATETIME;
DECLARE @repeat INT = 2000;
DECLARE @r INT;
DECLARE @i INT;
DECLARE @dummy VARCHAR;

DECLARE @s1 CURSOR;
DECLARE @s2 CURSOR;

DECLARE @results TABLE (
  run     INT,
  stmt    INT,
  elapsed DECIMAL
);

SET @r = 0;
WHILE @r < 5
BEGIN
  SET @r = @r + 1

  SET @s1 = CURSOR FOR 
    SELECT b FROM t WITH (INDEX (i1)) WHERE a = 1;

  SET @s2 = CURSOR FOR 
    SELECT b FROM t WITH (INDEX (i2)) WHERE a = 1;

  SET @ts = current_timestamp;
  SET @i = 0;
  WHILE @i < @repeat
  BEGIN
    SET @i = @i + 1

    OPEN @s1;
    FETCH NEXT FROM @s1 INTO @dummy;
    WHILE @@FETCH_STATUS = 0
    BEGIN
      FETCH NEXT FROM @s1 INTO @dummy;
    END;

    CLOSE @s1;
  END;

  DEALLOCATE @s1;
  INSERT INTO @results 
    VALUES (@r, 1, DATEDIFF(ms, @ts, current_timestamp));

  SET @ts = current_timestamp;
  SET @i = 0;
  WHILE @i < @repeat
  BEGIN
    SET @i = @i + 1

    OPEN @s2;
    FETCH NEXT FROM @s2 INTO @dummy;
    WHILE @@FETCH_STATUS = 0
    BEGIN
      FETCH NEXT FROM @s2 INTO @dummy;
    END;

    CLOSE @s2;
  END;

  DEALLOCATE @s2;
  INSERT INTO @results 
    VALUES (@r, 2, DATEDIFF(ms, @ts, current_timestamp));
END;

SELECT 'Run ' + CAST(run AS VARCHAR) + 
  ', Statement ' + CAST(stmt AS VARCHAR) + ': ' + 
  CAST(CAST(elapsed / MIN(elapsed) OVER() AS DECIMAL(10, 5)) AS VARCHAR)
FROM @results;

Result:

Run 1, Statement 1: 1.22368
Run 1, Statement 2: 1.09211

Run 2, Statement 1: 1.05263
Run 2, Statement 2: 1.09211

Run 3, Statement 1: 1.00000
Run 3, Statement 2: 1.05263

Run 4, Statement 1: 1.05263
Run 4, Statement 2: 1.00000

Run 5, Statement 1: 1.09211
Run 5, Statement 2: 1.05263

As can be seen, the difference between the two indexes is small in all databases. In the PostgreSQL benchmark, the smaller non-composite index is consistently slightly faster for this type of query, whereas in the Oracle benchmark, the composite index even wins, because it acts as a covering index for the SELECT * query and thus saves the extra table access.

Yet both indexes can be used for the query in a reasonable way, so if disk space / insertion speed is an issue, the redundant single-column index can be dropped.

How to find such indexes

The following query will help you detect such indexes in Oracle, PostgreSQL, and SQL Server:

Oracle

WITH indexes AS (
  SELECT
    i.owner,
    i.index_name,
    i.table_name,
    listagg(c.column_name, ', ')
      WITHIN GROUP (ORDER BY c.column_position)
      AS columns
  FROM all_indexes i
  JOIN all_ind_columns c
    ON i.owner = c.index_owner
    AND i.index_name = c.index_name
  GROUP BY i.owner, i.table_name, i.index_name, i.leaf_blocks
)
SELECT
  i.owner,
  i.table_name,
  i.index_name AS "Deletion candidate index",
  i.columns AS "Deletion candidate columns",
  j.index_name AS "Existing index",
  j.columns AS "Existing columns"
FROM indexes i
JOIN indexes j
  ON i.owner = j.owner
  AND i.table_name = j.table_name
  AND j.columns LIKE i.columns || ',%'

Result:

TABLE_NAME   delete index   columns   existing index   columns
-------------------------------------------------------------------------
T            I1             A         I2               A, B

In short, it lists all the indexes whose column list is a prefix of another index’s column list. The trailing ',%' appended to the LIKE pattern ensures that we only match whole-column prefixes (an index on (A) matches one on (A, B), but not one on (AB)).

PostgreSQL

Get ready for a really nifty query. Here’s how to discover redundant indexes in PostgreSQL, which unfortunately doesn’t seem to have an easy, out-of-the-box dictionary view to discover index columns:

WITH indexes AS (
  SELECT 
    tnsp.nspname AS schema_name,
    trel.relname AS table_name,
    irel.relname AS index_name,
    string_agg(a.attname, ', ' ORDER BY c.ordinality) AS columns
  FROM pg_index AS i
  JOIN pg_class AS trel ON trel.oid = i.indrelid
  JOIN pg_namespace AS tnsp ON trel.relnamespace = tnsp.oid
  JOIN pg_class AS irel ON irel.oid = i.indexrelid
  JOIN pg_attribute AS a ON trel.oid = a.attrelid
  JOIN LATERAL unnest(i.indkey) 
    WITH ORDINALITY AS c(colnum, ordinality)
      ON a.attnum = c.colnum
  GROUP BY i, tnsp.nspname, trel.relname, irel.relname
)
SELECT
  i.table_name,
  i.index_name AS "Deletion candidate index",
  i.columns AS "Deletion candidate columns",
  j.index_name AS "Existing index",
  j.columns AS "Existing columns"
FROM indexes i
JOIN indexes j
  ON i.schema_name = j.schema_name
  AND i.table_name = j.table_name
  AND j.columns LIKE i.columns || ',%';

This is a really nice case of lateral unnesting with ordinality, which you should definitely add to your PostgreSQL tool chain.

SQL Server

Now, SQL Server doesn’t have a nice STRING_AGG function (yet), but we can work around this using STUFF and XML to get the same query.

Of course, there are other solutions using recursive SQL, but I’m too lazy to translate the simple string pattern-matching approach to something recursive.

WITH 
  i AS (
    SELECT 
      s.name AS schema_name,
      t.name AS table_name,
      i.name AS index_name,
      c.name AS column_name,
      ic.index_column_id
    FROM sys.indexes i 
    JOIN sys.index_columns ic 
      ON i.object_id = ic.object_id 
      AND i.index_id = ic.index_id 
    JOIN sys.columns c 
      ON ic.object_id = c.object_id 
      AND ic.column_id = c.column_id 
    JOIN sys.tables t 
      ON i.object_id = t.object_id 
    JOIN sys.schemas s
      ON t.schema_id = s.schema_id
  ),
  indexes AS (
    SELECT
      schema_name,
      table_name,
      index_name,
      STUFF((
        SELECT ',' + j.column_name 
        FROM i j
        WHERE i.table_name = j.table_name 
        AND i.index_name = j.index_name 
        FOR XML PATH('') -- Yay, XML in SQL!
      ), 1, 1, '') columns
    FROM i
    GROUP BY schema_name, table_name, index_name
  )
SELECT
  i.schema_name,
  i.table_name,
  i.index_name AS "Deletion candidate index",
  i.columns AS "Deletion candidate columns",
  j.index_name AS "Existing index",
  j.columns AS "Existing columns"
FROM indexes i
JOIN indexes j
  ON i.schema_name = j.schema_name
  AND i.table_name = j.table_name
  AND j.columns LIKE i.columns + ',%';

A note on partial indexes

SQL Server and PostgreSQL support “partial indexes”, i.e. indexes that contain only parts of your data (and Oracle can emulate them in various ways). Such indexes might appear in the resulting list – you may want to be careful to check if they’re really redundant or not. Chances are, they’re there for a very good reason.
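For instance, in PostgreSQL, a partial index like the following would show up as a “deletion candidate” for i_actor_2 (its column list is a prefix), even though it is much smaller and serves a different purpose (illustrative; it assumes a boolean active column, which the stock actor table doesn’t have):

-- Indexes only the active actors; much smaller than a full index,
-- and not actually redundant with i_actor_2:
CREATE INDEX i_actor_active ON actor (last_name) WHERE active;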

Conclusion

Now go run the above query on your production database and… very carefully and reasonably think about whether you really want to drop those indexes ;)

Don’t Use the String Concatenation “Trick” in SQL Predicates

In SQL, quite often, we want to compare several values with each other. For instance, when we’re looking for a specific user by their first and last names, we’ll write a query like this one:

SELECT *
FROM customer
WHERE first_name = 'SUSAN'
AND last_name = 'WILSON';

We’re getting:

CUSTOMER_ID   FIRST_NAME   LAST_NAME
------------------------------------
          8   SUSAN        WILSON

Surely, everyone agrees that this is correct and perfectly fine as we probably have an index on these two columns (or on at least one of them) to speed up such queries:

CREATE INDEX idx_customer_name ON customer (last_name, first_name);

The execution plan is thus optimal, e.g. with Oracle:

-------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |       |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER          |     1 |
|*  2 |   INDEX RANGE SCAN                  | IDX_CUSTOMER_NAME |     1 |
-------------------------------------------------------------------------

But sometimes, we cannot use AND to connect two predicates. In particular, that’s not possible with an IN predicate, so people sometimes resort to using string concatenation, because that seems to work and make sense.

For instance, let’s find all customers whose first and last names match those of an actor (as always, using the Sakila database):

SELECT *
FROM customer
WHERE first_name || last_name IN (
  SELECT first_name || last_name
  FROM actor
)

And yes indeed, what we’re getting here is the correct answer:

CUSTOMER_ID   FIRST_NAME   LAST_NAME
------------------------------------
          6   JENNIFER     DAVIS

But that answer is only accidentally correct!

Because we weren’t looking for customers called

first_name = 'JENNIFER' AND last_name = 'DAVIS'

We were looking for customers called

first_name || last_name = 'JENNIFERDAVIS'

Want proof? Let’s add a new customer:

INSERT INTO customer (customer_id, first_name, last_name )
VALUES               (600        , 'JENNI'   , 'FERDAVIS');

Yeah right? No one is called FERDAVIS. Or are they? As good programmers, we closely observe Murphy’s Law (i.e. always look both left and right when crossing a street).

In any case, let’s run our query again:

SELECT *
FROM customer
WHERE first_name || last_name IN (
  SELECT first_name || last_name
  FROM actor
)

And observe the result!

CUSTOMER_ID   FIRST_NAME   LAST_NAME
------------------------------------
          6   JENNIFER     DAVIS
        600   JENNI        FERDAVIS

Of course, because our predicate was really looking for customers called

first_name || last_name = 'JENNIFERDAVIS'

Which matches in both cases:

-- What we expected
first_name || last_name = 'JENNIFER' || 'DAVIS'

-- What we got
first_name || last_name = 'JENNI' || 'FERDAVIS'

Notice that I only added this customer to the customer table, not to the actor table. There’s no actor by the name FERDAVIS, so the result is clearly wrong.

AHA! Let’s use an “impossible” separator

So, we might proceed to fixing this as such:

SELECT *
FROM customer
WHERE first_name || '###' || last_name IN (
  SELECT first_name || '###' || last_name
  FROM actor
)

And now, the result is again correct. We get only JENNIFER DAVIS because we were looking for:

first_name || '###' || last_name = 'JENNIFER###DAVIS'

This works quite well for a while, as the separator is quite “impossible” (i.e. improbable) to encounter in actual data. But we shouldn’t trust our judgement, because… Murphy’s Law. So you might think: better use an even rarer separator, e.g. (if your database supports proper character sets):

SELECT *
FROM customer
WHERE first_name || '🙈🙉🙊' || last_name IN (
  SELECT first_name || '🙈🙉🙊' || last_name
  FROM actor
)

The use of emojis should indicate what my opinion of this approach is.

Too bad for performance, though

Remember that index we’ve created? Fact is, we also have such an index on the ACTOR table:

CREATE INDEX idx_actor_name ON actor (last_name, first_name);

And now, let’s assume our query is a bit different. We’ll be looking only for customers whose address_id is 10:

SELECT *
FROM customer
WHERE address_id = 10
AND first_name || '🙈🙉🙊' || last_name IN (
  SELECT first_name || '🙈🙉🙊' || last_name
  FROM actor
)

Now, our querymoji is using the index indeed, but for an INDEX FULL SCAN, so it’s only slightly faster than scanning the entire actor table:

-----------------------------------------------------------------------------------
| Id  | Operation                            | Name                       | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                            |       |
|*  1 |  HASH JOIN SEMI                      |                            |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER                   |     1 |
|*  3 |    INDEX RANGE SCAN                  | IDX_CUSTOMER_FK_ADDRESS_ID |     1 |
|   4 |   INDEX FULL SCAN                    | IDX_ACTOR_NAME             |     2 |
-----------------------------------------------------------------------------------

And what’s worse, even if all the cardinality estimates correctly indicate only 1-2 rows, we’ll perform a HASH JOIN and load the full index for it! We should be running a NESTED LOOP instead.

Is there a better way? Yes! Use row constructors to compare several values at once:

SELECT *
FROM customer
WHERE address_id = 10
AND (first_name, last_name) IN (
  SELECT first_name, last_name
  FROM actor
);

Or, if your database doesn’t support this syntax (luckily, Oracle and PostgreSQL do, for instance), then you can resort to an equivalent EXISTS predicate

SELECT *
FROM customer c
WHERE address_id = 10
AND EXISTS (
  SELECT 1
  FROM actor a
  WHERE c.first_name = a.first_name
  AND c.last_name = a.last_name
);

Both of these queries are exactly equivalent and result in a nested loop semi join, rather than the previous hash join, which is perfectly reasonable for these small tables. We can now use the IDX_ACTOR_NAME for a quick INDEX RANGE SCAN operation:

-----------------------------------------------------------------------------------
| Id  | Operation                            | Name                       | Rows  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                            |       |
|   1 |  NESTED LOOPS SEMI                   |                            |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| CUSTOMER                   |     1 |
|*  3 |    INDEX RANGE SCAN                  | IDX_CUSTOMER_FK_ADDRESS_ID |     1 |
|*  4 |   INDEX RANGE SCAN                   | IDX_ACTOR_NAME             |     1 |
-----------------------------------------------------------------------------------

But let’s not trust the estimated plans. Let’s benchmark (more info about benchmarking SQL here)

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP WITH TIME ZONE;
  v_repeat CONSTANT NUMBER := 2500;
BEGIN

  -- Repeat benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT first_name, last_name
        FROM customer
        WHERE address_id = 10
        AND first_name || '###' || last_name IN (
          SELECT first_name || '###' || last_name
          FROM actor
        )
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 1 : ' 
      || (SYSTIMESTAMP - v_ts));
    v_ts := SYSTIMESTAMP;
      
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT first_name, last_name
        FROM customer
        WHERE address_id = 10
        AND (first_name, last_name) IN (
          SELECT first_name, last_name
          FROM actor
        )
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;
      
    dbms_output.put_line('Run ' || r ||', Statement 2 : '
      || (SYSTIMESTAMP - v_ts));
  END LOOP;
END;
/

As can be seen here, the benchmark shows that the query using the row constructor is drastically faster as it can properly use the index as it should:

Run 1, Statement 1 : +000000000 00:00:00.374471000
Run 1, Statement 2 : +000000000 00:00:00.062830000
Run 2, Statement 1 : +000000000 00:00:00.364168000
Run 2, Statement 2 : +000000000 00:00:00.066252000
Run 3, Statement 1 : +000000000 00:00:00.359559000
Run 3, Statement 2 : +000000000 00:00:00.063898000
Run 4, Statement 1 : +000000000 00:00:00.344775000
Run 4, Statement 2 : +000000000 00:00:00.086060000
Run 5, Statement 1 : +000000000 00:00:00.394163000
Run 5, Statement 2 : +000000000 00:00:00.063176000

Now, imagine we were running this against some much more impressive data sets than the Sakila database.

Conclusion

If you’re ever thinking about concatenating two fields for a comparison, try again. There are two major caveats that should indicate you’re about to do something silly:

  • There’s a major risk of your query being subtly wrong (accidental matches between JENNIFER DAVIS and JENNI FERDAVIS)
  • There’s a major risk of your query being quite slow

So, as a rule of thumb, don’t use concatenation in predicates. There’s (almost) always a better way.

Read also: Why You Should (Sometimes) Avoid Expressions in SQL Predicates

Faster SQL Through Occasionally Choosing Natural Keys Over Surrogate Keys

There are many, many opinions out there regarding the old surrogate key vs. natural key debate. Most of the time, surrogate keys (e.g. sequence generated IDs) win because they’re much easier to design:

  • They’re easy to keep consistent across a schema (e.g. every table has an ID column, and that’s always the primary key)
  • They’re thus a no-brainer to add. When you create a new table, you don’t need to worry about any candidate keys
  • They’re guaranteed to be unique, because they have absolutely no business value, only a technical value

Great. So why bother using natural keys in the first place?

Well, there is a very compelling reason!

Performance!

Whenever you introduce surrogate keys, this means that your key data becomes completely meaningless. From a design perspective, that’s not too bad. You can easily join that other table to get the interesting, meaningful information that hides behind the surrogate foreign key value. For example, in our Sakila database

… we have a typical many-to-many relationship modelled with a relationship table between the FILM table and the CATEGORY table – state-of-the-art normalisation. But check out this interesting thing:

  • The FILM_CATEGORY relationship table doesn’t contain any interesting information at all. Just the relationships
  • The category table only contains a single useful column: The NAME column
  • The remaining columns (CATEGORY_ID and LAST_UPDATE) have absolutely no meaning

With this in mind, we could design a much simpler schema where we use the category name as a natural key. In fact, we don’t even need the CATEGORY table anymore, so we can remove it (that’s optional here; to ensure data correctness, we could keep it around, containing only the NAME column as a primary key). Check this out:
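The original diagram isn’t reproduced here, but the simplified relationship table might look like this (a sketch; column types are assumptions):

CREATE TABLE film_category_natural (
  film_id INT         NOT NULL REFERENCES film (film_id),
  name    VARCHAR(25) NOT NULL, -- the category name, now a natural key
  PRIMARY KEY (film_id, name)
);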

Now, if we run a query like the following one against our entire Sakila schema:

SELECT c.name, count(*)
FROM film_actor fa
JOIN film_category fc USING (film_id)
JOIN category c USING (category_id)
WHERE actor_id = 1
GROUP BY c.name
ORDER BY count(*) DESC

The query finds all categories a given actor played in, and the number of films that the given actor played in each category. For instance, this could be the result:

NAME       COUNT(*)
-------------------
Horror     3
Classics   2
Family     2
New        2
Games      2
Animation  1
Sports     1
Children   1
...

With an alternative schema where the category NAME has been moved to a new FILM_CATEGORY_NATURAL table, we could run this much simpler query:

SELECT fc.name, count(*)
FROM film_actor fa 
JOIN film_category_natural fc 
  USING (film_id)
WHERE actor_id = 1
GROUP BY fc.name
ORDER BY count(*) DESC

Notice how we can omit an entire JOIN.

The execution plans (here on Oracle) are quite different. Check this out:

Before:

After:

Unfortunately, the cost difference (8 vs 5) cannot be used to compare actual costs between the two queries / plans. But the plans are otherwise very similar, except that we’re simply missing one table access (CATEGORY) and an entire JOIN. That’s a significant improvement for something this simple. Imagine the improvement if we could roll out this kind of better query throughout the system!

We could look into more execution plan measurements (especially from the actual plan results), but what if we simply benchmark the two queries using the same silly benchmark, as always, repeating each statement 100 times:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 100;
BEGIN
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT c.name, count(*)
      FROM film_actor fa
      JOIN film_category fc USING (film_id)
      JOIN category c USING (category_id)
      WHERE actor_id = 1
      GROUP BY c.name
      ORDER BY count(*) DESC
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      SELECT fc.name, count(*)
      FROM film_actor fa 
      JOIN film_category_natural fc 
        USING (film_id)
      WHERE actor_id = 1
      GROUP BY fc.name
      ORDER BY count(*) DESC
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
END;
/

The results are drastic:

Statement 1 : 00:00:00.122070000
Statement 2 : 00:00:00.051179000

A factor of ~2.4x faster!

(Disclaimer: Benchmarks are an inaccurate way of measuring things because they suffer (or benefit) from “unfair” side-effects like heavy caching, but they’re a helpful tool to assess the order of magnitude of a difference between two queries).

The JOIN is completely unnecessary

The only reason why we joined the CATEGORY table each time is because we needed to display the meaningful business value of a category to the user, the CATEGORY.NAME. We could have avoided the JOIN if that was not a requirement, but displaying surrogate keys to the user (the CATEGORY_ID) would be rather harsh, wouldn’t it?

And we’re doing this all the time with all sorts of tables. We have:

  • Category tables (category name is a good candidate key)
  • Translation tables (label is a good candidate key)
  • Country tables (ISO 3166 country codes are a good candidate key)
  • Language tables (ISO 639 language codes are a good candidate key)
  • Currency tables (ISO 4217 currency codes are a good candidate key)
  • Stock symbol tables (ISIN security codes are a good candidate key)
  • … and many more

Working with natural keys can be quite cumbersome. But for some entities, the internationally standardised codes are really good candidate keys, and most of the time, they’re sufficient. What does LANGUAGE_ID 47 even mean? It means nothing. An experienced DBA will remember, after a while, that it means “English”. But wouldn’t EN be a much better value?

You would have EN as a primary key AND foreign key value. Chances are that, because everyone (including frontend developers, who probably hard-code some translations anyway) knows what language EN is (but no one knows what 47 means), you will almost never need to join the language table again – except in those rare cases where you want to work with secondary language columns, such as, for instance, DESCRIPTION.
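A sketch of what such a language table could look like with a natural key (illustrative, not the actual Sakila schema):

CREATE TABLE language (
  code        CHAR(2)      NOT NULL PRIMARY KEY, -- e.g. 'EN' (ISO 639-1)
  description VARCHAR(100)
);

CREATE TABLE film (
  film_id       INT          NOT NULL PRIMARY KEY,
  title         VARCHAR(255) NOT NULL,
  language_code CHAR(2)      NOT NULL REFERENCES language (code)
);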

Now, imagine what happens if we search for English entries, or as in our previous example, for films of category “Classics”? Our entire JOIN graph would be simplified.

(Another place where we don’t need additional surrogate keys is the relationship table. In this particular case, there’s no such key anyway)

Caveats

Our category strings are quite short. If natural keys become longer, then the duplication itself can become a problem on a lower storage level, as you might need more pages and blocks to store the same amount of rows.

Please, do take this advice in this article with a grain of salt. The essence here is to not always follow strict rules that were established only as a good default. There are always tradeoffs!

Conclusion

This should be quite a straightforward refactoring for many applications. If your tables are extremely obvious picks for a natural key (like the above), then do use natural keys. Your queries will immediately be faster – not necessarily much faster, but probably you can speed up a significant number of queries, i.e. take load off your entire system. Plus: your database will be more user-friendly.

And all of this at the price of not using the identical table design everywhere. I mean – when was having an identical table design a real business case anyway, right?

Side-note

In some cases, you could take this even one step further and denormalise your schema by putting categories as arrays or XML or JSON data structures directly inside your films. You’ll lose the normalisation benefits, but you could further win in terms of performance. For more details, read this very interesting article (about PostgreSQL) here:

http://www.databasesoup.com/2015/01/tag-all-things.html

Many SQL Performance Problems Stem from “Unnecessary, Mandatory Work”

Probably the most impactful thing you could learn about when writing efficient SQL is indexing. A very close runner-up, however, is the fact that a lot of SQL clients demand tons of “unnecessary, mandatory work” from the database.

Repeat this after me:

Unnecessary, Mandatory Work

What is “unnecessary, mandatory work”? It’s two things (duh):

Unnecessary

Let’s assume your client application needs this information here:

Nothing out of the ordinary. We run a movie database (e.g. the Sakila database) and we want to display the title and rating of each film to the user.

This is the query that would produce the above result:

SELECT title, rating
FROM film

However, our application (or our ORM) runs this query instead:

SELECT *
FROM film

What are we getting? Guess what. We’re getting tons of useless information:

There’s even some complex JSON all the way to the right, which is loaded:

  • From the disk
  • Into the caches
  • Over the wire
  • Into the client memory
  • And then discarded

Yes, we discard most of this information. The work that was performed to retrieve it was completely unnecessary. Right? Agreed.

Mandatory

That’s the worst part. While optimisers have become quite smart these days, this work is mandatory for the database. There’s no way the database can know that the client application actually didn’t need 95% of the data. And that’s just a simple example. Imagine if we joined more tables…

So what, you think? Databases are fast? Let me offer you some insight you may not have thought of before:

Memory consumption

Sure, the individual execution time doesn’t really change much. Perhaps it’ll be 1.5x slower, but we can handle that, right? For the sake of convenience? Sometimes that’s true. But if you’re sacrificing performance for convenience every time, things add up. We’re no longer talking about performance (speed of individual queries), but throughput (system response time), and that’s when stuff gets really hairy and tough to fix. That’s when you stop being able to scale.

Let’s look at execution plans, Oracle this time:

--------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes |
--------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 |   166K|
|   1 |  TABLE ACCESS FULL| FILM |  1000 |   166K|
--------------------------------------------------

Versus

--------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes |
--------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000 | 20000 |
|   1 |  TABLE ACCESS FULL| FILM |  1000 | 20000 |
--------------------------------------------------

We’re using 8x as much memory in the database when doing SELECT * rather than SELECT title, rating (166K vs. 20000 bytes in the plans above, roughly a factor 8.5). That’s not really surprising though, is it? We knew that. Yet we accepted it in many, many of our queries where we simply didn’t need all that data. We generated needless, mandatory work for the database, and it does sum up. We’re using roughly 8x too much memory (the number will differ, of course).

Now, all the other steps (disk I/O, wire transfer, client memory consumption) are also affected in the same way, but I’m skipping those. Instead, I’d like to look at…

Index usage

Most databases these days have figured out the concept of covering indexes. A covering index is not a special index per se, but it can turn into a “special index” for a given query, either accidentally or by design.

Check out this query:

SELECT * 
FROM actor
WHERE last_name LIKE 'A%'

There’s nothing extraordinary to be seen in the execution plan. It’s a simple query. Index range scan, table access, done:

-------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |     8 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ACTOR               |     8 |
|*  2 |   INDEX RANGE SCAN          | IDX_ACTOR_LAST_NAME |     8 |
-------------------------------------------------------------------

Is it a good plan, though? Well, if all we really needed were the actors’ first and last names, then it’s not.

Sure, we’re wasting memory etc. But check out this alternative query:

SELECT first_name, last_name 
FROM actor
WHERE last_name LIKE 'A%'

Its plan is this:

----------------------------------------------------
| Id  | Operation        | Name            | Rows  |
----------------------------------------------------
|   0 | SELECT STATEMENT |                 |     8 |
|*  1 |  INDEX RANGE SCAN| IDX_ACTOR_NAMES |     8 |
----------------------------------------------------

We could now eliminate the table access entirely, because there’s an index that covers all the needs of our query… a covering index. Does it matter? Absolutely! This approach can speed up some of your queries by an order of magnitude (or slow them down by an order of magnitude when your index stops being covering after a change).
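
For reference, the covering index used in the plan above could be created as follows (a sketch; IDX_ACTOR_NAMES is the name shown in the plan, and the column order assumes we filter on LAST_NAME first):

-- LAST_NAME first, so the range predicate can use the index;
-- FIRST_NAME added so the index covers the entire projection
CREATE INDEX idx_actor_names ON actor (last_name, first_name);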

You cannot always profit from covering indexes. Indexes come with their own cost and you shouldn’t add too many of them, but in cases like these, it’s a no-brainer. Let’s run a benchmark:

SET SERVEROUTPUT ON
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT NUMBER := 100000;
BEGIN
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      -- Worst query: Memory overhead AND table access
      SELECT *
      FROM actor
      WHERE last_name LIKE 'A%'
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 1 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      -- Better query: Still table access
      SELECT /*+INDEX(actor(last_name))*/ 
        first_name, last_name
      FROM actor
      WHERE last_name LIKE 'A%'
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 2 : ' || (SYSTIMESTAMP - v_ts));
  v_ts := SYSTIMESTAMP;
    
  FOR i IN 1..v_repeat LOOP
    FOR rec IN (
      -- Best query: Covering index
      SELECT /*+INDEX(actor(last_name, first_name))*/ 
        first_name, last_name
      FROM actor
      WHERE last_name LIKE 'A%'
    ) LOOP
      NULL;
    END LOOP;
  END LOOP;
    
  dbms_output.put_line('Statement 3 : ' || (SYSTIMESTAMP - v_ts));
END;
/

The result is:

Statement 1 : +000000000 00:00:02.479000000
Statement 2 : +000000000 00:00:02.261000000
Statement 3 : +000000000 00:00:01.857000000

Note that the actor table only has 4 columns, so the difference between statements 1 and 2 is not too impressive, but still significant. Note also that I’m using Oracle’s hints to force the optimiser to pick one index or the other for the query. Statement 3 clearly wins in this case. It’s a much better query, and that’s just an extremely simple query.

Again, when we write SELECT *, we create needless, mandatory work for the database, which it cannot optimise. It won’t pick the covering index because that index has a bit more overhead than the LAST_NAME index that it did pick, and after all, it had to go to the table anyway to fetch the useless LAST_UPDATE column, for instance.

But things get worse with SELECT *. Consider…

SQL transformations

Optimisers work so well because they transform your SQL queries (watch my recent talk at Voxxed Days Zurich about how this works). For instance, there’s a SQL transformation called “JOIN elimination”, and it is really powerful. Consider this auxiliary view, which we wrote because we grew so incredibly tired of joining all these tables all the time:

CREATE VIEW v_customer AS
SELECT 
  c.first_name, c.last_name, 
  a.address, ci.city, co.country
FROM customer c
JOIN address a USING (address_id)
JOIN city ci USING (city_id)
JOIN country co USING (country_id)

This view just connects all the “to-one” relationships between a CUSTOMER and their different ADDRESS parts. Thanks, normalisation.

Now imagine that, after working with this view for a while, we’ve become so accustomed to it that we’ve forgotten all about the underlying tables. And now, we’re running this query:

SELECT *
FROM v_customer

We’re getting quite an impressive plan:

----------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   599 | 47920 |   14 |
|*  1 |  HASH JOIN           |          |   599 | 47920 |   14 |
|   2 |   TABLE ACCESS FULL  | COUNTRY  |   109 |  1526 |    2 |
|*  3 |   HASH JOIN          |          |   599 | 39534 |   11 |
|   4 |    TABLE ACCESS FULL | CITY     |   600 | 10800 |    3 |
|*  5 |    HASH JOIN         |          |   599 | 28752 |    8 |
|   6 |     TABLE ACCESS FULL| CUSTOMER |   599 | 11381 |    4 |
|   7 |     TABLE ACCESS FULL| ADDRESS  |   603 | 17487 |    3 |
----------------------------------------------------------------

Well, of course. We run all these joins and full table scans, because that’s what we told the database to do. Fetch all this data.

Now, again, imagine that what we really wanted on one particular screen was just the customers’ first and last names.

Yeah, duh, right? By now you get my point. But imagine we’ve learned from the previous mistakes and we’re now actually running the following, better query:

SELECT first_name, last_name
FROM v_customer

Now, check this out!

------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |   599 | 16173 |     4 |
|   1 |  NESTED LOOPS      |             |   599 | 16173 |     4 |
|   2 |   TABLE ACCESS FULL| CUSTOMER    |   599 | 11381 |     4 |
|*  3 |   INDEX UNIQUE SCAN| SYS_C007120 |     1 |     8 |     0 |
------------------------------------------------------------------

That’s a drastic improvement in the execution plan. Our joins were eliminated because the optimiser could prove they were needless, so once it can prove this (and you don’t make the work mandatory by selecting *), it can remove the work and simply not do it. Why is that the case?

Each CUSTOMER.ADDRESS_ID foreign key guarantees that there is exactly one ADDRESS.ADDRESS_ID primary key value, so the JOIN operation is guaranteed to be a to-one join that neither adds nor removes rows. If we don’t select any columns from the joined tables, we don’t need to load their rows at all. Removing the JOIN provably won’t change the outcome of the query.
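
Join elimination only works because these guarantees are declared in the schema. As a minimal sketch, this is the kind of metadata the optimiser relies on (Oracle syntax; the constraint name here is illustrative, the Sakila schema declares equivalent constraints):

-- The foreign key proves each customer matches at most one address;
-- NOT NULL proves it matches exactly one, so the inner join can
-- neither add nor remove rows
ALTER TABLE customer MODIFY address_id NOT NULL;
ALTER TABLE customer ADD CONSTRAINT fk_customer_address
  FOREIGN KEY (address_id) REFERENCES address (address_id);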

Databases do these things all the time. You can try this on most databases:

-- Oracle
SELECT CASE WHEN EXISTS (
  SELECT 1 / 0 FROM dual
) THEN 1 ELSE 0 END
FROM dual

-- More reasonable SQL dialects, e.g. PostgreSQL
SELECT EXISTS (SELECT 1 / 0)

In this case, you might expect an arithmetic exception to be raised, as when you run this query:

SELECT 1 / 0 FROM dual

yielding

ORA-01476: divisor is equal to zero

But it doesn’t happen. The optimiser (or even the parser) can prove that any SELECT column expression in an EXISTS (SELECT ..) predicate will not change the outcome of the query, so there’s no need to evaluate it. Huh!

Meanwhile…

One of ORMs’ most unfortunate problems is the fact that they make SELECT * queries so easy to write. In fact, HQL / JPQL went on to make it the default. You can even omit the SELECT clause entirely, because after all, you’re going to be fetching the entire entity, as declared, right?

For instance:

FROM v_customer

Vlad Mihalcea, for instance, a Hibernate expert and Hibernate Developer Advocate, recommends you use queries almost every time you’re sure you don’t want to persist any modifications after fetching. ORMs make it easy to solve the object graph persistence problem. Note: persistence. The idea of actually modifying the object graph and persisting the modifications is inherent to that problem.

But if you don’t intend to do that, why bother fetching the entity? Why not write a query? Let’s be very clear: From a performance perspective, writing a query tailored to the exact use-case you’re solving is always going to outperform any other option. You may not care because your data set is small and it doesn’t matter. Fine. But eventually, you’ll need to scale and re-designing your applications to favour a query language over imperative entity graph traversal will be quite hard. You’ll have other things to do.

Counting for existence

One of the worst wastes of resources is when people run COUNT(*) queries when they simply want to check for existence. E.g.

Did this user have any orders at all?

And we’ll run:

SELECT count(*)
FROM orders
WHERE user_id = :user_id

Easy. If COUNT = 0: No orders. Otherwise: Yes, orders.

The performance will not be horrible, because we probably have an index on the ORDERS.USER_ID column. But what do you think will be the performance of the above compared to this alternative here:

-- Oracle
SELECT CASE WHEN EXISTS (
  SELECT *
  FROM orders
  WHERE user_id = :user_id
) THEN 1 ELSE 0 END
FROM dual

-- Reasonable SQL dialects, like PostgreSQL
SELECT EXISTS (
  SELECT *
  FROM orders
  WHERE user_id = :user_id
)

It doesn’t take a rocket scientist to figure out that an actual existence predicate can stop looking for additional rows as soon as it has found one. So, if the answer is “no orders”, then the speed will be comparable. If, however, the answer is “yes, orders”, then the query might be drastically faster in the case where we do not calculate the exact count.

Because we don’t care about the exact count. Yet, we told the database to calculate it (needless), and the database doesn’t know we’re discarding all results bigger than 1 (mandatory).
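
If EXISTS is awkward to express in your client code or dialect, a similar effect can often be had by limiting the result explicitly. A sketch using standard SQL syntax:

-- The database can stop scanning as soon as one row matches,
-- just like with an EXISTS predicate
SELECT 1
FROM orders
WHERE user_id = :user_id
FETCH FIRST 1 ROW ONLY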

Of course, things get much worse if you call list.size() on a JPA-backed collection to do the same…

I’ve blogged about this recently, and benchmarked the alternatives on different databases. Do check it out.

Conclusion

This article stated the “obvious”. Don’t tell the database to perform needless, mandatory work.

It’s needless because given your requirements, you knew that some specific piece of work did not need to be done. Yet, you tell the database to do it.

It’s mandatory because the database has no way to prove it’s needless. This information is contained only in the client and is inaccessible to the server. So, the database has to do it.

This article talked about SELECT *, mostly, because that’s such an easy target. But this isn’t about databases only. This is about any distributed algorithm where a client instructs a server to perform needless, mandatory work. How many N+1 problems does your average AngularJS application have, where the UI loops over service result A, calling service B many times, instead of batching all calls to B into a single call? It’s a recurrent pattern.
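
In SQL terms, the same batching idea looks like this (a sketch; table and column names are made up for illustration):

-- N+1 pattern: executed once for every row returned by service A
SELECT * FROM b WHERE a_id = :a_id

-- Batched alternative: a single round trip fetches everything
SELECT b.*
FROM a
JOIN b ON b.a_id = a.a_id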

The solution is always the same. The more information you give to the entity executing your command, the faster it can (in principle) execute that command. Write a better query. Every time. Your entire system will thank you for it.

If you liked this article…

… do also check out my recent talk at Voxxed Days Zurich, where I show some hyperbolic examples of why SQL will beat Java at data processing algorithms every time.

jOOQ Tuesdays: Brett Wooldridge Shows What it Takes to Write the Fastest Java Connection Pool

Welcome to the jOOQ Tuesdays series. In this series, we’ll publish an article on the third Tuesday every other month where we interview someone we find exciting in our industry from a jOOQ perspective. This includes people who work with SQL, Java, Open Source, and a variety of other related topics.

I’m very excited to feature today Brett Wooldridge, creator of HikariCP, the fastest connection pool available for Java.

Brett, you’ve created one of the most popular connection pools for Java: HikariCP. What made your library so popular?

I’ll provide some backstory on HikariCP before I answer that, but I’ll tease the answer by saying “marketing”.

A few years ago I was creating a product prototype for the company I work for, and I needed a connection pool. Like most developers I just wanted to drop in a pool and move on, so I took to the web to find the most popular and actively maintained library. Unfortunately, while load testing the prototype we started encountering deadlocks, and exceptions indicating connection state bleed over between threads.

Because the pool was open source, I thought I’d just pull down the code, find and fix the problems, and contribute back. But when I opened the code, I found thousands of lines more code than I was expecting.  Added to the mix were many locks, nested, sometimes acquired in one method and released in some distant place. There was simply no way to reason about where potential deadlocks lurked, even if we found and fixed the ones we encountered.

I picked up another pool and inspected its code. The lock semantics were clearer, but the volume of code was still more than 2x what I expected, especially given that it was delegating the core pooling logic to a separate library.

In addition, all of the pools I studied violated JDBC contracts in multiple ways. Inasmuch as it is possible, a pool should return a Connection that is indistinguishable from one received in the absence of the pool. But these pools didn’t automatically close Statements when a connection was “closed” (returned), or clear warnings, or roll back uncommitted transactions, and they didn’t reset properties altered by the user such as auto-commit or transaction isolation level, and more; all resulting in the next consumer getting a “dirty” connection.

I thought, “Really? This is the state of connection pools in the ecosystem after 20 years of Java?” Out of necessity and frustration, I created HikariCP.

To be fair, since I started HikariCP some pools have made some of these “correctness” behaviors configurable, but none of them do so by default and I suspect most users are running with the safety off.  At least two popular pools fail to complete our benchmark with OutOfMemory exceptions when they are enabled.  Conversely, HikariCP doesn’t support an unsafe mode of operation.

Returning to your question, as noted above there were many established pools available, so how did HikariCP become popular?  “Correctness” and reliability are a tough sell, so I focused on promoting performance, and started with a simple tweet. One follower led to another.  Some users tweeted about big performance gains, and improved reliability, and at some point in 2015 the Wix engineering team wrote a blog about switching to HikariCP.

In essence, simple word of mouth has led to HikariCP’s rising popularity, with an initial “marketing” push based on performance.  I do hope that over time more users will give equal weight to correctness and reliability, without which performance is meaningless, and for my part I plan to write more about those aspects of HikariCP.  

You quoted Edsger Dijkstra: “Simplicity is prerequisite for reliability.” – That reminds me of Antoine de Saint-Exupery’s “Perfection is Achieved Not When There Is Nothing More to Add, But When There Is Nothing Left to Take Away”. How do you manage to keep things simple when this world only ever gets more complicated?

Resisting complexity through feature-creep can be challenging.  I get a lot of requests for this or that feature, and while each may be simple in and of itself, if taken in totality would significantly increase complexity and code size.  Of course, that is not to say that I don’t add features.

For example, initial versions of HikariCP only supported a fixed size pool.  HikariCP was designed for systems with fairly constant load, and in that environment pools tend to stay at their maximum size, so I saw little need to complicate the code to support dynamic sizing.  Can you imagine a server at Google falling idle for several minutes?  Additionally, I feel like the more axes of configuration there are, the more difficult it is for users to optimally configure a pool.  However, eventually there were enough users who needed dynamic sizing, and its absence was a barrier to adoption, so support was added.  Principally, I did not want lack of dynamic sizing support to deprive users of the reliability and correctness benefits of HikariCP.

Still, I probably reject the vast majority of feature requests. As the custodian of HikariCP keeping it simple and true to that core philosophy is in the best interest of the community.  I always try to minimize the “surface area”, both in terms of code and configuration.  The larger the surface area of an API, the more difficult it is to comprehend.  Our brains have a limit for the amount of contextual information that can be held “in view” at one time; this is true in a lot of contexts.  For example, when reading code, methods larger than a certain size, or conditionals of more than a certain number of terms, are difficult to follow or reason about.  Generally, for users of HikariCP, the “surface area” is manifest in the number of configuration parameters.  While I can hardly say that “Perfection [has been] achieved”, I do feel like there is not much left to take away without cutting into functionality.

Few libraries go to the byte code level to optimise their code. While this helps in benchmarks, did it also help your users in production? What were the biggest caveats you found while micro-optimising?

Definitely.  Maybe some developers are dismissive of the potential gains, because in their minds they think, “What does it matter if connection acquisition takes 100ns or 100μs, the query is going to take 10ms anyway?”  However, pools intercept dozens of methods, and the “close()” path is typically slower than acquisition, so it’s not that simple.  I often get reports from users providing confirmation of real world performance improvements.  It’s anecdotal but one user initially commented in a bug report, “We’re testing HikariCP at the client and have had great initial success – an application loading 1 million records over multiple HTTP threads and putting them in the DB had its run time cut by 70% after moving from Tomcat CP to HikariCP!”  The follow-up comment on the bug was, “This was a bug in our side, using some unrelated non-threadsafe code.  No issue.  After fixing the bug, the code runs about 2x faster using HikariCP than Tomcat CP.”  That’s pretty good; and yet some reports surprise even me.

Regarding optimisation, and as long as we’re quoting famous thinkers, I would be remiss if I didn’t cite Knuth: “We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil.”  I think the key word here is “premature”.  It is definitely better to write the code as it naturally comes and then, based on detailed profiling and benchmarking, perform “peephole optimisations” (to hijack a word from compiler theory).  At the same time, I would estimate that half of the performance gains in HikariCP have come as the result of algorithmic changes, rather than low-level optimisations.

Regarding caveats to micro-optimising, it would be hard to convey how much I have learned, and am still learning.  I’d like to give a shout-out to Aleksey Shipilëv for his excellent JMH micro-benchmark framework.  Aleksey has become somewhat of a JVM performance oracle (no pun intended, he used to work for Oracle).  The JVM performs an amazing array of optimisations, and if one is not careful then what appears to be a clever optimisation in the code simply confuses the JIT’s pattern-based optimiser and the result is slower rather than faster.

In order to effectively optimise on the JVM you sometimes end up reading the JIT source code, and you must become familiar with concepts such as dead code elimination, loop invariant hoisting, constant propagation, virtual call inlining, and many more.  Even with a good grip on these concepts I am sometimes surprised by the JVM in my attempts at optimisation.  In addition to the JIT, you really must understand the Java Memory Model (JMM) and how it maps onto CPU architectures like x86.

Lastly, after the design of algorithms, contention for shared state is the source of most bottlenecks (see the aforementioned JMM), so recently the biggest gains (for example, in v2.6.0) have come from tricks that simply avoid it; the fastest code is code that is never executed.

If there is a main takeaway, it is “trust the benchmarks”; your assumptions and intuitions are wrong more often than you imagine.

Your fellow jOOQ Tuesdays interviewee Vlad Mihalcea talked to us about queueing theory. How does this compare to what you wrote about connection pool sizing?

I have great respect for Vlad, I think we’re both members of the Mutual Admiration Society.  His FlexyPool is trying to solve a difficult problem; that being how to automatically tune optimal pool settings for varying loads.  Ultimately, the upper-bound is constrained by the database’s optimal concurrent query capacity, which is where my write-up on pool sizing comes into play.  However, there is a large amount of configuration space in-between a minimally sized pool and that upper-bound, which is where FlexyPool is trying to add value, by ensuring that the pool is “right sized”, dynamically, for the load it is servicing.

I say it is a difficult problem, because connection pools on modern multi-core servers likely present as an M/G/k queue in queueing theory; arrivals have a Markovian distribution, service times have a General distribution, and there are k servers (where “server” is defined as an abstract single-threaded processor).  Quoting Wikipedia, “Most performance metrics for this queueing system are not known and remain an open problem.”  Modeling connection pools as an M/M/c queue might provide a decent approximation for the purposes of predicting queue lengths, but service times are not likely to have a Markovian distribution.  Of course, there are also non-Markovian stochastic models in queueing theory that could be applied.  Complicating everything is the fact that queued waiters (threads) can abandon the queue before service, for example when a timeout is reached.  That adds an additional twist when trying to predict queue lengths and wait times.  Hats off to Vlad for taking on this problem!

Anyway, what I wrote about setting the upper-bound on pool sizing translates to pinning the k (or c) value in those respective Markovian queueing theory models.

You chose a Japanese word in your product: 光 (Hikari, “Light”). What’s your connection to Japan?

I’ve lived and worked in Tokyo since 2008, though I think my Japanese is far behind where it should be given my time here.  I chalk that up to preferring time at the keyboard to language study.

As you mentioned, Hikari (pronounced Hi-ka-lee) translates to “Light” (as in sunlight).  In English, it is a double entendre in the context of HikariCP; though in Japanese it would not be.  “Light” in the sense of “the speed of…”, and “light” in the sense of being light in terms of code weight.