What’s a “String” in the jOOQ API?

One of jOOQ’s biggest strength is the fact that it is a type safe SQL API. “Type safe”, in this context, means that every object that you put in a jOOQ query has a well defined type, such as:

These can be used in jOOQ in a type safe way as such:

ctx.select(T.A)      // A Field
   .from(T)          // A Table
   .where(T.B.eq(1)) // A Condition
   .fetch();

No Strings here, no string concatenation.

However, there are a few cases where you want to bypass type safety, for whatever reason, including to extend jOOQ e.g. with plain SQL templating. In those cases, you would pass a “String” object to the jOOQ API. But not every such String object is the same. What different types of strings are there? There are 4 main types of String in the jOOQ API:

1. The bind value

The most obvious type of string is the bind value or literal. You can create those explicitly using:

// As always, this static import is implied
import static org.jooq.impl.DSL.*;

Field<String> bind = val("abc");
Field<String> literal = inline("xyz");

By default, the first value will produce a bind parameter marker "?" in the generated SQL, whereas the second one will produce an escaped string literal 'xyz'. You’re probably already doing this implicitly. Whenever you pass a String value where the jOOQ API expects a T type, you’re implicitly wrapping your String value using DSL.val(T):

ctx.select(T.A)
   .from(T)
   .where(T.C.eq("xyz")) // Implicit bind value
   .fetch();

This is still type safe usage of a String value, as it is really wrapped as a Field<String>

2. The plain SQL template

Whenever jOOQ is missing some vendor specific functionality, the backdoor is to use plain SQL templating. You can create plain SQL templates explicitly for the most popular types as shown above, like this:

Field<Integer> field = field("(1 + 2)", SQLDataType.INTEGER);
Table<?> table       = table("generate_series(1, 10)");
Condition condition  = condition("some_function() = 1");

These expressions can now be embedded in your queries, like any other:

ctx.select(field)
   .from(table)
   .where(condition)
   .fetch();

Alternatively, there also exists convenience overloads on some query methods to make this simpler:

ctx.select(field("(1 + 2)", SQLDataType.INTEGER)) // Not on SELECT
   .from("generate_series(1, 10)")
   .where("some_function() = 1")
   .fetch();

Notice that the select() method doesn’t have such convenience API yet, as of jOOQ 3.13

Important disclaimer: Using these APIs, you’re going to expose yourself to the usual SQL injection risk that is present also with JDBC or JPQL usage, when composing SQL from strings. NEVER concatenate plain SQL templates, nor use user input in these strings. Use the templating language instead, and turn every user input into bind variables. Examples:

ctx.select(...)
   .from(...)
   .where("some_function() = ?", 1) // Bind variable
   .fetch();

ctx.select(...)
   .from(...)
   .where("some_function() = {0}", val(1)) // Templating
   .fetch();

If you encounter String types on most query API in jOOQ, it’s for plain SQL templating. All of this API is annotated with @org.jooq.PlainSQL for additional documentation purposes, and to be preprocessed with a static checker that can be used to disallow such API usage by default, for added security.

3. Names (identifiers)

But on some query API in jOOQ, Strings are not convenience for plain SQL templating, but for names and identifiers. Namely, all DDL statements use strings this way in their API.

You can create qualified or unqualified identifiers explicitly as follows:

// Unqualified table identifier
Name table = name("t");

// Qualified column identifier
Name field = name("t", "col");

And then, use those identifiers in your DDL statements, e.g. to create the table:

ctx.createTable(table)
   .column(field, SQLDataType.INTEGER)
   .execute();

Depending on the context, qualification is necessary or not. In this case, field qualification would not have been necessary.

For convenience, you can also just use String types on the createTable(String) API:

ctx.createTable("t")
   .column("col", SQLDataType.INTEGER)
   .execute();

These strings will simply be wrapped in DSL.name(String) as illustrated before.

Beware: In jOOQ, all identifiers are quoted by default (RenderQuotedNames.EXPLICIT_DEFAULT_QUOTED). This has two benefits:

  • Special characters, and more importantly, keyword conflicts are handled correctly out of the box
  • Quoting prevents SQL injection
  • Case sensitivity is handled correctly in those dialects that support it for quoted identifiers

The price to pay for this convenience is that quoted identifiers might turn case sensitive, when this is undesired. To work around this, you can turn off quoting using your settings, e.g. by setting RenderQuotedNames.EXPLICIT_DEFAULT_UNQUOTED. But beware, this will put you again at SQL injection risk if you don’t sanitise your identifier names first!

4. Keywords

Keywords are strings too, in jOOQ. In rare cases, you may want to wrap your keyword-as-string representation in a org.jooq.Keyword type. The main benefit (as of jOOQ 3.13) is a consistent keyword style. There is no convenience API for this, as it is very rare for client code to use this feature. There’s only DSL.keyword(String):

Keyword current = keyword("current");
Keyword current = keyword("time");

You can now use keywords in plain SQL templates:

Field<Time> currentTime = field(
  "{0} {1}", 
  SQLDataType.TIME, 
  current, time
);

Benchmarking JDK String.replace() vs Apache Commons StringUtils.replace()

What’s better? Using the JDK’s String.replace() or something like Apache Commons Lang’s Apache Commons Lang’s StringUtils.replace()?

In this article, I’ll compare the two, first in a profiling session using Java Mission Control (JMC), then in a benchmark using JMH, and we’ll see that Java 9 heavily improved things in this area.

Profiling using JMC

In a recent profiling session where I checked for any “obvious” bottlenecks in jOOQ, I’ve discovered this nasty regular expression pattern instantiation:

Tons of int[] instances were allocated by a regular expression pattern. That’s weird, because in general, inside of jOOQ’s internals, special care is always taken to pre-compile any regular expressions that are needed in static members, e.g.:

private static final Pattern TYPE_NAME_PATTERN = 
  Pattern.compile("\\([^\\)]*\\)");

This allows for using the Pattern in a far more optimal way, than e.g. by using String.replaceAll():

// Much better, pattern is pre-compiled
TYPE_NAME_PATTERN.matcher(castTypeName).replaceAll("")

// Much worse, pattern is compiled *every time*
castTypeName.replaceAll("\\([^\\)]*\\)", "")

That should be clear to everyone. The price to pay for this is the fact that the pattern is stored “far away” in some static member, rather than being visible right where it is used, which is a bit less readable. At least in my opinion.

SIDENOTE: People tend to get all angry about premature optimisation and such. Yes, these optimisations are micro optimisations and aren’t always worth the trouble. But this article is about jOOQ, a library that does a lot of expression tree transformations, and it is important for jOOQ to eliminate even 1% “bottlenecks”, as they make a difference. So, please read this article in this context.

Consider also our previous post about this subject: Top 10 Easy Performance Optimisations in Java

What was the problem in jOOQ?

Now, what appears to be obvious when using regular expressions seems less obvious when using ordinary, constant string replacements, such as when calling String.replace(CharSequence), as was done in the linked jOOQ issue #6672. The relevant piece of code was escaping all inline strings that are sent to the SQL database, to prevent syntax errors and, of course, SQL injection:

static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = result.replace("\\", "\\\\");

    return result.replace("'", "''");
}

We’re always escaping apostrophes by doubling them, and in some databases (e.g. MySQL), we often have to escape backslashes as well (unfortunately, not all ORMs seem to do this or even be aware of this MySQL “feature”).

Unfortunately as well, despite heavy use of Apache Commons Lang’s StringUtils.replace() in jOOQ’s internals, every now and then a String.replace(CharSequence) sneaks in, because it’s just so convenient to write.

Meh, does it matter?

Usually, in ordinary business logic, it shouldn’t (again – don’t optimise prematurely), but in jOOQ, which is essentially a SQL string manipulation library, it can get quite costly if a single replace call is done excessively (for good reasons, of course), and it is slower than it should be. And it is, prior to Java 9, when this method was optimised. I’ve done the profiling with Java 8, where internally, String.replace() uses a literal regex pattern (i.e. a pattern with a “literal” flag that is faster, but it is a pattern, nonetheless).

Not only does the method appear as a major offender in the GC allocation view, it also triggers quite some action in the “hot methods” view of JMC:

Those are quite a few Pattern methods. The percentages have to be understood in the context of a benchmark, running millions of queries against an H2 in-memory database, so the overhead is significant!

Using Apache Commons Lang’s StringUtils

A simple fix is to use Apache Commons Lang’s StringUtils instead:

static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = StringUtils.replace(result, "\\", "\\\\");

    return StringUtils.replace(result, "'", "''");
}

Now, the pressure has changed significantly. The int[] allocation is barely noticeable in comparison:

And much fewer Pattern calls are made, overall.

Benchmarking using JMH

Profiling can be very useful to spot bottlenecks, but it needs to be read with care. It introduces some artefacts and slight overheads and it is not 100% accurate when sampling call stacks, which might lead the wrong conclusions at times. This is why it is sometimes important to back claims by running an actual benchmark. And when benchmarking, please, don’t just loop 1 million times in a main() method. That will be very very inaccurate, except for very obvious, order-of-magnitude scale differences.

I’m using JMH here, running the following simple benchmark:

package org.jooq.test.benchmark;

import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
@Warmup(iterations = 5)
@Measurement(iterations = 7)
public class StringReplaceBenchmark {

    private static final String SHORT_STRING_NO_MATCH = "abc";
    private static final String SHORT_STRING_ONE_MATCH = "a'bc";
    private static final String SHORT_STRING_SEVERAL_MATCHES = "'a'b'c'";
    private static final String LONG_STRING_NO_MATCH = 
      "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_ONE_MATCH = 
      "abcabcabcabcabcabcabcabcabcabcabca'bcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_SEVERAL_MATCHES = 
      "abcabca'bcabcabcabcabcabc'abcabcabca'bcabcabcabcabcabca'bcabcabcabcabcabcabc";

    @Benchmark
    public void testStringReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_SEVERAL_MATCHES, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_SEVERAL_MATCHES, "'", "''"));
    }
}

Notice that I tried to run 2 x 3 different string replacement scenarios:

  • The string is “short”
  • The string is “long”

Cross joining (there, finally some SQL in this post!) the above with:

  • No match is found
  • One match is found
  • Several matches are found

That’s important because different optimisations can be implemented for those different cases, and probably, in jOOQ’s case, there is mostly no match in this particular case.

I ran this benchmark once on Java 8:

$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

And on Java 9:

$ java -version
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

As Tagir Valeev was kind enough to remind me that this issue was supposed to be fixed in Java 9:

The results are:

Java 8

testStringReplaceLongStringNoMatch               thrpt   21    4809343.940 ▒  66443.628  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt   21   25063493.793 ▒ 660657.256  ops/s

testStringReplaceLongStringOneMatch              thrpt   21    1406989.855 ▒  43051.008  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt   21    6961669.111 ▒ 141504.827  ops/s

testStringReplaceLongStringSeveralMatches        thrpt   21    1103323.491 ▒  17047.449  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt   21    3899108.777 ▒  41854.636  ops/s

testStringReplaceShortStringNoMatch              thrpt   21    5936992.874 ▒  68115.030  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt   21  171660973.829 ▒ 377711.864  ops/s

testStringReplaceShortStringOneMatch             thrpt   21    3267435.957 ▒ 240198.763  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt   21    9943846.428 ▒ 270821.641  ops/s

testStringReplaceShortStringSeveralMatches       thrpt   21    2313713.015 ▒  28806.738  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt   21    5447065.933 ▒ 139525.472  ops/s

As can be seen, the difference is “catastrophic”. Apache Commons Lang’s StringUtils drastically outpeforms the JDK’s String.replace() in every discipline, especially when no match is found in a short string! That’s because the library optimises for this particular case:

...
int end = searchText.indexOf(searchString, start);
if (end == INDEX_NOT_FOUND) {
    return text;
}

Java 9

Things look a bit differently for Java 9:

testStringReplaceLongStringNoMatch               thrpt   21   55528132.674 ▒  479721.812  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt   21   55767541.806 ▒  754862.755  ops/s

testStringReplaceLongStringOneMatch              thrpt   21    4806322.839 ▒  217538.714  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt   21    8366539.616 ▒  142757.888  ops/s

testStringReplaceLongStringSeveralMatches        thrpt   21    2685134.029 ▒   78108.171  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt   21    3923819.576 ▒  351103.020  ops/s

testStringReplaceShortStringNoMatch              thrpt   21  122398496.629 ▒ 1350086.256  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt   21  121139633.453 ▒ 2756892.669  ops/s

testStringReplaceShortStringOneMatch             thrpt   21   18070522.151 ▒  498663.835  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt   21   11367395.622 ▒  153377.552  ops/s

testStringReplaceShortStringSeveralMatches       thrpt   21    7548407.681 ▒  168950.209  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt   21    5045065.948 ▒  175251.545  ops/s

Java 9’s implementation is now similar to that of Apache Commons, with the same optimisation for non-matches:

public String replace(CharSequence target, CharSequence replacement) {
    String tgtStr = target.toString();
    String replStr = replacement.toString();
    int j = indexOf(tgtStr);
    if (j < 0) {
        return this;
    }
    ...

It is still quite slower for matches in long strings, but faster for matches in short strings. The tradeoff for jOOQ will be to still prefer Apache Commons because:

  • Most people are still on Java 8 or less, currently
  • Most replacements won’t match and both implementations fare equally well for that in Java 9, but Apache Commons is much faster for this category in Java 8
  • If there’s a match and thus a replacement, the speed depends on the string length, where the faster implementation is currently undecided

Conclusion

This micro optimisation stuff matters in jOOQ because jOOQ is a library that does a lot of SQL string manipulation. Every allocation and every CPU cycle that is wasted when manipulating SQL strings slows down the library, and thus impacts all of its users. In a situation like this, it is definitely worth considering not using these useful JDK String methods, and opting for the much faster Apache Commons implementations instead.

Things have improved a lot in Java 9, in case of which this can mostly be ignored. But if you still need to support Java 8 (we still support Java 6 in our commercial distributions!), then this has to be considered.