What’s better? Using the JDK’s String.replace() or something like Apache Commons Lang’s StringUtils.replace()?
In this article, I’ll compare the two, first in a profiling session using
Java Mission Control (JMC), then in a benchmark using
JMH, and we’ll see that Java 9 heavily improved things in this area.
Profiling using JMC
In a recent profiling session where I checked for any “obvious” bottlenecks in jOOQ, I discovered this nasty regular expression pattern instantiation:

Tons of int[] instances were allocated by a regular expression pattern. That’s weird, because in general, inside of jOOQ’s internals, special care is always taken to pre-compile any regular expressions that are needed in static members, e.g.:
    private static final Pattern TYPE_NAME_PATTERN =
        Pattern.compile("\\([^\\)]*\\)");
This allows for using the Pattern in a far more optimal way than, e.g., by using String.replaceAll():
    // Much better, pattern is pre-compiled
    TYPE_NAME_PATTERN.matcher(castTypeName).replaceAll("")
    // Much worse, pattern is compiled *every time*
    castTypeName.replaceAll("\\([^\\)]*\\)", "")
That should be clear to everyone. The price to pay for this is the fact that the pattern is stored “far away” in some static member, rather than being visible right where it is used, which is a bit less readable. At least in my opinion.
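To make the comparison runnable, here is a minimal, self-contained sketch that wraps both variants in methods. The class and method names are made up for illustration and are not part of jOOQ. Both methods return the same result; they differ only in when the pattern gets compiled.

```java
import java.util.regex.Pattern;

final class PrecompiledPatternSketch {

    // Compiled once, when the class is initialised
    private static final Pattern TYPE_NAME_PATTERN =
        Pattern.compile("\\([^\\)]*\\)");

    // Reuses the pre-compiled pattern on every call
    static String stripPrecompiled(String castTypeName) {
        return TYPE_NAME_PATTERN.matcher(castTypeName).replaceAll("");
    }

    // String.replaceAll() compiles the regular expression on every call
    static String stripAdHoc(String castTypeName) {
        return castTypeName.replaceAll("\\([^\\)]*\\)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripPrecompiled("varchar(10)")); // varchar
        System.out.println(stripAdHoc("decimal(10, 2)"));    // decimal
    }
}
```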
SIDENOTE: People tend to get all angry about premature optimisation and such. Yes, these optimisations are micro optimisations and aren’t always worth the trouble. But this article is about jOOQ, a library that does a lot of expression tree transformations, and it is important for jOOQ to eliminate even 1% “bottlenecks”, as they make a difference. So, please read this article in this context.
Consider also our previous post about this subject: Top 10 Easy Performance Optimisations in Java
What was the problem in jOOQ?
Now, what appears to be obvious when using regular expressions seems less obvious when using ordinary, constant string replacements, such as when calling String.replace(CharSequence, CharSequence), as was done in the linked jOOQ issue #6672. The relevant piece of code was escaping all inline strings that are sent to the SQL database, to prevent syntax errors and, of course, SQL injection:
    static final String escape(Object val, Context<?> context) {
        String result = val.toString();
        if (needsBackslashEscaping(context.configuration()))
            result = result.replace("\\", "\\\\");
        return result.replace("'", "''");
    }
We’re always escaping apostrophes by doubling them, and in some databases (e.g. MySQL), we often have to escape backslashes as well (unfortunately, not all ORMs seem to do this or even be aware of this MySQL “feature”).
Unfortunately as well, despite heavy use of Apache Commons Lang’s StringUtils.replace() in jOOQ’s internals, every now and then a String.replace(CharSequence, CharSequence) sneaks in, because it’s just so convenient to write.
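For illustration, here is a simplified, standalone version of that escaping logic with two sample inputs. The boolean parameter is a hypothetical stand-in for jOOQ’s needsBackslashEscaping(context.configuration()) check, and the class name is invented for this sketch.

```java
final class EscapeSketch {

    // backslashEscaping is a hypothetical stand-in for jOOQ's dialect check
    static String escape(Object val, boolean backslashEscaping) {
        String result = val.toString();

        // Double backslashes only for dialects that treat them as escape characters (e.g. MySQL)
        if (backslashEscaping)
            result = result.replace("\\", "\\\\");

        // Apostrophes are always escaped by doubling them
        return result.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(escape("O'Reilly", false)); // O''Reilly
        System.out.println(escape("a\\b'c", true));    // a\\b''c
    }
}
```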
Meh, does it matter?
Usually, in ordinary business logic, it shouldn’t (again – don’t optimise prematurely), but in jOOQ, which is essentially a SQL string manipulation library, it can get quite costly if a single replace call is made excessively (for good reasons, of course) and is slower than it should be. And it is slower prior to Java 9, the release in which this method was optimised. I did the profiling with Java 8, where internally, String.replace() uses a literal regex pattern (i.e. a pattern with a “literal” flag that is faster, but it is a pattern, nonetheless).
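As far as I can tell from the OpenJDK 8 sources, that implementation boils down to the following, rewritten here as a static helper so it can be run on any JDK. The class and method names are invented for this sketch; only the Pattern and Matcher calls are actual JDK API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralPatternReplaceSketch {

    // Every call compiles a new Pattern and creates a new Matcher,
    // even when the target does not occur in the input at all
    static String replace(String input, CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL)
                      .matcher(input)
                      .replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }

    public static void main(String[] args) {
        System.out.println(replace("a'bc", "'", "''")); // a''bc
        System.out.println(replace("a.b", ".", "$"));   // a$b
    }
}
```

The LITERAL flag and quoteReplacement() make sure that characters such as "." or "$" are not interpreted as regex syntax, but the pattern and matcher objects are still allocated on each call.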
Not only does the method appear as a major offender in the GC allocation view, it also triggers quite some action in the “hot methods” view of JMC:

Those are quite a few Pattern methods. The percentages have to be understood in the context of a benchmark, running millions of queries against an H2 in-memory database, so the overhead is significant!
Using Apache Commons Lang’s StringUtils
A simple fix is to use Apache Commons Lang’s StringUtils instead:
    static final String escape(Object val, Context<?> context) {
        String result = val.toString();
        if (needsBackslashEscaping(context.configuration()))
            result = StringUtils.replace(result, "\\", "\\\\");
        return StringUtils.replace(result, "'", "''");
    }
Now, the pressure has changed significantly. The int[] allocation is barely noticeable in comparison:

And far fewer Pattern calls are made, overall.
Benchmarking using JMH
Profiling can be very useful to spot bottlenecks, but it needs to be read with care. It introduces some artefacts and slight overheads and it is not 100% accurate when sampling call stacks, which might lead to the wrong conclusions at times. This is why it is sometimes important to back claims by running an actual benchmark. And when benchmarking, please, don’t just loop 1 million times in a main() method. That will be very inaccurate, except for very obvious, order-of-magnitude scale differences.
I’m using JMH here, running the following simple benchmark:
    package org.jooq.test.benchmark;

    import org.apache.commons.lang3.StringUtils;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Measurement;
    import org.openjdk.jmh.annotations.Warmup;
    import org.openjdk.jmh.infra.Blackhole;

    @Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
    @Warmup(iterations = 5)
    @Measurement(iterations = 7)
    public class StringReplaceBenchmark {

        private static final String SHORT_STRING_NO_MATCH = "abc";
        private static final String SHORT_STRING_ONE_MATCH = "a'bc";
        private static final String SHORT_STRING_SEVERAL_MATCHES = "'a'b'c'";
        private static final String LONG_STRING_NO_MATCH =
            "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc";
        private static final String LONG_STRING_ONE_MATCH =
            "abcabcabcabcabcabcabcabcabcabcabca'bcabcabcabcabcabcabcabcabcabcabcabcabc";
        private static final String LONG_STRING_SEVERAL_MATCHES =
            "abcabca'bcabcabcabcabcabc'abcabcabca'bcabcabcabcabcabca'bcabcabcabcabcabcabc";

        @Benchmark
        public void testStringReplaceShortStringNoMatch(Blackhole blackhole) {
            blackhole.consume(SHORT_STRING_NO_MATCH.replace("'", "''"));
        }

        @Benchmark
        public void testStringReplaceLongStringNoMatch(Blackhole blackhole) {
            blackhole.consume(LONG_STRING_NO_MATCH.replace("'", "''"));
        }

        @Benchmark
        public void testStringReplaceShortStringOneMatch(Blackhole blackhole) {
            blackhole.consume(SHORT_STRING_ONE_MATCH.replace("'", "''"));
        }

        @Benchmark
        public void testStringReplaceLongStringOneMatch(Blackhole blackhole) {
            blackhole.consume(LONG_STRING_ONE_MATCH.replace("'", "''"));
        }

        @Benchmark
        public void testStringReplaceShortStringSeveralMatches(Blackhole blackhole) {
            blackhole.consume(SHORT_STRING_SEVERAL_MATCHES.replace("'", "''"));
        }

        @Benchmark
        public void testStringReplaceLongStringSeveralMatches(Blackhole blackhole) {
            blackhole.consume(LONG_STRING_SEVERAL_MATCHES.replace("'", "''"));
        }

        @Benchmark
        public void testStringUtilsReplaceShortStringNoMatch(Blackhole blackhole) {
            blackhole.consume(StringUtils.replace(SHORT_STRING_NO_MATCH, "'", "''"));
        }

        @Benchmark
        public void testStringUtilsReplaceLongStringNoMatch(Blackhole blackhole) {
            blackhole.consume(StringUtils.replace(LONG_STRING_NO_MATCH, "'", "''"));
        }

        @Benchmark
        public void testStringUtilsReplaceShortStringOneMatch(Blackhole blackhole) {
            blackhole.consume(StringUtils.replace(SHORT_STRING_ONE_MATCH, "'", "''"));
        }

        @Benchmark
        public void testStringUtilsReplaceLongStringOneMatch(Blackhole blackhole) {
            blackhole.consume(StringUtils.replace(LONG_STRING_ONE_MATCH, "'", "''"));
        }

        @Benchmark
        public void testStringUtilsReplaceShortStringSeveralMatches(Blackhole blackhole) {
            blackhole.consume(StringUtils.replace(SHORT_STRING_SEVERAL_MATCHES, "'", "''"));
        }

        @Benchmark
        public void testStringUtilsReplaceLongStringSeveralMatches(Blackhole blackhole) {
            blackhole.consume(StringUtils.replace(LONG_STRING_SEVERAL_MATCHES, "'", "''"));
        }
    }
Notice that I tried to run 2 x 3 different string replacement scenarios:
- The string is “short”
- The string is “long”
Cross joining (there, finally some SQL in this post!) the above with:
- No match is found
- One match is found
- Several matches are found
That’s important because different optimisations can be implemented for those different cases, and in jOOQ’s case, most strings probably contain no match at all.
I ran this benchmark once on Java 8:
$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)
And on Java 9:
$ java -version
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)
Tagir Valeev was kind enough to remind me that this issue was supposed to be fixed in Java 9.
The results are:
Java 8
testStringReplaceLongStringNoMatch thrpt 21 4809343.940 ± 66443.628 ops/s
testStringUtilsReplaceLongStringNoMatch thrpt 21 25063493.793 ± 660657.256 ops/s
testStringReplaceLongStringOneMatch thrpt 21 1406989.855 ± 43051.008 ops/s
testStringUtilsReplaceLongStringOneMatch thrpt 21 6961669.111 ± 141504.827 ops/s
testStringReplaceLongStringSeveralMatches thrpt 21 1103323.491 ± 17047.449 ops/s
testStringUtilsReplaceLongStringSeveralMatches thrpt 21 3899108.777 ± 41854.636 ops/s
testStringReplaceShortStringNoMatch thrpt 21 5936992.874 ± 68115.030 ops/s
testStringUtilsReplaceShortStringNoMatch thrpt 21 171660973.829 ± 377711.864 ops/s
testStringReplaceShortStringOneMatch thrpt 21 3267435.957 ± 240198.763 ops/s
testStringUtilsReplaceShortStringOneMatch thrpt 21 9943846.428 ± 270821.641 ops/s
testStringReplaceShortStringSeveralMatches thrpt 21 2313713.015 ± 28806.738 ops/s
testStringUtilsReplaceShortStringSeveralMatches thrpt 21 5447065.933 ± 139525.472 ops/s
As can be seen, the difference is “catastrophic”. Apache Commons Lang’s StringUtils drastically outperforms the JDK’s String.replace() in every discipline, especially when no match is found in a short string! That’s because the library optimises for this particular case:
    ...
    int end = searchText.indexOf(searchString, start);
    if (end == INDEX_NOT_FOUND) {
        return text;
    }
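The effect of that shortcut is easiest to see in a complete method. The following is a simplified sketch of a replace method built around the same early return; it is not the actual Commons Lang source, and the class name, the null handling, and the buffer sizing are assumptions made for this example.

```java
final class EarlyReturnReplaceSketch {

    static String replace(String text, String search, String replacement) {
        if (text == null || text.isEmpty() || search == null || search.isEmpty() || replacement == null)
            return text;

        int start = 0;
        int end = text.indexOf(search, start);

        // No match: return the input as is, without allocating anything
        if (end == -1)
            return text;

        // At least one match: copy the unmatched slices and the replacements into a buffer
        StringBuilder buf = new StringBuilder(text.length() + 16);
        while (end != -1) {
            buf.append(text, start, end).append(replacement);
            start = end + search.length();
            end = text.indexOf(search, start);
        }

        // Copy whatever follows the last match
        buf.append(text, start, text.length());
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("abc", "'", "''"));     // abc
        System.out.println(replace("'a'b'c'", "'", "''")); // ''a''b''c''
    }
}
```

In the no-match case, the only work done is a single indexOf() call, which is why this path is so cheap.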
Java 9
Things look a bit different for Java 9:
testStringReplaceLongStringNoMatch thrpt 21 55528132.674 ± 479721.812 ops/s
testStringUtilsReplaceLongStringNoMatch thrpt 21 55767541.806 ± 754862.755 ops/s
testStringReplaceLongStringOneMatch thrpt 21 4806322.839 ± 217538.714 ops/s
testStringUtilsReplaceLongStringOneMatch thrpt 21 8366539.616 ± 142757.888 ops/s
testStringReplaceLongStringSeveralMatches thrpt 21 2685134.029 ± 78108.171 ops/s
testStringUtilsReplaceLongStringSeveralMatches thrpt 21 3923819.576 ± 351103.020 ops/s
testStringReplaceShortStringNoMatch thrpt 21 122398496.629 ± 1350086.256 ops/s
testStringUtilsReplaceShortStringNoMatch thrpt 21 121139633.453 ± 2756892.669 ops/s
testStringReplaceShortStringOneMatch thrpt 21 18070522.151 ± 498663.835 ops/s
testStringUtilsReplaceShortStringOneMatch thrpt 21 11367395.622 ± 153377.552 ops/s
testStringReplaceShortStringSeveralMatches thrpt 21 7548407.681 ± 168950.209 ops/s
testStringUtilsReplaceShortStringSeveralMatches thrpt 21 5045065.948 ± 175251.545 ops/s
Java 9’s implementation is now similar to that of Apache Commons, with the same optimisation for non-matches:
    public String replace(CharSequence target, CharSequence replacement) {
        String tgtStr = target.toString();
        String replStr = replacement.toString();
        int j = indexOf(tgtStr);
        if (j < 0) {
            return this;
        }
        ...
It is still quite a bit slower for matches in long strings, but faster for matches in short strings. For jOOQ, the tradeoff is to still prefer Apache Commons because:
- Most people are still on Java 8 or less, currently
- Most replacements won’t match and both implementations fare equally well for that in Java 9, but Apache Commons is much faster for this category in Java 8
- If there’s a match and thus a replacement, the speed depends on the string length, where the faster implementation is currently undecided
Conclusion
This micro optimisation stuff matters in jOOQ because jOOQ is a library that does a lot of SQL string manipulation. Every allocation and every CPU cycle that is wasted when manipulating SQL strings slows down the library, and thus impacts all of its users. In a situation like this, it is definitely worth considering not using these useful JDK String methods, and opting for the much faster Apache Commons implementations instead.
Things have improved a lot in Java 9, in which case this can mostly be ignored. But if you still need to support Java 8 (we still support Java 6 in our commercial distributions!), then this has to be considered.
Good stuff! If you use those specific replacements as frequently as I imagine, I’d be tempted to see if hand coding them couldn’t improve further on StringUtils.replace, especially in the case of looking for a single character.
Interesting thought. The current StringUtils API only offers methods that replace Strings by Strings, or individual chars by chars, not chars by Strings.
Also, the out-of-the-box API has features we don’t need, such as:
I’ll have to run another benchmark session first, though. I’m not sure if the additional gain will be as high as this one, where tons of GC pressure could be avoided. In any case, I’ve registered an issue for this: https://github.com/jOOQ/jOOQ/issues/6689. Thanks for the great idea!
In that case, keep in mind that indexOf(char) may perform worse than indexOf(String) with a single char string. E.g. Tagir Valeev has made some excellent benchmarks here: https://stackoverflow.com/a/33907329/1413133
That’s because, at least in Java 8, HotSpot uses an intrinsic for indexOf(String) but not for indexOf(char). There is an unofficial list of HotSpot intrinsics here: https://gist.github.com/apangin/7a9b7062a4bd0cd41fcc
Egh, those intrinsics. Would be great to have them annotated in the JDK directly…
They are in 9: @HotSpotIntrinsicCandidate
Thanks for your comment. Yes you’re right!
Nice analysis! (And not even a hint of controversy… are you feeling OK? :-) :-) )
Yeah, I was going to jab at “ORMs” (guess which ones) for not supporting the MySQL backslash escaping feature, leading to undiscovered, lurking SQLi vulnerabilities, but then I thought better not distract from the main topic :)
StringUtils.replace is bloated with useless substring String copies:
* https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L5518
* https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L5525
It should be using https://docs.oracle.com/javase/8/docs/api/java/lang/StringBuilder.html#append-java.lang.CharSequence-int-int-
Here’s the impact on my side:
Benchmark Mode Cnt Score Error Units
StringReplaceBenchmark.testFastStringReplaceLongStringOneMatch thrpt 21 7546534,219 ± 145523,962 ops/s
StringReplaceBenchmark.testStringUtilsReplaceLongStringOneMatch thrpt 21 7353512,552 ± 124498,228 ops/s
StringReplaceBenchmark.testFastStringReplaceLongStringSeveralMatches thrpt 21 5077255,810 ± 62358,937 ops/s
StringReplaceBenchmark.testStringUtilsReplaceLongStringSeveralMatches thrpt 21 4108357,612 ± 92909,038 ops/s
StringReplaceBenchmark.testFastStringReplaceShortStringOneMatch thrpt 21 15911221,949 ± 541064,693 ops/s
StringReplaceBenchmark.testStringUtilsReplaceShortStringOneMatch thrpt 21 10677897,475 ± 491091,973 ops/s
StringReplaceBenchmark.testFastStringReplaceShortStringSeveralMatches thrpt 21 9271742,251 ± 220150,121 ops/s
StringReplaceBenchmark.testStringUtilsReplaceShortStringSeveralMatches thrpt 21 6158829,188 ± 99637,607 ops/s
I guess I’ll send a PR 😉
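The change suggested in the comment above can be sketched in isolation as follows. The class and method names are invented for this example, and it only contrasts the two StringBuilder calls; it is not the patch that was actually proposed.

```java
final class AppendRangeSketch {

    // substring() creates a temporary String (a copy since Java 7),
    // whose characters are then copied again into the builder
    static StringBuilder appendViaSubstring(StringBuilder sb, String text, int start, int end) {
        return sb.append(text.substring(start, end));
    }

    // append(CharSequence, int, int) copies the slice straight into the builder
    static StringBuilder appendRange(StringBuilder sb, String text, int start, int end) {
        return sb.append(text, start, end);
    }

    public static void main(String[] args) {
        System.out.println(appendViaSubstring(new StringBuilder(), "abcdef", 1, 4)); // bcd
        System.out.println(appendRange(new StringBuilder(), "abcdef", 1, 4));        // bcd
    }
}
```

Both variants produce the same content; the second one simply avoids the intermediate String.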
Excellent finding – I forgot about this. In the old days, substring() didn’t produce a copy but reused the original char[] or byte[] (depending on how the string was stored, at the time), if I’m not mistaken. I guess yesterday’s rules are today’s myths with java.lang.String :)
Right! String#substring has been copying since Java 7, so it’s time to roll out such a change and hunt substrings down.
Thanks, Lukas. This example also highlights why we should update our Java version, even if we don’t intend to use new features. These kinds of benefits are as valuable as new features.
Well, there’s a flip side too. Suddenly, String.substring() may be a lot slower for certain usages… :)
Update: My bad, it was changed in Java 7 already…