Benchmarking JDK String.replace() vs Apache Commons StringUtils.replace()

What’s better? Using the JDK’s String.replace() or something like Apache Commons Lang’s StringUtils.replace()? In this article, I’ll compare the two, first in a profiling session using Java Mission Control (JMC), then in a benchmark using JMH, and we’ll see that Java 9 heavily improved things in this area.

Profiling using JMC

In a recent profiling session where I checked for any “obvious” bottlenecks in jOOQ, I’ve discovered this nasty regular expression pattern instantiation:

Tons of int[] instances were allocated by a regular expression pattern. That’s weird, because in general, inside of jOOQ’s internals, special care is always taken to pre-compile any regular expressions that are needed in static members, e.g.:


private static final Pattern TYPE_NAME_PATTERN =
  Pattern.compile("\\([^\\)]*\\)");

This allows for using the Pattern far more efficiently than, e.g., by calling String.replaceAll():


// Much better, pattern is pre-compiled
TYPE_NAME_PATTERN.matcher(castTypeName).replaceAll("")

// Much worse, pattern is compiled *every time*
castTypeName.replaceAll("\\([^\\)]*\\)", "")

That should be clear to everyone. The price to pay for this is the fact that the pattern is stored “far away” in some static member, rather than being visible right where it is used, which is a bit less readable. At least in my opinion.

SIDENOTE: People tend to get all angry about premature optimisation and such. Yes, these optimisations are micro optimisations and aren’t always worth the trouble. But this article is about jOOQ, a library that does a lot of expression tree transformations, and it is important for jOOQ to eliminate even 1% “bottlenecks”, as they make a difference. So, please read this article in this context. Consider also our previous post about this subject: Top 10 Easy Performance Optimisations in Java

What was the problem in jOOQ?

What appears to be obvious when using regular expressions seems less obvious when using ordinary, constant string replacements, such as when calling String.replace(CharSequence, CharSequence), as was done in the linked jOOQ issue #6672. The relevant piece of code was escaping all inline strings that are sent to the SQL database, to prevent syntax errors and, of course, SQL injection:


static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = result.replace("\\", "\\\\");

    return result.replace("'", "''");
}
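To see what this escaping does to actual values, here is a minimal standalone sketch. It is not jOOQ's code: the class name EscapeSketch is made up, and the boolean parameter backslashEscaping is a hypothetical stand-in for the needsBackslashEscaping(context.configuration()) check above.

```java
public class EscapeSketch {

    // Hypothetical stand-in for jOOQ's configuration check: the caller
    // decides whether the target database needs backslash escaping.
    static String escape(Object val, boolean backslashEscaping) {
        String result = val.toString();

        // Double backslashes first, for databases like MySQL
        if (backslashEscaping)
            result = result.replace("\\", "\\\\");

        // Always double apostrophes
        return result.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(escape("O'Reilly", false)); // prints O''Reilly
        System.out.println(escape("C:\\temp", true));  // prints C:\\temp
    }
}
```

Note that the backslash replacement has to run first. Since neither replacement produces the other's search character here, the order doesn't change the result in this sketch, but it keeps the two steps independent.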

We’re always escaping apostrophes by doubling them, and in some databases (e.g. MySQL), we often have to escape backslashes as well (unfortunately, not all ORMs seem to do this or even be aware of this MySQL “feature”). Unfortunately as well, despite heavy use of Apache Commons Lang’s StringUtils.replace() in jOOQ’s internals, every now and then a String.replace(CharSequence, CharSequence) sneaks in, because it’s just so convenient to write.

Meh, does it matter? Usually, in ordinary business logic, it shouldn’t (again, don’t optimise prematurely). But jOOQ is essentially a SQL string manipulation library, so a single replace call that is executed very frequently (for good reasons, of course) can get quite costly if it is slower than it should be. And it is slower, prior to Java 9, when this method was optimised. I’ve done the profiling with Java 8, where internally, String.replace() uses a literal regex pattern (i.e. a pattern compiled with the LITERAL flag, which is faster, but it is a pattern nonetheless). Not only does the method appear as a major offender in the GC allocation view, it also triggers quite some action in the “hot methods” view of JMC:

Those are quite a few Pattern methods. The percentages have to be understood in the context of a benchmark, running millions of queries against an H2 in-memory database, so the overhead is significant!

Using Apache Commons Lang’s StringUtils

A simple fix is to use Apache Commons Lang’s StringUtils instead:


static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = StringUtils.replace(result, "\\", "\\\\");

    return StringUtils.replace(result, "'", "''");
}

Now, the pressure has changed significantly. The int[] allocation is barely noticeable in comparison:

And far fewer Pattern calls are made overall.
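For reference, the Java 8 behaviour described above (a literal regex pattern being compiled on every call) can be approximated with public JDK API only. The following is a sketch under that assumption, with a made-up class and method name. It shows why Pattern and Matcher objects, and their internal arrays, show up in the allocation profile for every single replace call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralPatternReplace {

    // Approximates what String.replace(CharSequence, CharSequence) does on
    // Java 8: compile a LITERAL pattern, create a matcher, and replace all
    // occurrences, on every single call.
    static String replaceViaLiteralPattern(
            String text, CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL)
                      .matcher(text)
                      .replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }

    public static void main(String[] args) {
        // Same result as "a'bc".replace("'", "''"), but with the
        // Pattern and Matcher allocations made explicit
        System.out.println(replaceViaLiteralPattern("a'bc", "'", "''"));
    }
}
```

Pattern.LITERAL makes the target match verbatim, and Matcher.quoteReplacement() prevents characters like $ and \ in the replacement from being interpreted as group references.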

Benchmarking using JMH

Profiling can be very useful to spot bottlenecks, but it needs to be read with care. It introduces some artefacts and slight overheads, and it is not 100% accurate when sampling call stacks, which might lead to the wrong conclusions at times. This is why it is sometimes important to back claims by running an actual benchmark. And when benchmarking, please, don’t just loop 1 million times in a main() method. That will be very inaccurate, except for very obvious, order-of-magnitude scale differences. I’m using JMH here, running the following simple benchmark:


package org.jooq.test.benchmark;

import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
@Warmup(iterations = 5)
@Measurement(iterations = 7)
public class StringReplaceBenchmark {

    private static final String SHORT_STRING_NO_MATCH = "abc";
    private static final String SHORT_STRING_ONE_MATCH = "a'bc";
    private static final String SHORT_STRING_SEVERAL_MATCHES = "'a'b'c'";
    private static final String LONG_STRING_NO_MATCH =
      "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_ONE_MATCH =
      "abcabcabcabcabcabcabcabcabcabcabca'bcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_SEVERAL_MATCHES =
      "abcabca'bcabcabcabcabcabc'abcabcabca'bcabcabcabcabcabca'bcabcabcabcabcabcabc";

    @Benchmark
    public void testStringReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_SEVERAL_MATCHES, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_SEVERAL_MATCHES, "'", "''"));
    }
}

Notice that I tried to run 2 x 3 different string replacement scenarios:

The string is “short”
The string is “long”

Cross joining (there, finally some SQL in this post!) the above with:

No match is found
One match is found
Several matches are found

That’s important because different optimisations can be implemented for those different cases, and in jOOQ’s case, there is probably no match most of the time. I ran this benchmark once on Java 8:

$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

And on Java 9:

$ java -version
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

Tagir Valeev was kind enough to remind me that this issue was supposed to be fixed in Java 9:

They fixed this in Java 9. Just update, don't use outdated versions of Java!
— Tagir Valeev (@tagir_valeev) October 10, 2017

The results are:

Java 8

testStringReplaceLongStringNoMatch               thrpt   21    4809343.940 ±  66443.628  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt   21   25063493.793 ± 660657.256  ops/s

testStringReplaceLongStringOneMatch              thrpt   21    1406989.855 ±  43051.008  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt   21    6961669.111 ± 141504.827  ops/s

testStringReplaceLongStringSeveralMatches        thrpt   21    1103323.491 ±  17047.449  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt   21    3899108.777 ±  41854.636  ops/s

testStringReplaceShortStringNoMatch              thrpt   21    5936992.874 ±  68115.030  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt   21  171660973.829 ± 377711.864  ops/s

testStringReplaceShortStringOneMatch             thrpt   21    3267435.957 ± 240198.763  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt   21    9943846.428 ± 270821.641  ops/s

testStringReplaceShortStringSeveralMatches       thrpt   21    2313713.015 ±  28806.738  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt   21    5447065.933 ± 139525.472  ops/s

As can be seen, the difference is “catastrophic”. Apache Commons Lang’s StringUtils drastically outperforms the JDK’s String.replace() in every scenario, especially when no match is found in a short string! That’s because the library optimises for this particular case:


...
int end = searchText.indexOf(searchString, start);
if (end == INDEX_NOT_FOUND) {
    return text;
}
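The no-match shortcut above can be sketched as a complete, standalone method. This is not the Commons Lang source: the class name IndexOfReplace is made up, and using StringBuilder.append(CharSequence, int, int) to copy the unmatched parts is a design choice of this sketch. The point is that when indexOf() finds nothing, the original string is returned without allocating anything.

```java
public class IndexOfReplace {

    // Regex-free replacement based on indexOf(), with a fast path
    // that returns the input as is when there is nothing to replace.
    static String replace(String text, String search, String replacement) {
        if (text == null || search == null || search.isEmpty() || replacement == null)
            return text;

        int end = text.indexOf(search);
        if (end < 0)
            return text; // Fast path: no match, no allocation

        StringBuilder sb = new StringBuilder(text.length() + 16);
        int start = 0;

        while (end >= 0) {
            // Copy the unmatched part directly, then the replacement
            sb.append(text, start, end).append(replacement);
            start = end + search.length();
            end = text.indexOf(search, start);
        }

        // Copy whatever follows the last match
        sb.append(text, start, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("abc", "'", "''"));     // prints abc
        System.out.println(replace("'a'b'c'", "'", "''")); // prints ''a''b''c''
    }
}
```

Since most of the strings that jOOQ escapes contain no apostrophe at all, the fast path is the branch that is taken most often.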

Java 9

Things look a bit different for Java 9:

testStringReplaceLongStringNoMatch               thrpt   21   55528132.674 ±  479721.812  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt   21   55767541.806 ±  754862.755  ops/s

testStringReplaceLongStringOneMatch              thrpt   21    4806322.839 ±  217538.714  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt   21    8366539.616 ±  142757.888  ops/s

testStringReplaceLongStringSeveralMatches        thrpt   21    2685134.029 ±   78108.171  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt   21    3923819.576 ±  351103.020  ops/s

testStringReplaceShortStringNoMatch              thrpt   21  122398496.629 ± 1350086.256  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt   21  121139633.453 ± 2756892.669  ops/s

testStringReplaceShortStringOneMatch             thrpt   21   18070522.151 ±  498663.835  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt   21   11367395.622 ±  153377.552  ops/s

testStringReplaceShortStringSeveralMatches       thrpt   21    7548407.681 ±  168950.209  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt   21    5045065.948 ±  175251.545  ops/s

Java 9’s implementation is now similar to that of Apache Commons, with the same optimisation for non-matches:


public String replace(CharSequence target, CharSequence replacement) {
    String tgtStr = target.toString();
    String replStr = replacement.toString();
    int j = indexOf(tgtStr);
    if (j < 0) {
        return this;
    }
    ...

It is still considerably slower for matches in long strings, but faster for matches in short strings. The tradeoff for jOOQ will be to still prefer Apache Commons because:

Most people are still on Java 8 or less, currently
Most replacements won’t match and both implementations fare equally well for that in Java 9, but Apache Commons is much faster for this category in Java 8
If there’s a match and thus a replacement, the speed depends on the string length, where the faster implementation is currently undecided

Conclusion

This micro optimisation stuff matters in jOOQ because jOOQ is a library that does a lot of SQL string manipulation. Every allocation and every CPU cycle that is wasted when manipulating SQL strings slows down the library, and thus impacts all of its users. In a situation like this, it is definitely worth considering not using these useful JDK String methods, and opting for the much faster Apache Commons implementations instead. Things have improved a lot in Java 9, in which case this can mostly be ignored. But if you still need to support Java 8 (we still support Java 6 in our commercial distributions!), then this has to be considered.

13 thoughts on “Benchmarking JDK String.replace() vs Apache Commons StringUtils.replace()”

ericjs says:

October 11, 2017 at 16:28

Good stuff! If you use those specific replacements as frequently as I imagine, I’d be tempted to see if hand coding them couldn’t improve further on StringUtils.replace, especially in the case of looking for a single character.


1. lukaseder says:
  
  October 11, 2017 at 16:52
  Interesting thought. The current StringUtils API only offers methods that replace Strings by Strings, or individual chars by chars, not chars by Strings.
  
  Also, the out-of-the-box API has features we don’t need, such as:
  - Opt-in case-insensitive replacements
  - An optimisation for replacements on empty strings
  I’ll have to run another benchmark session first, though. I’m not sure if the additional gain will be as high as this one, where tons of GC pressure could be avoided. In any case, I’ve registered an issue for this: https://github.com/jOOQ/jOOQ/issues/6689. Thanks for the great idea!
  
  1. Manos Nikolaidis says:
    
    October 12, 2017 at 14:31
    
    In that case, keep in mind that indexOf(char) may perform worse than indexOf(String) with a single char string. E.g. Tagir Valeev has made some excellent benchmarks here: https://stackoverflow.com/a/33907329/1413133
    
    That’s because, at least in Java 8, HotSpot uses an intrinsic for indexOf(String) but not for indexOf(char). There is an unofficial list of HotSpot intrinsics here: https://gist.github.com/apangin/7a9b7062a4bd0cd41fcc
    
    
    1. lukaseder says:
      
      October 12, 2017 at 14:42
      
      Egh, those intrinsics. Would be great to have them annotated in the JDK directly…
      
      
      1. eugenrabii says:
        
        October 12, 2017 at 23:19
        
        They are in 9: @HotSpotIntrinsicCandidate
        
        
        
        lukaseder says:
        
        October 15, 2017 at 20:12
        
        Thanks for your comment. Yes you’re right!
        
        
Charles Roth says:

October 11, 2017 at 18:44

Nice analysis! (And not even a hint of controversy… are you feeling OK? :-) :-) )


1. lukaseder says:
  
  October 11, 2017 at 20:02
  
  Yeah, I was going to jab at “ORMs” (guess which ones) for not supporting the MySQL backslash escaping feature, leading to undiscovered, lurking SQLi vulnerabilities, but then I thought better not distract from the main topic :)
  
  
Stéphane LANDELLE says:

October 12, 2017 at 10:33

StringUtils.replace is bloated with useless substring String copies:
* https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L5518
* https://github.com/apache/commons-lang/blob/master/src/main/java/org/apache/commons/lang3/StringUtils.java#L5525

It should be using https://docs.oracle.com/javase/8/docs/api/java/lang/StringBuilder.html#append-java.lang.CharSequence-int-int-

Here’s the impact on my side:

Benchmark Mode Cnt Score Error Units
StringReplaceBenchmark.testFastStringReplaceLongStringOneMatch thrpt 21 7546534,219 ± 145523,962 ops/s
StringReplaceBenchmark.testStringUtilsReplaceLongStringOneMatch thrpt 21 7353512,552 ± 124498,228 ops/s

StringReplaceBenchmark.testFastStringReplaceLongStringSeveralMatches thrpt 21 5077255,810 ± 62358,937 ops/s
StringReplaceBenchmark.testStringUtilsReplaceLongStringSeveralMatches thrpt 21 4108357,612 ± 92909,038 ops/s

StringReplaceBenchmark.testFastStringReplaceShortStringOneMatch thrpt 21 15911221,949 ± 541064,693 ops/s
StringReplaceBenchmark.testStringUtilsReplaceShortStringOneMatch thrpt 21 10677897,475 ± 491091,973 ops/s

StringReplaceBenchmark.testFastStringReplaceShortStringSeveralMatches thrpt 21 9271742,251 ± 220150,121 ops/s
StringReplaceBenchmark.testStringUtilsReplaceShortStringSeveralMatches thrpt 21 6158829,188 ± 99637,607 ops/s

I guess I’ll send a PR 😉


1. lukaseder says:
  
  October 12, 2017 at 10:36
  
  Excellent finding – I forgot about this. In the old days, substring() didn’t produce a copy but reused the original char[] or byte[] (depending on how the string was stored, at the time), if I’m not mistaken. I guess yesterday’s rules are today’s myths with java.lang.String :)
  
  
  1. Stéphane LANDELLE says:
    
    October 12, 2017 at 10:53
    
    Right! Then String#substring has been copying since Java 7, so it’s time to roll such change and hunt substrings down.
    
    
javinpaul (@javinpaul) says:

October 12, 2017 at 16:57

Thanks, Lukas, this example also highlights why we should update our Java version, even if we don’t intend to use new features. These kinds of benefits are as valuable as new features.


1. lukaseder says:
  
  October 12, 2017 at 16:59
  
  Well, there’s a flip side too. Suddenly, String.substring() may be a lot slower for certain usages… :)
  
  Update: My bad, it was changed in Java 7 already…
  

Published by lukaseder
