The Java JIT Compiler is Darn Good at Optimization

“Challenge accepted” said Tagir Valeev when I recently asked the readers of the jOOQ blog to show if the Java JIT (Just-In-Time compilation) can optimise away a for loop.

Tagir is the author of StreamEx, very useful Java 8 Stream extension library that adds additional parallelism features on top of standard streams. He’s a speaker at conferences, and has contributed a dozen of patches into OpenJDK Stream API (including bug fixes, performance optimizations and new features). He’s interested in static code analysis and works on a new Java bytecode analyzer.

I’m very happy to publish Tagir’s guest post here on the jOOQ blog.

tagir-valeev

The Java JIT Compiler

In recent article Lukas wondered whether JIT could optimize a code like this to remove an unnecessary iteration:

// ... than this, where we "know" the list
// only contains one value
for (Object object : Collections.singletonList("abc")) {
    doSomethingWith(object);
}

Here’s my answer: JIT can do even better. Let’s consider this simple method which calculates total length of all the strings of supplied list:

static int testIterator(List<String> list) {
    int sum = 0;
    for (String s : list) {
        sum += s.length();
    }
    return sum;
}

As you might know this code is equivalent to the following:

static int testIterator(List<String> list) {
    int sum = 0;
    Iterator<String> it = list.iterator();
    while(it.hasNext()) {
        String s = it.next();
        sum += s.length();
    }
    return sum;
}

Of course in general case the list could be anything, so when creating an iterator, calling hasNext and next methods JIT must emit honest virtual calls which is not very fast. However what will happen if you always supply the singletonList here? Let’s create some simple test:

public class Test {
    static int res = 0;

    public static void main(String[] args) {
        for (int i = 0; i < 100000; i++) {
            res += testIterator(Collections.singletonList("x"));
        }
        System.out.println(res);
    }
}

We are calling our testIterator in a loop so it’s called enough times to be JIT-compiled with C2 JIT compiler. As you might know, in HotSpot JVM there are two JIT-compilers, namely C1 (client) compiler and C2 (server) compiler. In 64-bit Java 8 they work together. First method is compiled with C1 and special instructions are added to gather some statistics (which is called profiling). Among it there is type statistics. JVM will carefully check which exact types our list variable has. And in our case it will discover that in 100% of cases it’s singleton list and nothing else. When method is called quite often, it gets recompiled by better C2 compiler which can use this information. Thus when C2 compiles it can assume that in future singleton list will also appear quite often.

You may ask JIT compiler to output the assembly generated for methods. To do this you should install hsdis on your system. After that you may use convenient tools like JITWatch or write a JMH benchmark and use -perfasm option. Here we will not use third-party tools and simply launch the JVM with the following command line options:

$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintAssembly Test >output.txt

This will generate quite huge output which may scare the children. The assembly generated by C2 compiler for ourtestIterator method looks like this (on Intel x64 platform):

  # {method} {0x0000000055120518} 
  # 'testIterator' '(Ljava/util/List;)I' in 'Test'
  # parm0:    rdx:rdx   = 'java/util/List'
  #           [sp+0x20]  (sp of caller)
  0x00000000028e7560: mov    %eax,-0x6000(%rsp)
  0x00000000028e7567: push   %rbp

  ;*synchronization entry
  ; - Test::testIterator@-1 (line 15)
  0x00000000028e7568: sub    $0x10,%rsp         
                                                
  ; implicit exception: dispatches to 0x00000000028e75bd
  0x00000000028e756c: mov    0x8(%rdx),%r10d    

  ;   {metadata('java/util/Collections$SingletonList')}
  0x00000000028e7570: cmp    $0x14d66a20,%r10d  

  ;*synchronization entry
  ; - java.util.Collections::singletonIterator@-1
  ; - java.util.Collections$SingletonList::iterator@4
  ; - Test::testIterator@3 (line 16)
  0x00000000028e7577: jne    0x00000000028e75a0 

  ;*getfield element
  ; - java.util.Collections$SingletonList::iterator@1
  ; - Test::testIterator@3 (line 16)
  0x00000000028e7579: mov    0x10(%rdx),%ebp    

  ; implicit exception: dispatches to 0x00000000028e75c9
  0x00000000028e757c: mov    0x8(%rbp),%r11d    

  ;   {metadata('java/lang/String')}
  0x00000000028e7580: cmp    $0x14d216d0,%r11d  
  0x00000000028e7587: jne    0x00000000028e75b1

  ;*checkcast
  ; - Test::testIterator@24 (line 16)
  0x00000000028e7589: mov    %rbp,%r10          
                                                
  ;*getfield value
  ; - java.lang.String::length@1
  ; - Test::testIterator@30 (line 17)
  0x00000000028e758c: mov    0xc(%r10),%r10d    

  ;*synchronization entry
  ; - Test::testIterator@-1 (line 15)
  ; implicit exception: dispatches to 0x00000000028e75d5
  0x00000000028e7590: mov    0xc(%r10),%eax     
                                                
                                               
  0x00000000028e7594: add    $0x10,%rsp
  0x00000000028e7598: pop    %rbp

  # 0x0000000000130000
  0x00000000028e7599: test   %eax,-0x27b759f(%rip)        
         
  ;   {poll_return}                                       
  0x00000000028e759f: retq   
  ... // slow paths follow

What you can notice is that it’s surpisingly short. I’ll took the liberty to annotate what happens here:

// Standard stack frame: every method has such prolog
mov    %eax,-0x6000(%rsp)
push   %rbp
sub    $0x10,%rsp         
// Load class identificator from list argument (which is stored in rdx 
// register) like list.getClass() This also does implicit null-check: if 
// null is supplied, CPU will trigger a hardware exception. The exception
// will be caught by JVM and translated into NullPointerException
mov    0x8(%rdx),%r10d
// Compare list.getClass() with class ID of Collections$SingletonList class 
// which is constant and known to JIT
cmp    $0x14d66a20,%r10d
// If list is not singleton list, jump out to the slow path
jne    0x00000000028e75a0
// Read Collections$SingletonList.element private field into rbp register
mov    0x10(%rdx),%ebp
// Read its class identificator and check whether it's actually String
mov    0x8(%rbp),%r11d
cmp    $0x14d216d0,%r11d
// Jump out to the exceptional path if not (this will create and throw
// ClassCastException)
jne    0x00000000028e75b1
// Read private field String.value into r10 which is char[] array containing
//  String content
mov    %rbp,%r10
mov    0xc(%r10),%r10d
// Read the array length field into eax register (by default method returns
// its value via eax/rax)
mov    0xc(%r10),%eax
// Standard method epilog
add    $0x10,%rsp
pop    %rbp
// Safe-point check (so JVM can take the control if necessary, for example,
// to perform garbage collection)
test   %eax,-0x27b759f(%rip)
// Return
retq   

If it’s still hard to understand, let’s rewrite it via pseudo-code:

if (list.class != Collections$SingletonList) {
  goto SLOW_PATH;
}
str = ((Collections$SingletonList)list).element;
if (str.class != String) {
  goto EXCEPTIONAL_PATH;
}
return ((String)str).value.length;

So for the hot path we have no iterator allocated and no loop, just several dereferences and two quick checks (which are always false, so CPU branch predictor will predict them nicely). Iterator object is evaporated completely, though originally it has additional bookkeeping like tracking whether it was already called and throwing NoSuchElementException in this case. JIT-compiler statically proved that these parts of code are unnecessary and removed them. The sum variable is also evaporated. Nevertheless the method is correct: if it happens in future that it will be called with something different from singleton list, it will handle this situation on the SLOW_PATH (which is of course much longer). Other cases like list == null or list element is not String are also handled.

What will occur if your program pattern changes? Imagine that at some point you are no longer using singleton lists and pass different list implementations here. When JIT discovers that SLOW_PATH is hit too often, it will recompile the method to remove special handling of singleton list. This is different from pre-compiled applications: JIT can change your code following the behavioral changes of your program.

2 thoughts on “The Java JIT Compiler is Darn Good at Optimization

  1. You mentioned Java 8 has two JIT compilers C1 and C2 then why does system property java.compiler returns null indicating there is no JIT compilers. I am running Java 8 on ubuntu 16.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s