Java on Steroids: 5 Super Useful JIT Optimization Techniques

Datetime:2016-08-22 22:54:17          Topic: Assembler  JVM           Share

What are some of the most useful JVM JIT optimizations and how to use them?

Even when you’re not actively planning for it, the JVM has quite a few tricks up its sleeve to help your code perform better. Some require literally nothing from you for them to work, and others could use a little help along the way.

For this post, we’ve had a chat with Moshe Tsur ,Takipi’s R&D Team Lead, and got the opportunity to share some of his tips about the JVM’s Just In Time (JIT) compiler.

Let’s see what’s going on behind the scenes.

Write once, run anywhere, optimize just in time

Most people know that their Java source code is turned to bytecode by the javac compiler, and then run by the JVM which then compiles it to Assembly and feeds it to the CPU.

Less people know that the rabbit hole goes deeper. The JVM has 2 different operation modes.

The generated bytecode is an accurate representation of the original Java source code, without any optimizations. When the JVM translates it to Assembly, things get trickier and the magic kicks in:

  1. Interpreted mode – Where the JVM reads and runs the bytecode itself, as is.
  2. Compiled mode (bytecode to assembly) – Where the JVM loosens its grip and there’s no bytecode involved.

The link between the two is the JIT compiler. As you might have already guessed, interpreted mode is much more slower than directly running on the CPU without a middleman.

Interpreted mode gives the Java platform a precious opportunity to collect information about the code and how it actually behaves in practice – giving it a chance to learn how it can optimize the resulting Assembly code.

After enough running time, the JIT compiler is able to perform the optimization in the Assembly level. Mainly, minimizing unnecessary jumps that add significant overhead on for the application, based on the actual behavior of the code. Furthermore, this process doesn’t stop after the “warmup phase” of the application and can happen multiple times to optimize the Assembly even further.

AtTakipi, where we’re building a Java agent that monitors servers in production, we take overhead very seriously. Every little bit of code is optimized and along the way we’ve had the chance to use and learn about some cool JIT features.

Here are 5 of the more useful examples:

1. Null Check Elimination

In many cases, adding certain conditional checks into the code makes a lot of sense during development. Just like the infamous check for null. It’s no surprise that nulls get special treatment, since it’s their the cause for the top exception that happens in production .

Nonetheless, explicit null checks can be eliminated from optimized Assembly code in most cases, and it holds true for any kind of condition that happens rarely. In the rare case that the condition does happen, the JIT uses a mechanism called “uncommon trap” that lets it know that it need to go back and fix the optimization by pulling back in the set of instructions that needs to be carried out when it happens – without surfacing an actual NullPointerException.

The reason these checks are a cause for concern for optimized code is that they’re likely to introduce jumps into the assembly code, slowing down its execution.

Let’s take a look at a simple example:

If the JIT sees that graph is never called with null, the compiled version would look as if it reflects code that was written without the null check:

Bottom line:We don’t need to do anything special to enjoy this optimization, and when a rare condition does kick in, the JVM would know to “deoptimize” and get the desired result.

2. Branch Prediction

Similarly to Null Check Elimination, there’s another JIT optimization technique that helps it decide whether certain lines of code are “ hotter ” than others and happen more often.

We already know that if a certain condition rarely or never holds, it’s likely that the “uncommon trap” mechanism will kick in and eliminate it from the compiled Assembly.

If there’s a different balance, both branches of an IF condition hold, but one happens more than the other, the JIT compiler can reorder them according to the most common one and significantly reduce the number of Assembly jumps.

Here’s how it works in practice:

Now, assuming that most of the time x < y, the condition will be flipped to statistically reduce the number of Assembly jumps:

In case you’re not convinced, we’ve actually ran it through a little test and pulled out the optimized Assembly code:

As you can see, the jump instruction is jge (jump if greater or equals), corresponding with the else branch of the condition.

Bottom line:Another out-of-the-box JIT optimization that we get to enjoy. We’ve also covered Branch Prediction in a recent post about some of the most interesting Stackoverflow Java answers .

In the example shown there, we see how branch prediction is used to explain why running operations on a sorted array is much faster under certain conditions that help branch prediction to kick in.

3. Loop Unrolling

As you might have noticed, the JIT compiler constantly tries to eliminate Assembly jumps from the compiled code.

This is why loops smell like trouble.

Each iteration is actually an Assembly jump back to the beginning of the instruction set. With loop unrolling, the JIT compiler opens up the loop and just repeats the corresponding Assembly instructions one after another.

Theoretically, javac could do something similar when translating Java to bytecode, but the JIT compiler has better knowledge of the actual CPU that the code will be running on and knows how to fine tune the code in a much more efficient way.

For example, let’s take a look at a method that multiplies a matrix by a vector:

The unrolled version would look like this:

Repeating the same action over and over again without the jump overhead:

Bottom line:The JIT compiler does a great job in unrolling loops, as long as you keep their contents simple without any unnecessary complications.

It’s also not recommended to try to optimize this process and unroll them yourself, the outcome will probably cause more problems than it solves.

4. Inlining Methods

Next up, another jump killer optimization. A huge source for Assembly jumps are in fact method calls. When possible, the JIT compiler would try to inline them and eliminate the jumps of going there and back, the need to send arguments, and returning a value – transferring its whole content to the calling method.

It’s also possible to fine tune the way the JIT compiler inlines methods with 2 JVM arguments:

  1. -XX:MaxInlineSize – The maximum size of the bytecode of a method that can be inlined (done for any method, even if not being executed frequently). Default is about ~35 bytes.
  2. -XX:FreqInlineSize – The maximum size of the bytecode of a method that is considered a hot spot (executed frequently), that should be inlined. Default varies between platforms.

For example, let’s take a look at a method that calculated the coordinates of a simple line:

The optimized inlined version would eliminate the jump, the send of the arguments l and x, and the returning of y:

Bottom line:Inlining is a super useful optimization but it will only kick in if you keep your methods as simple as possible with a minimal number of lines. Complex methods are less likely to get inlined, so this is one point where you can help the JIT compiler do its job.

5. Thread fields and Thread Local Storage (TLS)

As it turns out, thread fields are much faster than regular variables. The thread object is stored on the actual CPU register, making its fields a very efficient storage space.

With thread local storage, you’re able to create variables stored on the Thread object.

Here’s how you can access the Thread Local Storage, with a simple example of a request counter:

In the corresponding Assembly code, we can see that the data is placed directly on the register, allowing for faster data access than, say, a static class variable:

Bottom line:Certain sensitive data types might be better stored on Thread Local Storage for faster access and retrieval.

Final Thoughts

The JVMs JIT compiler is one of the fascinating mechanisms on the Java platform. It optimizes your code for performance, without giving away its readability. Not only that, beyond the “static” optimization methods of inlining, it also makes decisions based on the way that the code performs in practice.

We hope you’ve enjoyed learning about some of the JVMs most useful optimization techniques, and would be happy to hear your thoughts in the comments section below.





About List