Experiments in program optimisation

Hash functions: disassembling and measuring performance

Story: Conway's Game of Life

Tags: Java optimisation hashmap life assembly

10 Sep 2015

In the previous article we’ve implemented Conway’s game of Life in Java using HashMap as our main building block. We’ve discovered that the performance depends on the choice of a hash function; some of them work much better than the others. The reason is that the keys are not truly random and follow some patterns, which cause hash code collisions. Good functions, even in the presence of patterns, spread values evenly over hash table slots and reduce chain sizes. Shorter lists must be traversed and fewer calls to equals() must be made.

However, hash functions also cost CPU cycles, and this cost may kill any benefit from better hashing. We’ve looked at several hash functions and compared them regarding the quality of hashing. Now it’s time to measure their own speed as well and check if it can be improved. For now, I’ll concentrate on the first task (measuring), for it turned to be non-trivial and full of surprises. We’ll look at improvements in the next article.

Although the article is called “measuring hash functions”, most of our findings don’t depend on the nature of the functions, and are generally applicable to any Java code. I should have investigated this topic before trying any optimisations of Java programs, so this should have been the article number one. But when I started, I didn’t posess a skill, which I’m going to practice today: to print the code generated by HotSpot. This turned out to be an exciting adventure. Enjoy the ride!

The classes to test

We want to compare performance of hashCode() methods of several classes. These classes are described in the previous article; their code can be found in the repository. They are used to store two integer values (x and y). One class (Point) stores them individually as two separate variables; the others pack them into a long, high 32 bits being used for x and low ones for y. Here are the classes and their hashCode() methods:

public final int hashCode ()
{
    return x * 3 + y * 5;
}

public int hashCode() {
    return (int)(value ^ (value >>> 32));
}

public int hashCode ()
{
    return hi(v) * 3 + lo(v) * 5;
}

(with the obvious definitions of hi() and lo())

public int hashCode ()
{
    return hi(v) * 11 + lo(v) * 17;
}

public int hashCode ()
{
    return hi(v) * 1735499 + lo(v) * 7436369;
}

public int hashCode ()
{
    long x = v * 541725397157L;
    return lo(x) ^ hi(x);
}

public int hashCode ()
{
    return (int) (v % 946840871);
}

public int hashCode ()
{
    java.util.zip.CRC32 crc = new java.util.zip.CRC32 ();
    crc.update ((int)(v>>> 0) & 0xFF);
    crc.update ((int)(v >>> 8) & 0xFF);
    crc.update ((int)(v >>> 16) & 0xFF);
    crc.update ((int)(v >>> 24) & 0xFF);
    crc.update ((int)(v >>> 32) & 0xFF);
    crc.update ((int)(v >>> 40) & 0xFF);
    crc.update ((int)(v >>> 48) & 0xFF);
    crc.update ((int)(v >>> 56) & 0xFF);
    return (int) crc.getValue ();
}

Let’s add another one, NullPoint:

public int hashCode ()
{
    return 0;
}

The first test

It seems easy to compare the performance of several methods. We’ll just run each method some big number of times and measure the time (the code is in the repository):

public class HashTime
{
    static void iterate (int N, Object x)
    {
        for (int i = 0; i < N; i++)
            x.hashCode ();
    }
 
    static void test (Object x)
    {
        System.out.printf ("%20s: ", x.getClass ().getName ());
        int N = 100000000;
        
        for (int iter = 0; iter < 3; iter ++) {
            long t1 = System.currentTimeMillis ();
            iterate (N, x);
            long t2 = System.currentTimeMillis ();
            long t = t2-t1;
            System.out.printf (" %5d", t);
        }
        System.out.println ();
    }

    public static void main (String argv[])
    {
        int x = 531;
        int y = -295;

        test (new Point (x, y));
        test (LongUtil.fromPoint (x, y));
        test (new LongPoint (x, y));
        test (new LongPoint3 (x, y));
        test (new LongPoint4 (x, y));
        test (new LongPoint5 (x, y));
        test (new LongPoint6 (x, y));
        test (new LongPoint7 (x, y));
    }
}

The choice of the constants for x and y is completely arbitrary and has no special meaning. This is what we get when running this test on Java 7 (1.7.0.40):

           Point:    472   426   455
  java.lang.Long:    427   390   365
       LongPoint:    430   430   429
      LongPoint3:    431   428   428
      LongPoint4:    400   398   396
      LongPoint5:    423   395   395
      LongPoint6:    426   426   425
      LongPoint7:   9823  9814  9814
       NullPoint:    335   335   334

The results look a bit suspicious. Can Point really be slower than LongPoint5 and LongPoint6? This suspicion grows as we look at the results on Java 8 (I’m using 1.8.0.45):

           Point:      5     3     0
  java.lang.Long:     93   175   141
       LongPoint:    458   457   456
      LongPoint3:    437   426   425
      LongPoint4:    457   395   396
      LongPoint5:    457   423   395
      LongPoint6:   1644  1646  1645
      LongPoint7:   1791  1790  1791
       NullPoint:    377   365   365

Now Point is very fast, faster even than NullPoint. Could it be because it was the first in the list? Let’s try swapping it with Long:

  java.lang.Long:      6     2     0
           Point:    187   160   143

We see that the extraordinary speed is indeed caused by the first position in the list.

What happens if we repeat the test twice (introduce a loop into main())? We get the following for Java 7:

           Point:    463   426   456
  java.lang.Long:    427   392   365
       LongPoint:    429   426   426
      LongPoint3:    428   426   426
      LongPoint4:    398   396   397
      LongPoint5:    396   395   396
      LongPoint6:    427   426   425
      LongPoint7:   9824  9816  9816
       NullPoint:    336   334   334
           Point:    425   426   425
  java.lang.Long:    364   364   365
       LongPoint:    426   426   426
      LongPoint3:    426   426   426
      LongPoint4:    396   397   396
      LongPoint5:    395   395   396
      LongPoint6:    425   426   425
      LongPoint7:   9816  9816  9816
       NullPoint:    334   334   334

The only thing that has changed is Point. It is now more consistent with other results but still surprisingly slower than LongPoint4 and LongPoint5.

Java 8 shows more variety:

           Point:      6     3     0
  java.lang.Long:    188   158   143
       LongPoint:    458   426   426
      LongPoint3:    442   426   425
      LongPoint4:    457   445   395
      LongPoint5:    457   412   395
      LongPoint6:   1644  1645  1644
      LongPoint7:   1873  1870  1871
       NullPoint:    426   414   365
           Point:    395   395   396
  java.lang.Long:    395   395   395
       LongPoint:    429   429   429
      LongPoint3:    396   396   397
      LongPoint4:    396   395   396
      LongPoint5:    395   396   395
      LongPoint6:   1627  1627  1628
      LongPoint7:   1786  1786  1784
       NullPoint:    364   365   364

The first two methods are incredibly fast, but they become normal the second time we run them. To find out why this happens, we’ll have to look at the generated code.

The generated code: Java 7

Java virtual machine has a command line option that causes it to print the names of the methods being compiled: -XX:+PrintCompilation. I won’t show the outputs here, for they are long. When using Java 7, relatively few compilations happen. All the hashCode() methods are compiled when used. The iterate() method is compiled twice in the beginning and never recompiled afterwards. Java 8 behaves differently. It compiles iterate() ten times – five times while running with Point class, twice with Long, twice for LongPoint, and once for LongPoint4.

To look at the code, we must install a disassembler plugin (hsdis-amd64.so, which can be downloaded from Kenai Project). We want to see the code for HashTest.iterate(), so the command line must look like this:

# java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation  -XX:PrintAssemblyOptions=intel
-XX:CompileCommand=print,*HashTime.iterate HashTime

Note the -XX:PrintAssemblyOptions=intel part: I prefer traditional Intel notation of assembly listing to that of the GNU assembler. Luckily, the disassembler has such option.

Java 7 compiles this method twice, but the results only differ in entry and exit sequences and in the registers used, so we’ll only look at one of them – the second one.

Java disassembly outputs are long and difficult to read; that’s why I’ll skip most of them. This one, however, is interesting because it gives unforeseen insight into the default implementation of hashCode() method; still, I’m only showing the main loop here. I’ve also replaced jump targets with labels:

VIRTUAL_CALL:
0x00007f592cc39257: mov    QWORD PTR [rsp+0x8],r13
0x00007f592cc3925c: mov    DWORD PTR [rsp],ebx
0x00007f592cc3925f: mov    rsi,r13
0x00007f592cc39262: xchg   ax,ax
0x00007f592cc39265: mov    rax,0xffffffffffffffff  ;   {oop(NULL)}
0x00007f592cc3926f: call   0x00007f592cc0fd60  ; OopMap{[8]=Oop off=84}
                                              ;*invokevirtual hashCode
                                              ; - HashTime::iterate@8 (line 8)
                                              ;   {virtual_call}
0x00007f592cc39274: mov    ebx,DWORD PTR [rsp]
0x00007f592cc39277: mov    r13,QWORD PTR [rsp+0x8]
0x00007f592cc3927c: nop    DWORD PTR [rax+0x0]  ;*invokevirtual hashCode
                                              ; - HashTime::iterate@8 (line 8)
MAIN_LOOP:
0x00007f592cc39280: inc    ebp                ; OopMap{r13=Oop off=98}
                                              ;*goto
                                              ; - HashTime::iterate@15 (line 7)
0x00007f592cc39282: test   DWORD PTR [rip+0x8b13d78],eax        # 0x00007f593574d000
                                              ;*iload_3
                                              ; - HashTime::iterate@2 (line 7)
                                              ;   {poll}
0x00007f592cc39288: cmp    ebp,ebx
0x00007f592cc3928a: jge    END_OF_LOOP        ;*if_icmpge
                                              ; - HashTime::iterate@4 (line 7)
L1:
0x00007f592cc3928c: mov    r11d,DWORD PTR [r13+0x8]  ; implicit exception: dispatches to 0x00007f592cc392d6
0x00007f592cc39290: mov    r10,QWORD PTR [r12+r11*8+0x208]
0x00007f592cc39298: mov    r11,0x40c802248    ;   {oop({method} 'hashCode' '()I' in 'java/lang/Object')}
0x00007f592cc392a2: cmp    r10,r11
0x00007f592cc392a5: jne    VIRTUAL_CALL
L2:
0x00007f592cc392a7: mov    r10,QWORD PTR [r13+0x0]
0x00007f592cc392ab: mov    r11,r10
0x00007f592cc392ae: and    r11,0x7
0x00007f592cc392b2: cmp    r11,0x1
0x00007f592cc392b6: jne    VIRTUAL_CALL
0x00007f592cc392b8: shr    r10,0x8
0x00007f592cc392bc: mov    r10d,r10d
0x00007f592cc392bf: test   r10d,0x7fffffff
L3:
0x00007f592cc392c6: jne    MAIN_LOOP
0x00007f592cc392c8: jmp    VIRTUAL_CALL
END_OF_LOOP:

Some comments:

static void iterate (int N, Object x)
{
    for (int i = 0; i < N; i++) {
        long markWord;
        if (x.virtualTable.hashCode == Object.hashCode
            && ((markWord = x.markWord) & 7) == 1
            && ((markWord >> 8) & 0x7FFFFFFF) != 0
           )
            ; // do nothing
        else
            vm.callVirtual (x, hashCode);
    }
}

These checks are not part of a general-purpose method dispatch; for instance, they are not present in the examples in Shipilev’s article. They are part of Object.hashCode() inlining. This method is declared as native, but obviously the HotSpot has a special treatment for it (methods that enjoy such a treatment are called intrinsics). Here is what is actually happening here. The last three bits (x.markWord & 7) contain the lock status of the object, the value 1 meaning “not locked”. The reason these bits are checked is that they define what’s stored in the higher bits of the mark word. When the object is locked, these bits keep the reference to the thread that holds the lock; otherwise, they keep the hash code of the object. This hash code is based on the object’s address; however, the objects can move in memory when the heap is compacted and must retain their hash codes. That’s why the hash code field is initialised to zero and filled in the first time hashCode() is called. This requires thread synchronisation, and is in general non-trivial – that’s why it is considered too complex to be inlined and is implemented as a runtime call. In our case this is combined with the speculative optimisation – the case when the object is locked or the hashcode isn’t initialised is treated the same way as the case when the method isn’t Object.hashCode().

This pseudo-code suggests an optimisation: the if statement can be moved out of the loop. As easy as it looks, it is not completely trivial to implement: the call to vm.callVirtual won’t change x.virtualTable.hashCode, but it can change the markWord. Besides, the compilers are usually good at moving calculations out of the loop, but not conditions. The typical way to move a condition out of the loop involves duplication of the loop body with subsequent specialisation, and not every compiler writer likes this idea.

The generated code: Java 8

Why are the Java 8 results so different? Let’s look at the code.

The code samples produced by numerous compiler runs during the execution of the test with Point class differ a lot. These samples make very interesting reading, but to save space here we’ll only look at the last one of those. So, after five attempts to compile the HashTime.iterate() method, the Hotspot compiler generated the following code (this is the entire code of the method):

0x00007fbff92b7e00: mov    DWORD PTR [rsp-0x14000],eax
0x00007fbff92b7e07: push   rbp
0x00007fbff92b7e08: sub    rsp,0x20           ;*synchronization entry
                                              ; - HashTime::iterate@-1 (line 7)

0x00007fbff92b7e0c: test   edx,edx
0x00007fbff92b7e0e: jle    RETURN             ;*if_icmpge
                                              ; - HashTime::iterate@4 (line 7)

0x00007fbff92b7e10: mov    r10d,DWORD PTR [rcx+0x8]  ; implicit exception: dispatches to 0x00007fbff92b7e29
0x00007fbff92b7e14: cmp    r10d,0xf800c105    ;   {metadata('Point')}
0x00007fbff92b7e1b: jne    BAIL_OUT           ;*return
                                              ; - HashTime::iterate@18 (line 9)

RETURN:
0x00007fbff92b7e1d: add    rsp,0x20
0x00007fbff92b7e21: pop    rbp
0x00007fbff92b7e22: test   DWORD PTR [rip+0x16c151d8],eax        # 0x00007fc00fecd000
                                              ;   {poll_return}
0x00007fbff92b7e28: ret    
BAIL_OUT:
0x00007fbff92b7e29: mov    esi,0xffffff86
0x00007fbff92b7e2e: mov    ebp,edx
0x00007fbff92b7e30: mov    QWORD PTR [rsp],rcx
0x00007fbff92b7e34: xchg   ax,ax
0x00007fbff92b7e37: call   0x00007fbff90051a0  ; OopMap{[0]=Oop off=92}
                                              ;*aload_2
                                              ; - HashTime::iterate@7 (line 8)
                                              ;   {runtime_call}
0x00007fbff92b7e3c: call   0x00007fc00f5d6320  ;*aload_2
                                              ; - HashTime::iterate@7 (line 8)
                                              ;   {runtime_call}

This is much smaller than the code in Java 7. The checks of the mark word bits are gone. And, most importantly, the loop is gone (the only thing that’s left is the upper boundary check). I believe that the pseudo-code now looks like this:

static void iterate (int N, Object x)
{
    if (N > 0) {
        if (x.getClass() == Point.class)
            ; // do nothing
        else
            vm.returnToInterpretingMode();
    }
}

Some comments:

The code produced while executing the test with java.lang.Long is different. It is much longer, so I won’t quote it here (I’ve placed it in a repository). It is also very complex – it’s enough to mention that it contains 49 branch targets, with a completely psychedelic structure of jumps between them. What makes it so big is that it checks for two possible classes of x (Point and Long) and unrolls the loop 16 times instead of removing it. There is no way to write it down exactly as a pseudo-code, the closest I can come with is something like this:

static void iterate (int N, Object x)
{
    int i;
    for (i = 0; i < N-16; i+=16) {
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) {          vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  1; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  2; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  3; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  4; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  5; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  6; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  7; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  8; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i +=  9; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i += 10; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i += 11; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i += 12; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i += 13; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i += 14; vm.returnToInterpretingMode (); }
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) { i += 15; vm.returnToInterpretingMode (); }
    }
    for (; i < N; i++) {
        if (x.getClass() != Point.class && x.getClass() != Long.getClass()) vm.returnToInterpretingMode ();
    }
}

This is why this code is slower than the one specialised for Point but still faster than the general-purpose code with a virtual call (140 ms instead of 400).

The code generated by subsequent runs of the compiler does not differ much from the code from Java 7, that’s why the results are similar.

The second test: using the result

One of the reasons why in some cases the entire method was optimised out is that the results of hashCode() call were not used. We can modify the code to use the results – for instance, to add them all together:

int sum (int N, Object x)
{
    int s = 0;
    for (int i = 0; i < N; i++)
        s += x.hashCode ();
    return s;
}

Obviously, the test () must be modified accordingly:

void test (Object x)
{
    System.out.printf ("%20s: ", x.getClass ().getName ());
    int N = 100000000;
    int s = 0;
        
    for (int iter = 0; iter < 3; iter ++) {
        long t1 = System.currentTimeMillis ();
        s += sum (N, x);
        long t2 = System.currentTimeMillis ();
        long t = t2-t1;
        System.out.printf (" %5d", t);
    }
    System.out.println ("; sum=" + s);
}

Note that we make sure we use the results of calls to sum() by adding them together and printing the result. This is a safety precaution against possible inlining of sum() or compiling it with the knowledge that its results aren’t ever used – some compilers are capable of this.

The new code is here.

This is what we get in Java 7:

           Point:    491   457   489; sum=1040261632
  java.lang.Long:    459   408   402; sum=-1786864128
       LongPoint:    491   489   488; sum=1040261632
      LongPoint3:    461   458   459; sum=-1308103168
      LongPoint4:    460   459   460; sum=1455681024
      LongPoint5:    432   430   431; sum=26862080
      LongPoint6:    468   465   465; sum=-2103278592
      LongPoint7:   9911  9905  9911; sum=1732665600
       NullPoint:    374   367   367; sum=0

And this is in Java 8:

           Point:     12     7     3; sum=1040261632
  java.lang.Long:    213   161   153; sum=-1786864128
       LongPoint:    462   458   457; sum=1040261632
      LongPoint3:    518   510   457; sum=-1308103168
      LongPoint4:    488   429   458; sum=1455681024
      LongPoint5:    441   428   427; sum=26862080
      LongPoint6:   1669  1650  1649; sum=-2103278592
      LongPoint7:   1892  1892  1893; sum=1732665600
       NullPoint:    405   374   374; sum=0

The results do not differ much from the previous ones. The unusually slow Point in Java 7 and unusually fast Point and Long in Java 8 are still there. The only thing that has changed is that LongPoint4 became slower.

The Java 7 code of sum isn’t much different from that of iterate. The identity hash code extracted from the mark word is now added to s (I’ll only show the pseudo-code):

static int sum (int N, Object x)
{
    int s = 0;
    int hash;
    long markword;
    for (int i = 0; i < N; i++) {
        if (x.virtualTable.hashCode == Object.hashCode
            && ((markWord = x.markWord) & 7) == 1
            && (hash = (markWord >> 8) & 0x7FFFFFFF) != 0
           )
            s += hash;
        else
            s += vm.callVirtual (x, x.hashCode);
    }
    return s;
}

The last Java 8 code produced while running with Point class is quite long and esoteric (it can be seen here). I’ll show my reconstruction of its pseudo-code:

static int sum (int N, Object x)
{
    int s = 0;
    int i = 0;
    int hash;
    
    if (N > i) {
        for (int limit = i + 1;  i < limit; i ++) {
            if (x.getClass() != Point.class) {
                vm.returnToInterpretingMode();
            }
            Point p = (Point) x;
            hash = p.y * 5 + p.x * 3;
            s += hash;
        }
    
        int N1 = N - 15;
        if (N1 > N) N1 = Integer.MIN_VALUE;
        
        if (i < N1) {
            int hash16 = hash * 16;
            do {
               s += hash16;
               i += 16;
            } while (i < N1);
        }
        if (i < N) {
            do {
                s += hash;
                ++ i;
            } while (i < N);
        }
    }
    return s;
}

We see that the optimisation mentioned earlier (moving an invariant condition out of a loop) has in fact happened. The loop has been specialised for the case where x is of type Point. All other cases cause bailout to interpreting mode in the loop preheader (a piece of code that is executed exactly once if the loop is executed at least once). The loop has been unrolled 16 times, It is interesting that the compiler figured out that the loop is adding an invariant value hash to s and replaced 16 such additions with one addition of hash * 16, but didn’t manage to replace the entire loop with multiplication by hash * N. The initialisation step (the first loop) also looks funny. It didn’t have to be compiled as a loop at all! However, as we saw, the method executes fast, so we can ignore these abnormalities.

Finally, we’ll look at the code generated by Java 8 when the test is run with Long class – as previously, the compiler allows for the Point class as well. The code is long, so I’ll only show my reconstruction (the full code is here). The inline_hash method is my own creation; in the HotSpot-generated code this method is inlined each time it is used – yes, the incoming value x is tested for being of tyoe Point or type Long six times, each time with the inlined code of the appropriate hashCode() method.

static int inline_hash (Object x)
{
    if (x.getClass() == Point.class) {
       return ((Point) x).y * 5 + ((Point) x).x * 3;
    }
    if (x.getClass() == Long.class) {
       long v = ((Long) x).value;
       return  (int) (v ^ (v >>> 32));
    }
    vm.returnToInterpretingMode();
}    
    
static int sum (int N, Object x)
{
    int s = 0;
    
    if (N > 0) {
        s = inline_hash (x);

        int N1 = N - 3;
        if (N1 > N) N1 = Integer.MIN_VALUE;
        
        int i = 1;
        if (i < N1) {
            do {
               s += inline_hash (x);
               s += inline_hash (x);
               s += inline_hash (x);
               s += inline_hash (x);
               i += 4;
            } while (i < N1);
        }
        if (i < N) {
            do {
                s += inline_hash (x);
                ++ i;
            } while (i < N);
        }
    }
    return s;
}

While this code looks horrible, it doesn’t perform too badly. Even a code like this is better than an indirect call or a switch to interpreting mode.

After this point, Java 8 produces the code similar to that of Java 7 – speculative inlining of Object.hashCode().

The workaround

As we see, the primary reason that the performance varies so much is that HotSpot collects type statistics and employs speculative specialisation, which is performed especially aggressively in Java 8. The virtual methods are resolved, their bodies inlined and moved out of loops – all these optimisations are great in general, but harmful for our benchmarking. There is no point in comparing performance of a method in classes C1 and C2 when the C1 version is inlined and optimised out while C2 version is not.

One possible solution is not to combine different tests in the same process run. If each VM run tests exactly one class, all the classes will be tested in identical environment. Unfortunately, this approach is only applicable to full programs (such as our Life simulation), but not to simplified tests. If the method is inlined and moved out of the loop in sum(), the nature of that method becomes irrelevant. All of them will produce the same measurable result (zero time).

A better approach is to wait until the code stabilises at the version that does not depend on the specific class and treats all the classes equally badly – performs virtual call for all of them. As we’ve seen, this happens in both versions of Java VM, very quickly in Java 7, more slowly in Java 8. After this stabilisation the measurements will produce results that we can compare to each other directly.

One way to cause this stabilisation is to repeat the entire test several times. Another way is to “pre-heat” the compiler by running the test on the mixture of classes. In both cases we hope that the statistics engine saturates, causing the compiler to avoid making any assumptions about the classes. I tried the second approach using the following code in main() of the new class HashTime1:

static int sumArray (ArrayList<Object> a)
{
    int sum = 0;
    for (int i = 0; i < a.size (); i++) {
        sum += sum (1, a.get(i));
    }
    return sum;
}
 
public static void main (String argv[])
{
    int x = 531;
    int y = -295;
    ArrayList<Object> a = new ArrayList<Object> ();
    int sum = 0;

    for (int i = 0; i < 100000; i++) {
        a.add (new Point (x, y));
        a.add (LongUtil.fromPoint (x, y));
        a.add (new LongPoint (x, y));
        a.add (new LongPoint3 (x, y));
        a.add (new LongPoint4 (x, y));
        a.add (new LongPoint5 (x, y));
        a.add (new LongPoint6 (x, y));
        a.add (new LongPoint7 (x, y));
        a.add (new NullPoint (x, y));
    }

    for (int j = 0; j < 100; j++) {
        sum += sumArray (a);
    }
    System.out.println (" Sum = " + sum);

    for (int i = 0; i < 2; i++) {
        test (new Point (x, y));
        test (LongUtil.fromPoint (x, y));
        test (new LongPoint (x, y));
        test (new LongPoint3 (x, y));
        test (new LongPoint4 (x, y));
        test (new LongPoint5 (x, y));
        test (new LongPoint6 (x, y));
        test (new LongPoint7 (x, y));
        test (new NullPoint (x, y));
    }
}

This is the result for Java 7:

            Point:    548   547   547; sum = 1040261632
   java.lang.Long:    547   547   547; sum = -1786864128
        LongPoint:    548   547   547; sum = 1040261632
       LongPoint3:    548   547   547; sum = -1308103168
       LongPoint4:    547   547   547; sum = 1455681024
       LongPoint5:    547   547   548; sum = 26862080
       LongPoint6:    577   577   578; sum = -2103278592
       LongPoint7:  10262 10263 10266; sum = 1732665600
        NullPoint:    652   517   518; sum = 0
            Point:    547   547   548; sum = 1040261632
   java.lang.Long:    551   575   564; sum = -1786864128
        LongPoint:    564   564   550; sum = 1040261632
       LongPoint3:    548   549   575; sum = -1308103168
       LongPoint4:    547   548   548; sum = 1455681024
       LongPoint5:    548   547   548; sum = 26862080
       LongPoint6:    591   578   578; sum = -2103278592
       LongPoint7:  10403 10443 10473; sum = 1732665600
        NullPoint:    517   518   517; sum = 0

And this is for Java 8:

            Point:    489   488   488; sum = 1040261632
   java.lang.Long:    488   488   487; sum = -1786864128
        LongPoint:    488   488   488; sum = 1040261632
       LongPoint3:    488   488   488; sum = -1308103168
       LongPoint4:    488   487   488; sum = 1455681024
       LongPoint5:    488   488   488; sum = 26862080
       LongPoint6:   1660  1661  1661; sum = -2103278592
       LongPoint7:   1888  1889  1888; sum = 1732665600
        NullPoint:    519   487   458; sum = 0
            Point:    488   487   487; sum = 1040261632
   java.lang.Long:    488   488   488; sum = -1786864128
        LongPoint:    488   488   488; sum = 1040261632
       LongPoint3:    488   488   488; sum = -1308103168
       LongPoint4:    488   489   488; sum = 1455681024
       LongPoint5:    488   488   488; sum = 26862080
       LongPoint6:   1664  1663  1663; sum = -2103278592
       LongPoint7:   1891  1891  1892; sum = 1732665600
        NullPoint:    457   458   458; sum = 0

Observations:

Conclusions

Coming soon

In the next articles, we’ll try using this measurement technique to optimise our hash functions to the limit, starting with the division-based one, and then check how this optimisation affects the overall performance of the original program.

comments powered by Disqus