The Java (not really) Faster than C++ Benchmark |
Last update: Mon Apr 11 22:32:56 CEST 2011
When I first read "Java faster than C++ benchmark", I was sure that there was something wrong with it. After all, Java couldn't be faster that C++, right? What would be next? C++ faster than C? C faster than Assembler?
After quite a while I've found some time to update the results using brand new JVMs and GCC versions.
All new tests were performed on Linux 2.6, Intel® Core™ Quad Q6600 2.40GHz. Older tests (marked as such) were done on AMD Athlon™ XP 2500+ (Barton). No other load was placed on the machine during tests, run level 1 (single user mode) had been used. GNU C Library had optimizations for i686 (standard Debian package, this should be of benefit for both Java and C++).
When a tested GCC version wasn't available with my Linux distro, I compiled GCC myself from the standard release. Please note that the result of this benchmark (I am talking about "C++ faster than Java" result) is valid for GCC 3.4.4, 4.0.x, 4.1.x, 4.2.x and 4.3.x as well, but GCC series 4.5 is faster than earlier GCCs. If truth be told, each new major GCC release performs better than the previous one - as should be expected, given the amount of work GCC guys put into their project.
I've kept the original number of repetitions in all tests.
Time was measured in the same way as in the original results, with one exception: the time used as benchmark result was elapsed real (wall clock) time used by the process. If we stick to the original method of measurement, the results will be incorrect and biased towards Java - because of threads used by the JVM. Wall clock time is better and more accurate if the machine operates under no load - as was the case here.
Since Java compiler does not have any settings that could affect performance, Java code was compiled with standard settings, using Java compiler from Sun in the same version as the JVM that was used later to do the actual benchmark.
To run hash, heapsort and strcat tests I had to increase the heap size. It's pretty much
standard practice for enterprise software, so I do not see it as a drawback of Java.
When looking at C++ code, I've noticed many performance problems, which probably may go unnoticed for people with strong Java background. These problems were not present in Java code.
In other words, to make this benchmark fair, I had to make some modifications to the original code. In doing so, I tried to make the code as close to the original as possible, even if my personal coding style is completely different. If anyone feels that some further modifications could/should be made to Java or C++ code, do not hesitate to contact me.
For C++ compilation, I've used the following options:
-O2 -fomit-frame-pointer -finline-functions -std=c++0x -march=core2Note: older results were produced with -march=athlon-xp architecture.
I consider the first two options standard for compilation (all major Linux distributions, Linux kernel and many others use these two for compilation).
-finline-functions
is used to tell the compiler to guess which functions should be inlined (Java also does that).
-std=c++0x
is used to tell the compiler to use the 0x C++ standard (it's like a JDK level in Java).
Core2 - well, that's my machine. I could use pentium or i686 instead (which could be considered more standard), but it would not change the overall result - that, is C++ being the clear winner. So, let's get down to business, shall we?
No changes to the code here. The clue to improving performance of C++ lies in
Java results. Client VM does not finish the test - stack overflow ocurs. Server VM
passes the test and the result is much better than C++. The easiest answer is
that Server VM does not recurse as much as Client VM, because of inlining.
In GCC you can enable automated inlining by adding -finline-functions option.
Since the recursion is very deep here, this program will benefit a lot. This option
was used in all tests, not only here.
Note that it is possible to make this program even faster (up to twice as fast) if you enable more aggressive inlining in GCC. I decided against doing it, but in real life, if you know that your program does a lot of recursion it may be worth a try.
No changes here.
Both of these have the same problem. If you use map in C++, you use a TreeMap equivalent, not HashMap equivalent. TreeMaps are much worse (of course) when it comes to performing (huge amounts) of insertions on them. In C++, if you want hash maps, you have to use hash_map class (obsolete) or unordered_map (soon to be part of C++ standard, already available in GCC 4.x). I've rewritten both of these to use unordered_map just to stay on the bleeding edge ;-) . Performance results are similar with both hash_map and unordered_map. If you look closely at C++ code you will notice that a benchmark called "Java prettier that C++" would make an interesting read.
No changes here.
No changes here.
OK. So what this test actually does is it calls two methods 1 000 000 000 times each. Java uses references to access objects, so I've changed C++ code to use references too.
I also inverted the loop direction, since comparison against 0 is faster - normally I would leave it as it was, but since this was the only benchmark in which Java Server came close to C++, I had to do something ;-) . Oh, and inverting the loop direction has no effect whatsoever on Java, so I left the code as it was.
If you look at the results, you will notice that only the poor Java Client VM actually tried to invoke the methods - both Java Server and GCC inlined the methods, which resulted in a much better performance.
No changes here.
Before I explain the changes, a word of explanation about differences in memory management between Java and C++. (What follows is all very simplified). Java allocates new memory blocks on it's internal heap (which is allocated in huge chunks from the OS). In this way, in most of the cases it bypasses memory allocation mechanisms of the underlying OS and is very fast. If you allocate memory on heap in C++, each allocation request will be sent to the operating system, which is slow. That's why Java won in the original benchmark.
The thing is, in C++ you don't have to use the default allocation algorithm - you have other options. You can either change the allocation behaviour in C++ to allocate memory exactly in the way Java does (which will be very fast), or allocate memory on the stack, not heap (which is more or less similar to the way memory management in Java works, if you grossly oversimplify the problem).
If you look at the code which allocates code on stack, you will notice that it is much simpler than the original version. And if you look at the results you will notice that it's much faster, too. Actually, so fast, that it didn't even register on the graph ;-)
No changes here.
Rewritten. I think that original code was very unfair for C++:
I've replaced vector with an array (like in Java), removed the iterators (obviously) and removed the final count() call. The code is now more or less equivalent to Java version.
I simplified the code, because it basically did what string implementation already does internally. I also removed the reserve() call from code - although keeping it makes C++ faster, I think it's cheating. If you actually reserve the string size (the way original code does it), you will get 20% performance increase. You cannot do this in Java. See, I am fair.
I added ios_base::sync_with_stdio() call to make sure that C++ does not keep the stream in sync with C functions all the time - no reason to do it. The code can be made faster by 50% if C routines (fgets) are used to access stdin instead of C++ streams. I decided not to do it to keep the code as close as possible to the original version.
Only sync_with_stdio was added. It does improve the results a bit, but C++ was already fast anyway.
On Intel, all tests but two ended with C++ being better. Overall, C++ was 1.7x faster than Java.
Results for recent tests performed on Intel Quad Core:
Old tests performed on Athlon:
Old tests on Intel:
The most important conclusion is obvious. (For this set of benchmarks,) C++ is clearly the winner.
Second conclusion - don't use Client VM in older Java versions.
Unfortunately, there's also a third conclusion. It seems that it's much, much easier to create a well performing program in Java. So, please consider it for a moment before you start recoding your Java program in C++ just to make it faster... Also, let's face it: for most applications, Java being less than twice slower than C++ means nothing, and the development time is significantly shorter with Java.
It seems that on new hardware the gap between Java and C++ has narrowed. Going from 3x slower to 1.7x slower is quite an impressive feat on Java side. And keep in mind that C++ also was getting faster with every compiler release.
Please note that I had to increase the stack size in Ackermann test for Java 6 - otherwise neither of the JVM versions would finish the test. This further supports the theory about changes in the inlining heuristics.
Przemyslaw Bruski, SCJD - mail me at [pbruskispam "at" op "dot" pl]