Tuesday, 9 December 2008

Optimizing your Java code

I recently attended the Devoxx Java conference, held in Antwerp (Belgium) from Monday, 8 December until Friday, 12 December. Unfortunately, I could only go for the first two days, but I managed to attend some interesting lectures about Java performance. There seem to be a lot of techniques you can apply to your Java code to make it run faster.
Over the following days and weeks, I will discuss quite a few of these.



Many of the people I talk to daily complain about the speed, or lack thereof, of the Java programming language. Java is not one of the fastest programming languages out there, and to understand why, you have to dig into the basic structure of Java. Below is a short explanation of how Java works. If you need more information, feel free to follow the links given at the end of this article. If you already know how Java works, feel free to skip the first part.


The basic structure of Java


Java works on every operating system, provided there is a Java Virtual Machine (JVM) available for this operating system. This is usually the case. If it's not the case, you should probably be wondering why you're working on such a system anyway.


Every instruction you write in the Java programming language is first translated by the Java compiler into an intermediate language, called bytecode, that only the JVM understands. The JVM (specific to the operating system it is installed on) then translates this bytecode into machine code for the hardware it runs on, either by interpreting it instruction by instruction or by compiling frequently used parts while the program runs (just-in-time compilation). The OS supplies services such as memory, files and threads to the JVM along the way.


So every line of code you write in the Java programming language is translated not once, but at least twice: once by the compiler, so the JVM understands what you want to happen, and once more by the JVM, at run time, so the hardware can actually execute it. The result (if any) is then calculated by the hardware and handed back through the JVM to your program.


That's a long way for data to travel, and this is the first reason why people are quick to say that Java is slow. Sure, Java has its drawbacks when it comes to speedy execution of code and probably shouldn't be your first choice for Grid Computing, for instance. But Java also has advantages. Like I said before, Java is platform-independent: program once, and run it anywhere. You wrote your program on Windows, but it will also work on Linux or Mac OS X without you having to invest extra time in porting your application. Java also has automatic memory management, which removes the objects you no longer need to free up memory while your program is running (see Java garbage collection), and on top of all this, Java was designed with security in mind.


When you write your Java code, there are certain aspects you should be aware of. Some of them will affect the performance of your Java code dramatically (either positively or negatively).


The first problem – String concatenation


One of the lectures I attended concerned itself entirely with testing for performance. The lecture was given by Holly Cummins (noted IBM garbage collection tuning specialist) and Kirk Pepperdine (a primary contributor to javaperformancetuning.com). The very first problem they tackled was that of String concatenation. They demonstrated a little program that was supposed to output the information of various stocks in HTML form. This was accomplished by a method quite like the one below:



Table 1.1 The method getStockInfo



public String getStockInfo() {
    // A local variable must be initialized before += can be used on it.
    String result = "";
    result += "<html>";
    result += "<head>";
    result += "</head>";
    result += "<body>";
    result += "<table>";

    // Every += below builds a new String from the old one plus the new piece.
    for (Stock s : allStocks) {
        result += "<tr>";
        result += "<td>Name: </td>";
        result += "<td>" + s.getStockName() + "</td>";
        result += "<td>" + s.getStockDate() + "</td>";
        result += "<td>" + s.getStockRating() + "</td>";
        result += "</tr>";
    }
    result += "</table></body></html>";
    return result;
}


As you can see, we create a String object, append tags such as <html> to it with the '+=' operator, and then enter a for-each loop over the allStocks collection. For every Stock, we append its data to the result, wrapped in HTML tags.


What do you think, will this code run smoothly? There is an easy way to find out.


In the Extra Material section at the end, you'll find a link to a generic StopWatch class. It is a lightweight monitoring tool that has only a very small effect on the performance of the application. This class measures the time elapsed between two points in your code and returns the result as a long value.
Note that it's not important to understand this class in detail. I didn't write it myself; I found it on the Web. It's a great lightweight tool for performance monitoring, though. Just create a new class and paste the code in there. It should work out of the box.
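
If you are curious what such a stopwatch boils down to, here is a minimal sketch. It is not the class from the link: SimpleStopWatch and its method names are made up for illustration, and it assumes that a start/stop pair around System.nanoTime() is precise enough for measurements of this size.

```java
/**
 * Hypothetical minimal stopwatch for illustration only.
 * This is NOT the StopWatch class linked in the Extra Material section.
 */
class SimpleStopWatch {
    private long startNanos;
    private long elapsedNanos;
    private boolean running;

    /** Starts (or restarts) the measurement. */
    public void start() {
        startNanos = System.nanoTime();
        running = true;
    }

    /** Stops the measurement and remembers the elapsed time. */
    public void stop() {
        if (!running) {
            throw new IllegalStateException("stop() called before start()");
        }
        elapsedNanos = System.nanoTime() - startNanos;
        running = false;
    }

    /** Elapsed time of the last start/stop pair, in whole milliseconds. */
    public long elapsedMillis() {
        return elapsedNanos / 1000000L;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleStopWatch watch = new SimpleStopWatch();
        watch.start();
        Thread.sleep(50);   // stands in for the code you want to time
        watch.stop();
        System.out.println(watch.elapsedMillis() + " ms");
    }
}
```

System.nanoTime() is used rather than System.currentTimeMillis() because it is meant for measuring elapsed time and is not affected by changes to the system clock.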

To time the method, I just use the following Main class:


Table 1.2 The Main class



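// Field shared with getStockInfo(); needs java.util.ArrayList and java.util.List.
private List<Stock> allStocks;

public Main() {
    // Build the test data: 10 000 Stock objects.
    allStocks = new ArrayList<Stock>();
    for (int i = 0; i < 10000; i++) {
        allStocks.add(new Stock());
    }

    // Time a single call to getStockInfo().
    StopWatch stopwatch = new StopWatch();
    stopwatch.start();
    getStockInfo();
    stopwatch.stop();
    long result = stopwatch.toValue();

    System.out.println(result);
}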


As you can see, I first create 10 000 Stock objects. That's not an exaggerated number. If you put this program into production, it might well have to keep track of 10 000 Stock objects.
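
The Stock class itself is not shown in this post. If you want to run the code yourself, a stand-in along the following lines should do. Only the three getter names come from the code above; the field types, the constructors and the placeholder values are my assumptions.

```java
import java.util.Date;

/**
 * Hypothetical stand-in for the Stock class, which this post does not show.
 * Only the getter names are taken from getStockInfo(); the rest is assumed.
 */
class Stock {
    private final String stockName;
    private final Date stockDate;
    private final int stockRating;

    /** No-argument constructor, as used in Main; fills in placeholder values. */
    public Stock() {
        this("ACME", new Date(), 3);
    }

    public Stock(String stockName, Date stockDate, int stockRating) {
        this.stockName = stockName;
        this.stockDate = new Date(stockDate.getTime()); // defensive copy, Date is mutable
        this.stockRating = stockRating;
    }

    public String getStockName() {
        return stockName;
    }

    public Date getStockDate() {
        return new Date(stockDate.getTime());
    }

    public int getStockRating() {
        return stockRating;
    }

    public static void main(String[] args) {
        Stock stock = new Stock();
        System.out.println(stock.getStockName() + " rated " + stock.getStockRating());
    }
}
```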


Try running this code. On my computer (1.6 GHz single-core CPU, 1 GB of RAM), it took about 17 minutes to complete. As you can probably understand, when I want to see the current data of all stocks, I don't want to wait 17 minutes. The execution time of this code is completely unacceptable.


There must be something wrong with it. What might we have overlooked?
Well, it's obvious we didn't read the API closely enough, because the API documentation for the class String explicitly states:
"Strings are constant; their values cannot be changed after they are created." You read that correctly. Once you create a String object, you can never change it again. java.lang.String is an immutable class.

So what gives? I'm changing my String, aren't I? No, you're not. Every time, you're creating a new String object that contains a copy of the previous String plus the next part you're trying to concatenate. That's a bummer for performance: the longer the result gets, the more characters have to be copied on every single step, and every time you create a new String object the old one becomes obsolete.

Java's automatic memory management (garbage collection) has to clean up all of those obsolete objects. Whenever the heap fills up with them, the garbage collector stops your program, removes the old String objects from memory and then allows your program to continue. Your program concatenates some more, the heap fills up again, the garbage collector interrupts again, and so on and so forth.
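
To make this concrete, here is a small sketch (not code from the lecture). The main method shows that '+=' leaves the original String untouched and produces a new object. The helper charsCopiedByConcat is a hypothetical, deliberately rough model of my own: it assumes every append copies the whole result so far exactly once, which if anything underestimates the real work.

```java
class StringImmutabilityDemo {

    /**
     * Rough model (an assumption, not a measurement): total characters copied
     * when a piece of pieceLength characters is appended 'appends' times with +=.
     */
    public static long charsCopiedByConcat(int appends, int pieceLength) {
        long copied = 0;
        long length = 0;
        for (int i = 0; i < appends; i++) {
            length += pieceLength; // the new String holds the old characters plus the new piece
            copied += length;      // and all of them are written into the new object
        }
        return copied;
    }

    public static void main(String[] args) {
        String original = "<html>";
        String result = original;
        result += "<head>";                              // builds a brand-new String

        System.out.println(original);                    // prints "<html>"
        System.out.println(result);                      // prints "<html><head>"
        System.out.println(original == result);          // prints "false"

        System.out.println(charsCopiedByConcat(4, 10));  // prints "100" (10 + 20 + 30 + 40)
    }
}
```

The copied total grows with the square of the number of appends. With 10 000 stocks and six appends each, and assuming pieces of about 10 characters, this model gives charsCopiedByConcat(60000, 10), which is roughly 18 billion characters.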



Is there a good way to fix this?

Yes, fortunately there is. The Java API provides us with the StringBuilder class. A StringBuilder works quite like a String but for one important difference: it can be changed after it is created. Internally, the StringBuilder (and StringBuffer as well) holds an array of characters (char). When you append to it, the new characters are simply copied into the free space at the end of that array; only when the array is full is it replaced by a larger one. At the end, when you really need the String, you call the toString() method of the StringBuilder, and only then is a String constructed.
So remember to do all of your modifications (concatenations, replacing letters, etc.) on a StringBuilder object and only then ask for a String. This is much faster.
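
A short sketch of that behaviour, using only documented StringBuilder methods. The class name StringBuilderDemo and the helper cell are made up for this example.

```java
class StringBuilderDemo {

    /** Hypothetical helper: wraps a value in a table cell using one builder. */
    public static String cell(String value) {
        StringBuilder sb = new StringBuilder();
        sb.append("<td>").append(value).append("</td>");
        return sb.toString();   // the only point where a String is created
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        System.out.println(sb.capacity());      // prints "16", the documented default capacity

        // append() changes the builder in place and returns that same builder.
        StringBuilder returned = sb.append("<td>").append("ACME").append("</td>");
        System.out.println(returned == sb);     // prints "true"

        // Replacing a letter also happens in place.
        sb.setCharAt(4, 'a');
        System.out.println(sb.length());        // prints "13"
        System.out.println(sb.toString());      // prints "<td>aCME</td>"
    }
}
```

Because append() returns the builder itself, calls can be chained, which keeps the code a bit shorter than one statement per append.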

But don't take my word for it. By all means, adapt the getStockInfo() method like so:



Table 1.3 The getStockInfo method revised



public String getStockInfo() {
    // One mutable buffer; every append() writes into the same object.
    StringBuilder sbResult = new StringBuilder();
    sbResult.append("<html>");
    sbResult.append("<head>");
    sbResult.append("</head>");
    sbResult.append("<body>");
    sbResult.append("<table>");

    for (Stock s : allStocks) {
        sbResult.append("<tr>");
        sbResult.append("<td>Name: </td>");
        sbResult.append("<td>");
        sbResult.append(s.getStockName());
        sbResult.append("</td>");
        sbResult.append("<td>");
        sbResult.append(s.getStockDate());
        sbResult.append("</td>");
        sbResult.append("<td>");
        sbResult.append(s.getStockRating());
        sbResult.append("</td>");
        sbResult.append("</tr>");
    }
    sbResult.append("</table></body></html>");

    // The String is only built here, once.
    return sbResult.toString();
}

I know, it looks pretty counter-intuitive. Using a StringBuilder takes more lines of code, so you might assume it must be slower than String concatenation, but it's not. It's actually way faster.
The measurements on my computer are as follows:

  • When running with normal String concatenation, the getStockInfo() method took 17 minutes 37 seconds to complete.

  • When running with StringBuilder concatenation, the getStockInfo() method took only 148 milliseconds to complete.

Do you see the huge difference? 17 minutes 37 seconds is 1,057,000 milliseconds, so the StringBuilder version ran roughly 7,000 times faster! And all I did was change String concatenation to StringBuilder concatenation! How efficient is that!!
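
If you'd rather measure this without the Stock and StopWatch classes, here is a scaled-down, self-contained sketch. It is not the program from the lecture: ConcatBenchmark, withConcat and withBuilder are names I made up, the rows are simplified to a single cell, and there is no JIT warm-up, so treat the numbers it prints as a rough indication only. They will differ per machine and per Java version.

```java
import java.util.ArrayList;
import java.util.List;

class ConcatBenchmark {

    /** Builds the table with String +=, creating a new String on every step. */
    public static String withConcat(List<String> names) {
        String result = "<table>";
        for (String name : names) {
            result += "<tr><td>" + name + "</td></tr>";
        }
        return result + "</table>";
    }

    /** Builds the same table with a single StringBuilder. */
    public static String withBuilder(List<String> names) {
        StringBuilder sb = new StringBuilder("<table>");
        for (String name : names) {
            sb.append("<tr><td>").append(name).append("</td></tr>");
        }
        return sb.append("</table>").toString();
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            names.add("stock" + i);
        }

        long t0 = System.nanoTime();
        String concat = withConcat(names);
        long t1 = System.nanoTime();
        String built = withBuilder(names);
        long t2 = System.nanoTime();

        System.out.println("same output: " + concat.equals(built));
        System.out.println("concat:  " + (t1 - t0) / 1000000L + " ms");
        System.out.println("builder: " + (t2 - t1) / 1000000L + " ms");
    }
}
```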



Conclusion


So what have we learned exactly?
Not only have we learned to avoid String concatenation inside loops, but we have also learned that creating lots of short-lived objects, and the garbage collection that follows, can have a huge impact on the performance of your application. The garbage collector interrupts your program to get rid of the objects you no longer need. It's useful to keep this in mind when you're writing your code.

So that was the first of my Java Performance posts.
I hope you enjoyed it. Please drop me a line if you've read this; any feedback at all is always appreciated.


Extra Material


The StopWatch class: http://www.javapractices.com/topic/TopicAction.do?Id=85
Podcasts by Holly Cummins: http://www.theserverside.com/tt/knowledgecenter/
The website of Kirk Pepperdine: http://www.kodewerk.com/
