Showing posts with label memory leak. Show all posts
Showing posts with label memory leak. Show all posts

Friday, 24 June 2011

String and memory leaks

Probably most of the Java users is aware that String object is more complex than just an array of char. To make the usage of strings in Java more robust additional measures were taken – for instance the String pool was created to save memory by reusing the same String objects instead of allocating new ones. Another optimization that I want to talk about today is adding the offset and count fields to the String object instances.
Why those fields were added? Their purpose is to help to reuse already allocated structures when using some of the string functionalities – like calculating substrings of a given string. The concept is that instead of creating an new char array for a substring we could just ‘reuse’ the old one. To be exact this is what String.substring() method does: instead of copying an char array for the returned object it creates a new String reusing char[] of the old one. Only the values of offset and count fields (which indicate the beginning and the length of a new string) are changed. See here first to see - how substring function works? Because the substring operation is quite often used this mechanism helps to save a lot of memory. It is important to add that this can work only because String objects are immutable. See the following snippet:

public static void sendEmail(String emailUrl) {
String email = emailUrl.substring(7); // 'mailto:' prefix has 7 letters
String userName = email.substring(0, email.indexOf("@"));
String domainName = email.substring(email.indexOf("@"));
}

public static void main(String[] args) {
sendEmail("mailto:user_name@domain_name.com");
}

Thanks to the way substring() is implemented when we extract the email, userName and domainName three new String objects are created, but the char array is not copied. All new string variables reuse the character array from emailUrl. Thanks to that reuse for really long urls we can save approximately 2/3 of memory we would use otherwise. Great, right?

Ok… now the dark side of that optimization! Check out this snippet:

public final static String START_TAG = "<title>";
public final static String END_TAG = "</title>";

public static String getPageTitle(String pageUrl) {
// retrieve the HTML with a helper function:
String htmlPage = getPageContent(pageUrl);

// parse the page content to get the title
int start = htmlPage.indexOf(START_TAG);
start = start + START_TAG.length();
int end = htmlPage.indexOf(END_TAG);
String title = htmlPage.substring(start, end);
return title;
}

In here we are extracting from the HTML page its title – can you see the problem with this code? Looks simple and correct, right?

Now, try to imagine that the htmlPage String is huge – more than 100.000 characters, but the title of this page has only 50 characters. Because of the optimization mentioned above the returned object will reuse the char array of the htmlPage instead of creating a new one… and this means that instead of returning a small string object you get back a huge String with 100.000 characters array!! If your code will invoke getPageTitle() method many times you may find out that you have stored only a thousand titles and already you are out of memory!! Scary, right?

Of course there is an easy solution for that – instead of returning the title in line 13, you can return new String(title). The String(String) constructor is always doing a subcopy of the underlying char array, so the created title will actually have only 50 characters. Now we are safe:)

So what is the lesson here? Always use new String(String)? No… In general the String optimizations are really helpful and it is worth to take advantage of them. You just have to be careful with what you are doing and be aware of what is going on ‘under the hood’ of your code. String class API is in some situations not intuitive, so beware!

Saturday, 11 June 2011

Memory Leak : static collection object may be the reason

A very simple example of a memory leak would be a java.util.Collection object (for example, a HashMap) that is acting as a cache but which is growing without any bounds.
public class MyClass {
static HashSet myContainer = new HashSet();
public void leak(int numObjects) {
for (int i = 0; i < numObjects; ++i) {
String leakingUnit = new String("this is leaking object: " + i);
myContainer.add(leakingUnit);
}
}

Now in the main method:
public static void main(String[] args) throws Exception {
{
MyClass myObj = new MyClass();
myObj.leak(100000); // One hundred thousand }
System.gc();
}


In the above program, there is a class with the name MyClass which has a static reference to HashSet by the name of myContainer. In the main method of the class: MyClass, (in bold text) within which an instance of the class: MyClass is instantiated and its member operation: leak is invoked. This results in the addition of a hundred thousand String objects into the container: myContainer. After the program control exits the subscope, the instance of the MyClass object is garbage collected, because there are no references to that instance of the MyClass object outside that subscope. However, the MyClass class object has a static reference to the member variable called myContainer. Due to this static reference, the myContainer HashSet continues to persist in the Java heap even after the sole instance of the MyClass object has been garbage collected and, along with the HashSet, all the String objects inside the HashSet continue to persist, holding up a significant portion of the Java heap until the program exits the main method.

This program demonstrates a basic memory leaking operation involving an unbounded growth in a cache object. Most caches are implemented using the Singleton pattern involving a static reference to a top level Cache class as shown in this example.


Here is the GC information for you for the above snippet....

[GC 512K->253K(1984K), 0.0018368 secs]
[GC 765K->467K(1984K), 0.0015165 secs]
[GC 979K->682K(1984K), 0.0016116 secs]
[GC 1194K->900K(1984K), 0.0015495 secs]
[GC 1412K->1112K(1984K), 0.0015553 secs]
[GC 1624K->1324K(1984K), 0.0014902 secs]
[GC 1836K->1537K(2112K), 0.0016068 secs]
[Full GC 1537K->1537K(2112K), 0.0120419 secs]
[GC 2047K->1824K(3136K), 0.0019275 secs]
[GC 2336K->2035K(3136K), 0.0016584 secs]
[GC 2547K->2248K(3136K), 0.0015602 secs]
[GC 2760K->2461K(3136K), 0.0015517 secs]
[GC 2973K->2673K(3264K), 0.0015695 secs]
[Full GC 2673K->2673K(3264K), 0.0144533 secs]
[GC 3185K->2886K(5036K), 0.0013183 secs]
[GC 3398K->3098K(5036K), 0.0015822 secs]
[GC 3610K->3461K(5036K), 0.0028318 secs]
[GC 3973K->3673K(5036K), 0.0019273 secs]
[GC 4185K->3885K(5036K), 0.0019377 secs]
[GC 4397K->4097K(5036K), 0.0012906 secs]
[GC 4609K->4309K(5036K), 0.0017647 secs]
[GC 4821K->4521K(5036K), 0.0017731 secs]
[Full GC 4521K->4521K(5036K), 0.0222485 secs]
[GC 4971K->4708K(8012K), 0.0042461 secs]
[GC 5220K->4920K(8012K), 0.0018258 secs]
[GC 5432K->5133K(8012K), 0.0018648 secs]
[GC 5645K->5345K(8012K), 0.0018069 secs]
[GC 5857K->5558K(8012K), 0.0017825 secs]
[GC 6070K->5771K(8012K), 0.0018911 secs]
[GC 6283K->5984K(8012K), 0.0016350 secs]
[GC 6496K->6197K(8012K), 0.0020342 secs]
[GC 6475K->6312K(8012K), 0.0013560 secs]
[Full GC 6312K->6118K(8012K), 0.0341375 secs]
[GC 6886K->6737K(11032K), 0.0045417 secs]
[GC 7505K->7055K(11032K), 0.0027473 secs]
[GC 7823K->7374K(11032K), 0.0028045 secs]
[GC 8142K->7693K(11032K), 0.0029234 secs]
[GC 8461K->8012K(11032K), 0.0027353 secs]
[GC 8780K->8331K(11032K), 0.0027790 secs]
[GC 9099K->8651K(11032K), 0.0028329 secs]
[GC 9419K->8970K(11032K), 0.0027895 secs]
[GC 9738K->9289K(11032K), 0.0028037 secs]
[GC 10057K->9608K(11032K), 0.0028161 secs]
[GC 10376K->9927K(11032K), 0.0028482 secs]
[GC 10695K->10246K(11032K), 0.0028858 secs]
[GC 11014K->10565K(11416K), 0.0029284 secs]
[Full GC 10565K->10565K(11416K), 0.0506198 secs]
[GC 11781K->11071K(18956K), 0.0035594 secs]
[GC 12287K->11577K(18956K), 0.0042315 secs]
[GC 12793K->12082K(18956K), 0.0043194 secs]
[GC 12843K->12390K(18956K), 0.0030633 secs]
[GC 13606K->13494K(18956K), 0.0085937 secs]
[Full GC 13782K->13613K(18956K), 0.0646513 secs]