Showing posts with label intern. Show all posts

Wednesday, 22 June 2011

Some things to remember about saving memory with the String.intern method

Imagine that you have a flat file in CSV format with 100 million rows, from which you are about to read data to store and process in your app.
The data is in the format (orderId, storeIdentifier, amountDue).
What optimization can you do here?
Note that the storeIdentifier is going to be repeated a lot. Every time you read a record, split it into fields and store them in an in-memory data structure, you create a new String object for each field. So 100 million String objects will be created for the storeIdentifier alone. But you know that there are only (say) 100 stores in all! So there is a massive amount of wasted memory.
What you can do here is this: right after you have read the storeIdentifier string, do
storeIdentifier = storeIdentifier.intern();
That puts the store identifier into the String pool. The first invocation for a given value adds the string to the pool, and every later invocation with the same content returns that pooled instance, so the number of String instances holding the same data stays minimal.
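To make the idea concrete, here is a minimal sketch of parsing such rows and interning the repeated field. The class and method names (OrderLoader, Order, parseLine) are hypothetical, the rows are hard-coded instead of read from a file, and the sketch assumes a plain comma-separated line with no quoting or escaping.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

class OrderLoader {
    /** One parsed row, following the (orderId, storeIdentifier, amountDue) layout. */
    static final class Order {
        final long orderId;
        final String storeIdentifier;
        final BigDecimal amountDue;

        Order(long orderId, String storeIdentifier, BigDecimal amountDue) {
            this.orderId = orderId;
            this.storeIdentifier = storeIdentifier;
            this.amountDue = amountDue;
        }
    }

    /** Parses "orderId,storeIdentifier,amountDue" and interns the store identifier. */
    static Order parseLine(String line) {
        String[] fields = line.split(",");
        // split() creates a fresh String per field; intern() swaps it for the pooled copy.
        String storeIdentifier = fields[1].trim().intern();
        return new Order(Long.parseLong(fields[0].trim()),
                storeIdentifier,
                new BigDecimal(fields[2].trim()));
    }

    public static void main(String[] args) {
        List<Order> orders = new ArrayList<>();
        for (String line : new String[] {"1,STORE-42,10.50", "2,STORE-42,7.25"}) {
            orders.add(parseLine(line));
        }
        // Both rows share a single String instance for the store identifier.
        System.out.println(orders.get(0).storeIdentifier == orders.get(1).storeIdentifier); // true
    }
}
```

Only the store identifier is interned here. The order id and amount are unique per row, so interning them would fill the pool without saving anything.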
Points to Note
  1. Use intern() only if you really need to use it. And only if you know the extra instances are going to be a problem. And only if you really understand how it works.
  2. Older JVMs had a problem collecting interned strings. Newer JVMs handle this fine. Don’t worry about leaks due to a growing pool. If other references are gone, interned strings will be collected by the GC.
  3. Interned strings go into the PermGen space in some JVMs (HotSpot before Java 7; later versions keep them in the normal heap). PermGen is not part of the normal heap. If you send too many strings there, an OutOfMemoryError will hit you even though your heap may have several GB available.
  4. Interned strings can be compared with == rather than .equals(). This is a bit faster. But it is rarely worth the brittle code.
  5. Calling String.intern() can be a performance hit. It takes CPU cycles to maintain the pool and do the comparisons. Are you sure you are saving enough memory to make it worth the CPU? Measure. Don’t guess.
  6. Use String.intern() only if the set of possible Strings to be interned is tightly bounded, so that the number of distinct strings is much smaller than the total number of strings that will be read.
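One rough way to check the last point before committing to intern() is to sample the input and compare the number of distinct values with the total. The sketch below is only an illustration: the class name InternCandidateCheck and the method distinctRatio are made up for this example, and the sample values are invented.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InternCandidateCheck {
    /** Returns distinct/total for a sample of values; a ratio near 0 means heavy repetition. */
    static double distinctRatio(List<String> sample) {
        if (sample.isEmpty()) {
            return 1.0; // nothing to deduplicate
        }
        Set<String> distinct = new HashSet<>(sample);
        return (double) distinct.size() / sample.size();
    }

    public static void main(String[] args) {
        List<String> storeIds = Arrays.asList("S1", "S2", "S1", "S1", "S2", "S1", "S2", "S1");
        // 2 distinct values out of 8
        System.out.println(distinctRatio(storeIds)); // 0.25
    }
}
```

A low ratio only says that interning could save memory. Whether the saving is worth the CPU cost still has to be measured, as point 5 says.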

Saturday, 30 April 2011

String equality and interning in Java

Strings in Java are objects, but resemble primitives (such as ints or chars) in that Java source code may contain String literals, and Strings may be concatenated using the “+” operator. These are convenient features, but the similarity of Strings to primitives sometimes causes confusion when Strings are compared.

We saw here how Java deals with String comparisons. Let's understand case 2, where the == operator returns true for two different references that have the same value.
To save memory (and speed up testing for equality), Java supports “interning” of Strings. When the intern() method is invoked on a String, a lookup is performed on a table of interned Strings. If a String object with the same content is already in the table, a reference to the String in the table is returned. Otherwise, the String is added to the table and a reference to it is returned. The result is that after interning, all Strings with the same content will point to the same object. This saves space, and also allows the Strings to be compared using the == operator, which is much faster than comparison with the equals(Object) method.
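The lookup described above can be pictured with a small model. This is not how the JVM implements its table; it is a simplified sketch using a HashMap, and the class name InternTableModel is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class InternTableModel {
    // Maps each distinct content to the one instance chosen to represent it.
    private final Map<String, String> table = new HashMap<>();

    /** Returns the instance already in the table, or registers and returns s itself. */
    String intern(String s) {
        String existing = table.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        InternTableModel pool = new InternTableModel();
        String first = new String("A String");
        String second = new String("A String");

        System.out.println(first == second);                           // false: two instances
        System.out.println(pool.intern(first) == pool.intern(second)); // true: one canonical instance
    }
}
```

Unlike this model, which holds every string strongly, a real JVM can drop pooled strings that are no longer referenced elsewhere.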

Confusion can arise because Java automatically interns String literals. This means that in many cases, the == operator appears to work for Strings in the same way that it does for ints or other primitive values. Code written based on this assumption will fail in a potentially non-obvious way when the == operator is used to compare Strings with equal content but contained in different String instances.
The following test cases show how Java interns Strings.
Consider the following string, "A String":

String aString = "A String";



Case 1: Concatenated string (a compile-time constant expression)

String aConcatenatedString = "A" + " " + "String";

aString == aConcatenatedString       : true
aString.equals(aConcatenatedString)  : true

Case 2: Runtime string

String aRuntimeString = new String("A String");

aString == aRuntimeString       : false
aString.equals(aRuntimeString)  : true


Case 3: Interned string

String anInternedString = aRuntimeString.intern();

aString == anInternedString       : true
aString.equals(anInternedString)  : true



Case 4: External strings, e.g. the first argument of the main method (assuming the program is run with "A String" as that argument)

String firstArg = args[0];

aString == firstArg       : false
aString.equals(firstArg)  : true


Case 5: Using intern on external strings

String firstArgInterned = firstArg.intern();

aString == firstArgInterned       : true
aString.equals(firstArgInterned)  : true




So we can see that explicitly invoking intern() returns a reference to the interned String.
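The five cases can be put into one small program. This is a sketch rather than the original test code: the class name StringInternCases and the helper describe are made up, and when no command-line argument is given the program builds "A String" at runtime as a stand-in for an external string.

```java
class StringInternCases {
    /** Formats the result of comparing candidate against reference with == and equals(). */
    static String describe(String label, String reference, String candidate) {
        return label + ": == " + (reference == candidate)
                + ", equals " + reference.equals(candidate);
    }

    public static void main(String[] args) {
        String aString = "A String";

        String aConcatenatedString = "A" + " " + "String";   // folded into one literal by the compiler
        String aRuntimeString = new String("A String");      // always a new instance
        String anInternedString = aRuntimeString.intern();   // the pooled instance

        // Stand-in for an external string when no argument is passed (an assumption of this sketch).
        String firstArg = args.length > 0
                ? args[0]
                : new StringBuilder("A ").append("String").toString();
        String firstArgInterned = firstArg.intern();

        System.out.println(describe("Case 1", aString, aConcatenatedString));
        System.out.println(describe("Case 2", aString, aRuntimeString));
        System.out.println(describe("Case 3", aString, anInternedString));
        System.out.println(describe("Case 4", aString, firstArg));
        System.out.println(describe("Case 5", aString, firstArgInterned));
    }
}
```

Run without arguments, or with "A String" as the first argument, it reports == as true for cases 1, 3 and 5 and false for cases 2 and 4, while equals is true throughout. With any other argument, cases 4 and 5 report false for both comparisons.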

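Similar to Strings, there are caches for Integer, Byte and the other wrapper classes (for example, Integer.valueOf reuses instances for values from -128 to 127), and for some other value-based classes such as BigDecimal. See here for more on this.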
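A quick way to see the wrapper cache at work is to compare boxed values by reference. The sketch below uses only Integer.valueOf; the class name WrapperCacheDemo and the helper sameBoxedInstance are invented for the example.

```java
class WrapperCacheDemo {
    /** True if two separate valueOf calls for the same int yield the same Integer instance. */
    static boolean sameBoxedInstance(int value) {
        Integer first = Integer.valueOf(value);
        Integer second = Integer.valueOf(value);
        return first == second;
    }

    public static void main(String[] args) {
        // Values from -128 to 127 are always served from the cache.
        System.out.println(sameBoxedInstance(127)); // true

        // Outside that range the result is not guaranteed either way (usually false),
        // so boxed values should be compared with equals(), just like Strings.
        System.out.println(sameBoxedInstance(1000));
        System.out.println(Integer.valueOf(1000).equals(Integer.valueOf(1000))); // true
    }
}
```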

Sunday, 17 April 2011

WeakHashMap: Using a string from the literal pool as key

Consider the following code snippet:
import java.util.Map;
import java.util.WeakHashMap;

public class TestWeakHashMap
{
    private String str1 = new String("newString1");
    private String str2 = "literalString2";
    private String str3 = "literalString3";
    private String str4 = new String("newString4");
    private final Map<String, Object> map = new WeakHashMap<>();

    private void testGC()
    {
        map.put(str1, new Object());
        map.put(str2, new Object());
        map.put(str3, new Object());
        map.put(str4, new Object());

        // Discard the strong references to all the keys
        str1 = null;
        str2 = null;
        str3 = null;
        str4 = null;

        // Loops forever; stop the program with Ctrl+C
        while (true) {
            System.gc();
            // Verify Full GC with the -verbose:gc option.
            // We expect the map to be emptied as the strong references to
            // all the keys are discarded.
            System.out.println("map.size(); = " + map.size() + " " + map);
        }
    }

    public static void main(String[] args)
    {
        new TestWeakHashMap().testGC();
    }
}

What do we expect the size of the map to be after full GC? I initially thought it should be empty. But it turned out to be 2.

Look at the way the four Strings are initialized. Two of them are created using the 'new' operator, whereas the other two are defined as literals. The Strings created using the 'new' operator are allocated in the Java heap, but the Strings defined as literals live in the literal pool.
Literal Strings in the pool (held in Perm Space on older JVMs) stay strongly referenced by the class that declares them, so they are not garbage collected while that class remains loaded.
This means that 'str2' and 'str3' are always strongly referenced and the corresponding entries are never removed from the WeakHashMap.

So the next time you create a 'new String()', put it as a key in a WeakHashMap, and later intern() that String, beware: if intern() places that same instance in the pool, your key stays referenced for as long as the pool keeps it alive.

Invoking the intern() method on a String adds that String to the pool if no String equal to it already exists there, for example:
private String str5 = (str4+str1).intern();
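The experiment from this post can also be run in a bounded form that stops after a few GC requests instead of looping forever. This is a sketch under assumptions: the class name WeakKeyDemo and the method runExperiment are made up, System.gc() is only a request, and whether the 'new String' entry actually disappears depends on the JVM and its settings.

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakKeyDemo {
    /** Puts one 'new String' key and one literal key into a WeakHashMap, then requests GC. */
    static Map<String, Object> runExperiment(int attempts) {
        Map<String, Object> map = new WeakHashMap<>();
        String heapKey = new String("newString1"); // separate heap instance
        String literalKey = "literalString2";      // instance shared with the class's literal

        map.put(heapKey, new Object());
        map.put(literalKey, new Object());

        // Drop our own strong references to both keys.
        heapKey = null;
        literalKey = null;

        for (int i = 0; i < attempts && map.size() > 1; i++) {
            System.gc(); // a hint only; the JVM may ignore it
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Object> map = runExperiment(20);
        // The literal key is still reachable through the class, so its entry remains.
        System.out.println("size = " + map.size() + ", keys = " + map.keySet());
    }
}
```

The entry for the literal key is expected to remain in every run, because the class itself still references that String. The entry for the heap key usually goes away once a collection has actually happened.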