Showing posts with label constant / literal pool. Show all posts
Showing posts with label constant / literal pool. Show all posts

Wednesday, 22 June 2011

Somethings to remember about saving memory with the String.intern method

Imagine that you have a flat file in csv format. And it has a 100 million rows from which you are about to read data to store and process in your app.
The data is in the format (orderId, storeIdentifier, amountDue).
What optimization can you do here?
Note that the storeIdentifier is going to be repeated a lot. Every time you read a record and split it into a string and possibly store it into an in-memory data structure, you will be creating a new String object. So 100 million String objects will be created for the storeIdentifier. But you know that there are only (say) 100 stores in all! So there is a massive amount of wasted memory.
What you can do here is – Right after you have read the storeIdentifier string, do this -
storeIdentifier = storeIdentifier.intern();
That would put the store identifier into the String pool and keep the number of String instances with the same data minimal by returning the String from the pool once it has been put into it by the first invocation for the string.
Points to Note
  1. Use intern() only if you really need to use it. And only if you know the extra instances are going to be a problem. And only if you really understand how it works.
  2. Older JVMs had a problem collecting interned strings. Newer JVMs handle this fine. Don’t worry about leaks due to a growing pool. If other references are gone, interned strings will be collected by the GC.
  3. Interned strings go into the PermGen Space area of memory in some JVMs. This is not part of the normal heap. If you send too many strings here, an OutOfMemoryError will hit you even though your heap may have several GB available.
  4. Interned strings can be compared with == rather than .equals(). This is a bit faster. But it is rarely worth the brittle code.
  5. Calling String.intern() can be a performance hit. It takes CPU cycles to maintain the pool and do the comparisons. Are you sure you are saving enough memory to make it worth the CPU? Measure. Don’t guess.
  6. Use String.intern() only if the set of possible Strings that will be interned has a bound tight enough such that the set of different strings is much smaller than the total number of strings that will be read.

Tuesday, 21 June 2011

Some issues with Autoboxing

Integer j1 = 127;
Integer j2 = 127;
System.out.println( j1==j2); //Works!!!
Integer k1 = 128;
Integer k2 = 128;
System.out.println( k1==k2); //Doesn't Work!!!
Integer w1 = -128;
Integer w2 = -128;
System.out.println( w1==w2); //Works!!!
Integer m1 = -129;
Integer m2 = -129;
System.out.println( m1==m2); //Doesn't Work!!!

This gives us similarity to constant string pool.

Saturday, 30 April 2011

Immutable Objects / Wrapper Class Caching

Since Java 5, wrapper class caching was introduced.
The following is an examination of the cache created by an inner class, IntegerCache, located in the Integer cache. For example, the following code will create a cache:
Integer myNumber = 10; or Integer myNumber = Integer.valueOf(10);

256 Integer objects are created in the range of -128 to 127 which are all stored in an Integer array. This caching functionality can be seen by looking at the inner class, IntegerCache, which is found in Integer:
private static class IntegerCache 
{
private IntegerCache(){}

static final Integer cache[] = new Integer[-(-128) + 127 + 1];

static
{
for(int i = 0; i < cache.length; i++)
cache[i] = new Integer(i - 128);
}
}

public static Integer valueOf(int i)
{
final int offset = 128;
if (i >= -128 && i <= 127) // must cache
{
return IntegerCache.cache[i + offset];
}
return new Integer(i);
}


So when creating an object using Integer.valueOf or directly assigning a value to an Integer within the range of -128 to 127 the same object will be returned. Therefore, consider the following example:

Integer i = 100;
Integer p = 100;
if (i == p)
System.out.println("i and p are the same.");
if (i != p)
System.out.println("i and p are different.");
if(i.equals(p))
System.out.println("i and p contain the same value.");

The output is:
i and p are the same.
i and p contain the same value.


So this result is similar to the result as shown here for strings.

It is important to note that object i and p only equate to true because they are the same object, the comparison is not based on the value, it is based on object equality. If Integer i and p are outside the range of -128 or 127 the cache is not used, therefore new objects are created. When doing a comparison for value always use the “.equals” method. It is also important to note that instantiating an Integer does not create this caching. So consider the following example:

Integer i = new Integer (100);
Integer p = new Integer(100);
if(i==p)
System.out.println(“i and p are the same object”);
if(i.equals(p))
System.out.println(“ i and p contain the same value”);


In this circumstance, the output is only: i and p contain the same value

The other wrapper classes (Byte, Short, Long, Character) also contain this caching mechanism OR constant pooling mechanism. The Byte, Short and Long all contain the same caching principle to the Integer object. The Character class caches from 0 to 127. The negative cache is not created for the Character wrapper as these values do not represent a corresponding character. There is no caching for the Float object.

BigDecimal also uses caching but uses a different mechanism. While the other objects contain a inner class to deal with caching this is not true for BigDecimal, the caching is pre-defined in a static array and only covers 11 numbers, 0 to 10:

 
// Cache of common small BigDecimal values.
private static final BigDecimal zeroThroughTen[] = {
new BigDecimal(BigInteger.ZERO, 0, 0),
new BigDecimal(BigInteger.ONE, 1, 0),
new BigDecimal(BigInteger.valueOf(2), 2, 0),
new BigDecimal(BigInteger.valueOf(3), 3, 0),
new BigDecimal(BigInteger.valueOf(4), 4, 0),
new BigDecimal(BigInteger.valueOf(5), 5, 0),
new BigDecimal(BigInteger.valueOf(6), 6, 0),
new BigDecimal(BigInteger.valueOf(7), 7, 0),
new BigDecimal(BigInteger.valueOf(8), 8, 0),
new BigDecimal(BigInteger.valueOf(9), 9, 0),
new BigDecimal(BigInteger.TEN, 10, 0),
};


As per Java Language Specification(JLS) the values discussed above are stored as immutable wrapper objects. This caching has been created because it is assumed these values / objects are used more frequently.

String equality and interning in java

Strings in Java are objects, but resemble primitives (such as ints or chars) in that Java source code may contain String literals, and Strings may be concatenated using the “+” operator. These are convenient features, but the similarity of Strings to primitives sometimes causes confusion when Strings are compared.

As we saw here, how java deals with string comparisons. Lets understand the case 2, where == operator returns true for 2 different references having same values.
To save memory (and speed up testing for equality), Java supports “interning” of Strings. When the intern() method is invoked on a String, a lookup is performed on a table of interned Strings. If a String object with the same content is already in the table, a reference to the String in the table is returned. Otherwise, the String is added to the table and a reference to it is returned. The result is that after interning, all Strings with the same content will point to the same object. This saves space, and also allows the Strings to be compared using the == operator, which is much faster than comparison with the equals(Object) method.

Confusion can arise because Java automatically interns String literals. This means that in many cases, the == operator appears to work for Strings in the same way that it does for ints or other primitive values. Code written based on this assumption will fail in a potentially non-obvious way when the == operator is used to compare Strings with equal content but contained in different String instances.
Following test cases show how interning can be performed by java:
Consider following string i.e “A String":
 
String aString = "A String"; 



Case 1: Concatenated string


String aConcatentatedString = "A" + " " + "String";


aString == aConcatentatedString       : true
aString.equals(aConcatentatedString) : true

Case 2: Runtime string




String aRuntimeString = new String("A String");


aString == aConcatentatedString       : false
aString.equals(aConcatentatedString) : true


Case 3: Interned string


String anInternedString = aRuntimeString.intern();

aString == aConcatentatedString : true
aString.equals(aConcatentatedString) : true



Case 4: External strings , eg. 1st argument of main method


String firstArg = args[0];


aString == aConcatentatedString : false
aString.equals(aConcatentatedString) : true


Case 5: Using intern on external strings


String firstArgInterned = firstArg.intern();

aString == aConcatentatedString : true
aString.equals(aConcatentatedString) : true




So we can see that explicitly invoking intern() returns a reference to the interned String.

Similar to string, there is a pool of integers, bytes, etc and other value based classes like bigdecimal. See here for more on this.

== vs equals method in case of String and primitive types

Object equality is tested using == operator. Here it is checked whether 2 references are pointing to same object or not.
While value equality is tested using the .equals(Object) method.
So consider following code snippet:
String one = new String("abc");
String two = new String("abc");


So here

one==two is false

one.equals(two) is true

one==two, because we are creating 2 new objects as specified by new keywords. But as their values are same one.equals(two) returns true because values are same.

Now if we say:

String three = one;



Now both one==two as well one.equals(two) returns true.

Using new keyword vs not using it to create 2 strings


Now consider 2 cases, when we create 2 strings with same value, but first using new keyword and then not using new keyword:

Case1 : Using new keyword:

String data = new String("123");
String moreData = new String("123");



Here data==moreData is false.

Case 2 : Not using new keyword:

String letters = "abc";
String moreLetters = "abc";


Here data==moreData is true.

Why?

Explaining the case 1 is very simple. Because it falls inline with our earlier explanation of == and != operators. But how can we explain case 2, where string comparison using == operator, returns true.

This is due to the compiler and runtime efficiency. In the compiled class file only one set of data "abc" is stored, not two. In this situation only one object is created, therefore the equality is true between these object. Read here for more on this. Not only strings , compiler also support this type of constant pooling for integers, bytes and other value types. See here for this.


Note:

Even though one set of data "123" is stored in the class, this is still treated differently at runtime. An explicit instantiation is used to create the String objects. Therefore, in this case, two objects have been created, so the equality is false. It is important to note that "==" is always used for object equality and does not ever refer to the values in an object. Always use .equals when checking looking for a "meaningful" comparison.

Sunday, 17 April 2011

Weakhashmap : Using string from literal pool as key

Consider the following code snippet:
public class TestWeakHashMap
{
private String str1 = new String("newString1");
private String str2 = "literalString2";
private String str3 = "literalString3";
private String str4 = new String("newString4");
private Map map = new WeakHashMap();

private void testGC() throws IOException
{
map.put(str1, new Object());
map.put(str2, new Object());
map.put(str3, new Object());
map.put(str4, new Object());

/**
        * Discard the strong reference to all the keys
        */
str1 = null;
str2 = null;
str3 = null;
str4 = null;

while (true) {
System.gc();
/**
            * Verify Full GC with the -verbose:gc option
            * We expect the map to be emptied as the strong references to
            * all the keys are discarded.
            */
System.out.println("map.size(); = " + map.size() + " " + map);
}
}
}

What do we expect the size of the map to be after full GC? I initially thought it should be empty. But it turned out to be 2.

Look at the way the four Strings are initialized. Two of them are defined using the 'new' operator, whereas the other two are defined as literals. The Strings defined using the 'new' operator would be allocated in the Java heap, but the Strings defined defined as literals would be in the literal pool.
The Strings allocated in the literal pool (Perm Space) would never be garbage collected.
This would mean that String 'str2' and 'str3' would always be strongly referenced and the corresponding entry would never be removed from the WeakHashMap.

So next time you create a 'new String()' , put it as a key in a WeakHashMap, and later intern() the String, beware - Your key will always be strongly referenced.

Invoking intern() method on a String will add your String to the literal pool if some other String equal to this String does not exist in the pool
private String str5 = (str4+str1).intern();