Sunday, 17 April 2011

When to use which set implementation?

HashSet:
HashSet is backed by a HashMap instance and hence it allows the null element. It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

When do you use HashSet?
When you are looking for performance, use HashSet. Since this class uses the hash function when retrieving the elements, it allows fast retrieval. This class offers constant time performance for the basic operations add, remove, contains and size, assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance the number of buckets. Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

LinkedHashSet:
LinkedHashSet is backed by LinkedHashMap. So all the elements in LinkedHashSet are actually the keys in the LinkedHashMap. It maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). This class provides all of the optional Set operations, and permits null elements.

When do you use LinkedHashSet?
When order of insertion matter. Example
When you are looking to produce a copy of a set that has the same order as the original, regardless of the original set's implementation without incurring the increased cost (associated with TreeSet).

void foo(Set s) {
Set copy = new LinkedHashSet(s);
...
}

This technique is particularly useful if a module takes a set on input, copies it, and later returns results whose order is determined by that of the copy. Like HashSet, it provides constant-time performance for the basic operations add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.

A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashSet, as iteration times for this class are unaffected by capacity.

TreeSet:
TreeSet is backed by TreeMap instance and TreeSet wont permit null elements. As opposing to HashSet, TreeSet provides a total ordering on its elements and especially when you need a sorted order. The elements are ordered either by using their plain Comparable natural ordering (if you dont pass any parameter for the TreeSet constructor) or by a Comparator typically provided at sorted set creation time as a parameter to the constructor. All elements inserted into this set must either implement the Comparable interface or atleast be accepted by the specified comparator. So all such elements must be mutually comparable i.e., e1.compareTo(e2) (or comparator.compare(e1, e2)) must not throw a ClassCastException.

When do you use TreeSet?
Its very well explained above that TreeSet will be used when the sorted order of inserted elements is required.
What about the performance? The performance obviously will be low compared to HashSet. A TreeSet may be accessed and traversed in either ascending or descending order. The descendingSet method returns a view of the set with the senses of all relational and directional methods inverted. The performance of ascending operations and views is likely to be faster than that of descending ones.

EnumSet:
EnumSet as the name itself says is a specialized set for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created. Enum sets are represented internally as bit vectors. This representation is extremely compact and efficient. The space and time performance of this class should be good enough to allow its use as a high-quality, typesafe alternative to traditional int-based "bit flags." Even bulk operations (such as containsAll and retainAll) should run very quickly if their argument is also an enum set. All basic operations execute in constant time. They are likely (though not guaranteed) to be much faster than their HashSet counterparts. Even bulk operations execute in constant time if their argument is also an enum set.
The iterator returned by the iterator method traverses the elements in their natural order (the order in which the enum constants are declared). The returned iterator is weakly consistent: it will never throw ConcurrentModificationException and it may or may not show the effects of any modifications to the set that occur while the iteration is in progress.

Null elements are not permitted in EnumSet. Attempts to insert a null element will throw NullPointerException. Attempts to test for the presence of a null element or to remove one will, however, function properly.
EnumSet internally uses RegularEnumSet class, a private implementation class for EnumSet, for "regular sized" enum types i.e., those with 64 or fewer enum constants while it used JumboEnumSet class if the enum constants are more than 64 which is another private implementation class for EnumSet.

No comments:

Post a Comment