Showing posts with label set-implementation. Show all posts
Showing posts with label set-implementation. Show all posts

Tuesday, 24 May 2011

Operations on SortedSet

The operations that SortedSet inherits from Set behave identically on sorted sets and normal sets with two exceptions:
  • The Iterator returned by the iterator operation traverses the sorted set in order.
  • The array returned by toArray contains the sorted set's elements in order.
Although the interface doesn't guarantee it, the toString method of the JDK's SortedSet implementations returns a string containing all the elements of the sorted set, in order.

Thursday, 19 May 2011

Set implementations in java

Being a Collection subtype all methods in the Collection interface are also available in the Set interface.
Since Set is an interface you need to instantiate a concrete implementation of the interface in order to use it. You can choose between the following Set implementations in the Java Collections API:
  • java.util.EnumSet
  • java.util.HashSet
  • java.util.LinkedHashSet
  • java.util.TreeSet
Each of these Set implementations behaves a little differently with respect to the order of the elements when iterating the Set, and the time (big O notation) it takes to insert and access elements in the sets

Wednesday, 18 May 2011

HashSet class in java

Introduction

  • Set interface implemented using a hash table
  • doesn't guarantee to iterate the elements in any specific order
  • constant time access to elements assuming good hash
    function

HashSet class Constructors

HashSet is implemented with an underlying HashMap. In addition to implemented the Set interface methods, HashSet has the following constructors.
Result Constructor Description
hset = new HashSet() Creates a new HashSet with default initial capacity 16 and load factor 0.75.
hset = new HashSet(initialCapacity) Creates a new HashSet with the specified initial int capacity.
hset = new HashSet(initialCapacity, loadFactor) Creates a new HashSet with the specified capacity which will not exceed a specified (float) load factor.
hset = new HashSet(coll) Creates a new HashSet with elements from the Collection coll

TreeSet class constructors in java

Introduction

  • Set interface implemented as a tree
  • Iterator return elements in natural order (see later)

Constructors of TreeSet Class

TreeSet implements the Set and SortedSet interface methods. TreeSet is implemented with an underlying TreeMap (balanced binary tree). If the element type has a natural order (eg, String) elements will be ordered by that, but often you will supply a Comparator object that tells how two elements compare. It has the following constructors.

Result Constructor Description
tset = new TreeSet() Creates new TreeSet. Elements sorted by natural order.
tset = new TreeSet(comp) Creates new TreeSet using Comparator comp to sort elements.
tset = new TreeSet(coll) Creates new TreeSet from Collection coll using natural ordering.
tset = new TreeSet(sset) Creates new TreeSet from SortedSet smp using element ordering from sset.

Saturday, 14 May 2011

Performance of Set interface implementations

HashSet

The HashSet class offers constant-time [ Big O Notation is O(1) ] performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

 

TreeSet

The TreeSet implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

 

LinkedHashSet

A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashSet, as iteration times for this class are unaffected by capacity.

Implementing set using arrays

We can implement tree by arrays. Likewise, this program shows how to implement set using arrays:

public class Set<T> {

private T arrayElement[];

int size =0;

public Set(){

this.arrayElement = null;

}

public Set(T[] element){

arrayElement = element;

size = arrayElement.length;

}

/**

*add element to set. A check is made to identify whether element is present or not.

*If not the element can be inserted.

* @param element

*/

public void addElement(T element){

if(!contains(element)){

if(size == arrayElement.length){

incrementArray();

}

arrayElement[size++] = element;

}

}



/**

* to check is element is present or not.

* @param elem

* @return boolean

*/

public boolean contains(T elem){



if (elem == null) {

for (int i = 0; i < size; i++)

if (arrayElement[i]==null)

return true;

} else {

for (int i = 0; i < size; i++)

if (elem.equals(arrayElement[i]))

return true;

}

return false;



}



/**

* return the size of set.

* @return int

*/

public int size(){

if(arrayElement != null){

return arrayElement.length;

}else

return 0;

}



public void clear(){

arrayElement = null;

}



public String toString(){

if(arrayElement == null || arrayElement.length ==0 ){

return“[EMPTY]“;

}else{

String toStr=”[";

for(int i=0;i<arrayElement.length;i++){

toStr+=arrayElement[i]+”,”;

}

toStr+=”]”;

return toStr;

}

}



/**

* to check whether set is empty or not

* @return

*/

public boolean isEmpty(){

if(arrayElement == null || arrayElement.length ==0 )

return true;

else

return false;

}



/**

* this function is used to increment the size of an array

*

*/

private void incrementArray(){

T[] temparray = arrayElement;

int tempsize=size+5;

arrayElement =(T[]) new Object[tempsize];

System.arraycopy(temparray, 0, arrayElement, 0, size);



}

}//Set class ends

Sunday, 17 April 2011

When to use which set implementation?

HashSet:
HashSet is backed by a HashMap instance and hence it allows the null element. It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.

When do you use HashSet?
When you are looking for performance, use HashSet. Since this class uses the hash function when retrieving the elements, it allows fast retrieval. This class offers constant time performance for the basic operations add, remove, contains and size, assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance the number of buckets. Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

LinkedHashSet:
LinkedHashSet is backed by LinkedHashMap. So all the elements in LinkedHashSet are actually the keys in the LinkedHashMap. It maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). This class provides all of the optional Set operations, and permits null elements.

When do you use LinkedHashSet?
When order of insertion matter. Example
When you are looking to produce a copy of a set that has the same order as the original, regardless of the original set's implementation without incurring the increased cost (associated with TreeSet).

void foo(Set s) {
Set copy = new LinkedHashSet(s);
...
}

This technique is particularly useful if a module takes a set on input, copies it, and later returns results whose order is determined by that of the copy. Like HashSet, it provides constant-time performance for the basic operations add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.

A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashSet, as iteration times for this class are unaffected by capacity.

TreeSet:
TreeSet is backed by TreeMap instance and TreeSet wont permit null elements. As opposing to HashSet, TreeSet provides a total ordering on its elements and especially when you need a sorted order. The elements are ordered either by using their plain Comparable natural ordering (if you dont pass any parameter for the TreeSet constructor) or by a Comparator typically provided at sorted set creation time as a parameter to the constructor. All elements inserted into this set must either implement the Comparable interface or atleast be accepted by the specified comparator. So all such elements must be mutually comparable i.e., e1.compareTo(e2) (or comparator.compare(e1, e2)) must not throw a ClassCastException.

When do you use TreeSet?
Its very well explained above that TreeSet will be used when the sorted order of inserted elements is required.
What about the performance? The performance obviously will be low compared to HashSet. A TreeSet may be accessed and traversed in either ascending or descending order. The descendingSet method returns a view of the set with the senses of all relational and directional methods inverted. The performance of ascending operations and views is likely to be faster than that of descending ones.

EnumSet:
EnumSet as the name itself says is a specialized set for use with enum types. All of the elements in an enum set must come from a single enum type that is specified, explicitly or implicitly, when the set is created. Enum sets are represented internally as bit vectors. This representation is extremely compact and efficient. The space and time performance of this class should be good enough to allow its use as a high-quality, typesafe alternative to traditional int-based "bit flags." Even bulk operations (such as containsAll and retainAll) should run very quickly if their argument is also an enum set. All basic operations execute in constant time. They are likely (though not guaranteed) to be much faster than their HashSet counterparts. Even bulk operations execute in constant time if their argument is also an enum set.
The iterator returned by the iterator method traverses the elements in their natural order (the order in which the enum constants are declared). The returned iterator is weakly consistent: it will never throw ConcurrentModificationException and it may or may not show the effects of any modifications to the set that occur while the iteration is in progress.

Null elements are not permitted in EnumSet. Attempts to insert a null element will throw NullPointerException. Attempts to test for the presence of a null element or to remove one will, however, function properly.
EnumSet internally uses RegularEnumSet class, a private implementation class for EnumSet, for "regular sized" enum types i.e., those with 64 or fewer enum constants while it used JumboEnumSet class if the enum constants are more than 64 which is another private implementation class for EnumSet.