Sunday, 13 March 2011

Custom Implementations in collections

Many programmers will never need to implement their own collections classes. You can go pretty far using the implementations described in the previous sections of this lesson. Someday, however, you might find yourself wanting to write your own implementation of a core collection interface. It turns out that it's easy to do with the abstract implementations provided by the Java platform. But before we discuss how to write an implementation, let's discuss why you might want to do such a thing.

Reasons to Write Your Own Implementation

This list enumerates several kinds of collections you might want to implement. It is not intended to be exhaustive.
  • Persistent: All of the built-in collection implementations reside in main memory, and vanish when the VM exits. Suppose you want a collection that will still be present the next time the VM starts. One fine way to implement such a collection is to build a veneer over an external database. Such a collection might conceivably be concurrently accessible by multiple VMs, since it resides outside the VM.
  • Application-specific: This is a very broad category. One example is an unmodifiable Map containing real-time telemetry data. The keys might represent locations, and the values could be read from sensors at these locations in response to the get operation.
  • Highly Concurrent: The built-in collections are not designed to support high concurrency. The synchronization wrappers (and the legacy implementations) lock the entire collection every time it's accessed. Suppose you're building a server and you need a Map implementation that can be accessed by many threads concurrently. It is reasonably straightforward to build a hash table that locks each bucket separately, allowing multiple threads to access the table concurrently (assuming they're accessing keys that hash to different buckets).
  • High-performance, Special-purpose: There are many data structures that take advantage of restricted usage to offer better performance than is possible with general-purpose implementations. For example, consider a Set whose elements are restricted to a small, fixed universe. Such a Set can be represented as a bit-vector, which offers blindingly fast performance as well low memory usage. Another example concerns a List containing long runs of identical element values. Such lists, which occur frequently in text processing, can be run-length encoded: runs can be represented as a single object containing the repeated element and the number of consecutive repetitions. This example is interesting because it trades off two aspects of performance: It requires far less space than an ArrayList, but more time.
  • High-performance, General-purpose: The engineers who designed the collections framework tried to provide the best general-purpose implementations for each interface, but there are many, many data structures that could have been used, and new ones are invented every day. Maybe you can come up with something faster!
  • Enhanced functionality: Suppose you need a Map (or Set) implementation that offers constant time access, as well as insertion-order iteration. This combination can be achieved with a hash table, all of whose elements are further joined, in insertion order, into a doubly-linked list. Alternatively, suppose you need an efficient bag implementation (also known as a multiset): a Collection that offers constant time access while allowing duplicate elements. It's reasonably straightforward to implement such a collection atop a HashMap.
  • Convenience: You may want additional convenience implementations beyond those offered by the Java platform. For instance, you may have a frequent need for immutable Map objects representing a single key-value mapping, or List objects representing a contiguous range of Integers, or whatever.
  • Adapter: Suppose you are using some legacy API that has its own ad hoc collections API. You can write an adapter implementation that permits these collections to operate in the Java Collections Framework. An adapter implementation is a thin veneer that wraps objects of one type and makes them behave like objects of another type by translating operations on the latter type into operations on the former.

How to Write a Custom Implementation

Writing a custom implementation is surprisingly easy with the aid of the abstract implementations furnished by the Java platform. Abstract implementations are skeletal implementations of the core collection interfaces designed expressly to facilitate custom implementations. We'll start with an example. Here's an implementation of Arrays.asList.
public static List asList(Object[] a) {
return new ArrayList(a);
}

private static class ArrayList extends AbstractList
implements java.io.Serializable
{
private Object[] a;

ArrayList(Object[] array) {
a = array;
}

public Object get(int index) {
return a[index];
}

public Object set(int index, Object element) {
Object oldValue = a[index];
a[index] = element;
return oldValue;
}

public int size() {
return a.length;
}
}
Believe it or not, this is almost exactly the implementation contained in the JDK. It's that simple! You provide a constructor, the get, set, and size methods, and AbstractList does all the rest. You get the ListIterator, bulk operations, search operations, hash code computation, comparison and string representation for free. Suppose you want to make the implementation a bit faster. The API documentation for the abstract implementations describes precisely how each method is implemented so you'll know which methods to override in order to get the performance that you want. The performance of the implementation above is fine, but it can be improved a bit. In particular, the toArray method iterates over the List copying one element at a time. Given the internal representation, it's a lot faster and more sensible just to clone the array:
public Object[] toArray() {
return (Object[]) a.clone();
}
With the addition of this override, and a similar one for toArray(Object[]), this implementation is exactly the one found in the JDK. In the interests of full disclosure, it's a bit tougher to use the other abstract implementations because they require you to write your own iterator, but it's still not that hard. The abstract implementations are summarized below:
  • AbstractCollection(in the API reference documentation): A Collection that is neither a Set nor a List, such as a bag. At a minimum, you must provide the iterator and the size method.
  • AbstractSet(in the API reference documentation): A Set. Use is identical to AbstractCollection.
  • AbstractList(in the API reference documentation): A List backed by a random-access data store (such as an array). At a minimum, you must provide the positional access methods (get(int) and, optionally, set(int), remove(int), and add(int)) and the size method. The abstract class takes care of listIterator (and iterator).
  • AbstractSequentialList(in the API reference documentation): A List backed by a sequential-access data store (such as a linked list). At a minimum, you must provide the listIterator and size methods. The abstract class takes care of the positional access methods. (This is the opposite of AbstractList.)
  • AbstractMap(in the API reference documentation): A Map. At a minimum you must provide the entrySet view. This is typically implemented with the AbstractSet class. If the Map is modifiable, you must also provide the put method.
The process of writing a custom implementation is summarized below:
  1. Choose the appropriate abstract implementation class from the list above.
  2. Provide implementations for all of the class's abstract methods. If your custom collection is to be modifiable, you'll have to override one or more concrete methods as well. The API documentation for the abstract implementation class will tell you which methods to override.
  3. Test and, if necessary, debug the implementation. You now have a working custom collection implementation!
  4. If you're concerned about performance, read the abstract implementation class's API documentation for all of the methods whose implementations you're inheriting. If any of them seem too slow, override them. If you override any methods, be sure to measure the performance of the method before and after the override! How much effort you put into tweaking the performance should be a function of how much use the implementation will get, and how performance-critical the use. (Often, this step is best omitted.)

No comments:

Post a Comment