Wednesday, 22 September 2010

Object Ordering : Comparable Interface

A List l may be sorted as follows:
Collections.sort(l);
If the list consists of String elements, it will be sorted into lexicographic (alphabetical) order. If it consists of Date elements, it will be sorted into chronological order. How does Java know how to do this? It's magic! Well, no. Actually String and Date both implement the Comparable(in the API reference documentation) interface. The Comparable interfaces provides a natural ordering for a class, which allows objects of that class to be sorted automatically. The following table summarizes the JDK classes that implement Comparable:
ClassNatural Ordering
Byte signed numerical
Character unsigned numerical
Long signed numerical
Integer signed numerical
Short signed numerical
Double signed numerical
Float signed numerical
BigInteger signed numerical
BigDecimal signed numerical
File system-dependent lexicographic on pathname.
String lexicographic
Date chronological
CollationKey locale-specific lexicographic
If you try to sort a list whose elements do not implement Comparable, Collections.sort(list) will throw a ClassCastException(in the API reference documentation) . Similarly, if you try to sort a list whose elements cannot be compared to one another, Collections.sort will throw a ClassCastException. Elements that can be compared to one another are called mutually comparable. While it is possible to have elements of different types be mutually comparable, none of the JDK types listed above permit inter-class comparison.
This is all you really need to know about the the Comparable interface if you just want to sort lists of comparable elements, or create sorted collections of them. The next section will be of interest to you if you want to implement your own Comparable type.

Writing Your Own Comparable Types

The Comparable interface consists of a single method:
public interface Comparable {
public int compareTo(Object o);
}
The compareTo method compares the receiving object with the specified object, and returns a negative integer, zero, or a positive integer as the receiving object is less than, equal to, or greater than the specified Object. If the specified object cannot be compared to the receiving object, the method throws a ClassCastException. Here's a class representing a person's name that implements Comparable:
import java.util.*;

public class Name implements Comparable {
private String firstName, lastName;

public Name(String firstName, String lastName) {
if (firstName==null || lastName==null)
throw new NullPointerException();
this.firstName = firstName;
this.lastName = lastName;
}

public String firstName() {return firstName;}
public String lastName() {return lastName;}

public boolean equals(Object o) {
if (!(o instanceof Name))
return false;
Name n = (Name)o;
return n.firstName.equals(firstName) &&
n.lastName.equals(lastName);
}

public int hashCode() {
return 31*firstName.hashCode() + lastName.hashCode();
}

public String toString() {return firstName + " " + lastName;}

public int compareTo(Object o) {
Name n = (Name)o;
int lastCmp = lastName.compareTo(n.lastName);
return (lastCmp!=0 ? lastCmp :
firstName.compareTo(n.firstName));
}
}
To keep the example short, the class is somewhat limited: It doesn't support middle names, it demands both a first and a last name, and it is not internationalized in any way. Nonetheless, it illustrates several important points:
  • Name objects are immutable. All other things being equal, immutable types are the way to go, especially for objects that will be used as elements in Sets, or keys in Maps. These collections will break if you modify their elements or keys while they're in the collection.
  • The constructor checks its arguments for null. This ensures that all Name objects are well-formed, so that none of the other methods will ever throw a NullPointerException.
  • The hashCode method is redefined. This is essential for any class that redefines the equals method. It is required by the general contract for Object.equals. (Equal objects must have equal hash codes.)
  • The equals method returns false if the specified object is null, or of an inappropriate type. The compareTo method throws a runtime exception under these circumstances. Both of these behaviors are required by the general contracts of the respective methods.
  • The toString method has been redefined to print the Name in human-readable form. This is always a good idea, especially for objects that are going to get put into collections. The various collection types' toString methods depend on the toString methods of their elements, keys and values.
Since this section is about element ordering, let's talk a bit more about Name's compareTo method. It implements the standard name-ordering algorithm, where last names take precedence over first names. This is exactly what you want in a natural ordering. It would be very confusing if the natural ordering were unnatural! Take a look at how compareTo is implemented, because it's quite typical. First, you cast the Object argument to the appropriate type. This throws the appropriate exception (ClassCastException) if the arguments type is inappropriate. Then you compare the most significant part of the object (in this case, the last name). Often, you can just use the natural ordering of the part's type. In this case, the part is a String, and the natural (lexicographic) ordering is exactly what's called for. If the comparison results in anything other than zero (which represents equality), you're done: you just return the result. If the most significant parts are equal, you go on to compare the next-most-significant parts. In this case, there are only two parts (first name and last name). If there were more parts, you'd proceed in the obvious fashion, comparing parts until you found two that weren't equal (or you were comparing the least-significant parts), at which point you'd return the result of the comparison.
Just to show that it all works, here's a little program that builds a list of Name objects and sorts them:
import java.util.*;

class NameSort {
public static void main(String args[]) {
Name n[] = {
new Name("John", "Lennon"),
new Name("Karl", "Marx"),
new Name("Groucho", "Marx"),
new Name("Oscar", "Grouch")
};
List l = Arrays.asList(n);
Collections.sort(l);
System.out.println(l);
}
}
If you run this program, here's what it prints:
[Oscar Grouch, John Lennon, Groucho Marx, Karl Marx]
There are four restrictions on the behavior of the compareTo method, which we won't go over now because they're fairly technical and boring and are better left in the API documentation. It's really important that all classes that implement Comparable obey these restrictions, so read the documentation for Comparable if you're writing a class that implements it. Attempting to sort a list of objects that violate these restrictions has undefined behavior. Technically speaking, these restrictions ensure that the natural ordering is a partial order on the objects of a class that implements it; this is necessary to ensure that sorting is well-defined.

Comparators

OK, so now you know about natural ordering. But what if you want to sort some objects in some order other than their natural order? Or what if you want to sort some objects that don't implement Comparable? To do either of these things, you'll need to provide a Comparator(in the API reference documentation) . A Comparator is simply an object that encapsulates an ordering. Like the Comparable interface, the Comparator interface consists of a single method:
public interface Comparator {
int compare(Object o1, Object o2);
}
The compare method compares its two arguments, returning a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second. If either of the arguments has an inappropriate type for the Comparator, the compare method throws a ClassCastException. Much of what was said about Comparable in the previous section applies to Comparator as well. Writing a compare method is nearly identical to writing a compareTo method, except that the former gets both objects passed in as arguments. The compare method has to obey the same four "technical restrictions" as Comparable's compareTo method, for the same reason: a Comparator must induce a partial order on the objects it compares.
Suppose you have a class called EmployeeRecord:
public class EmployeeRecord implements Comparable {
public Name name();
public int employeeNumber();
public Date hireDate();
...
}
Let's assume that the natural ordering of EmployeeRecord objects is Name-ordering (as defined in the previous example) on employee name. Unfortunately the boss has asked us for a list of employees in order of seniority. This means we actually have to do some work, but not much. Here's a program that will produce the required list:
import java.util.*;

class EmpSort {
static final Comparator SENIORITY_ORDER = new Comparator() {
public int compare(Object o1, Object o2) {
EmployeeRecord r1 = (EmployeeRecord) o1;
EmployeeRecord r2 = (EmployeeRecord) o2;
return r2.hireDate().compareTo(r1.hireDate());
}
};

static final Collection employees = ... ; // Employee Database

public static void main(String args[]) {
List emp = new ArrayList(employees);
Collections.sort(emp, SENIORITY_ORDER);
System.out.println(emp);
}
}
The Comparator in the above program is reasonably straightforward. It casts its arguments to EmployeeRecord, and relies on the natural ordering of Date applied to the hireDate() accessor method. Note that the Comparator passes the hire-date of its second argument to its first, rather than vice-versa. This is because the employee who was hired most recently is least senior: sorting in order of hire-date would put the list in reverse seniority-order. Another way to achieve the same effect would be to maintain the argument order but negate the result of the comparison:
return -r1.hireDate().compareTo(r2.hireDate());
The two techniques are equally preferable. Use whichever looks best to you. The Comparator in the above program works fine for sorting a List, but it does have one deficiency: it cannot be used to order a sorted collection (such as TreeSet(in the API reference documentation)) because it generates a strictly partial ordering. What this means is that this comparator equates unequal objects. In particular, any two employees who were hired on the same date will compare as equal. When you're sorting a List, this doesn't matter, but when you're using the Comparator to order a sorted collection, it's fatal. If you insert multiple employees who were hired on the same date into a TreeSet with this Comparator, only the first one will be added to the set. The second will be seen as a duplicate element, and ignored.
To fix this problem, all you have to do is tweak the Comparator so that it produces a total ordering. In other words, tweak it so that the only elements that are seen as equal when using compare are those that are also seen as equal when compared using equals. The way to do this is to do a two-part comparison (like we did for Name) where the first part is the one that we're actually interested in (in this case, the hire-date), and the second part is attribute that uniquely identifies the object. In this case, the employee number is the obvious attribute to use as the second part. Here's the Comparator that results:
static final Comparator SENIORITY_ORDER = new Comparator() {
public int compare(Object o1, Object o2) {
EmployeeRecord r1 = (EmployeeRecord) o1;
EmployeeRecord r2 = (EmployeeRecord) o2;
int dateCmp = r2.hireDate().compareTo(r1.hireDate());
if (dateCmp != 0)
return dateCmp;
return (r1.employeeNumber() < r2.employeeNumber() ? -1 :
(r1.employeeNumber() == r2.employeeNumber() ? 0 : 1));
}
};
One last note. You might be tempted to replace the final return statement in the Comparator with the simpler:
return r1.employeeNumber() - r2.employeeNumber();
Don't do it unless you're absolutely sure that no one will ever have a negative employee number! This trick does not work in general, as the signed integer type is not big enough to represent the difference of two arbitrary signed integers. If i is a large positive integer and j is a large negative integer, i-j will overflow and return a negative integer. The resulting Comparator violates one of the four technical restrictions that we keep talking about (transitivity), and produces horrible, subtle bugs. This is not a purely theoretical concern; people get burned by it.

No comments:

Post a Comment