Saturday, 12 March 2011

Multimaps

A multimap is like a map, except it can map each key to multiple values. The Collections Framework doesn't include an interface for multimaps, because they aren't used all that commonly. It's a fairly simple matter to use a Map whose values are List objects as a multimap. This technique is demonstrated in the next code example, which reads a dictionary containing one word per line (all lowercase) and prints out all of the permutation groups in the dictionary that meet a size criterion. A permutation group is a bunch of words all of which contain exactly the same letters, but in a different order. The program takes two arguments on the command line: the name of the dictionary file, and the minimum size of permutation group to print out. Permutation groups containing fewer words than the specified minimum are not printed.
There is a standard trick for finding permutation groups: for each word in the dictionary, alphabetize the letters in the word (that is, reorder the word's letters into alphabetical order) and put an entry into a multimap, mapping the alphabetized word to the original word. For example, the word "bad" causes an entry mapping "abd" into "bad" to be put into the multimap. A moment's reflection will show that all of the words to which any given key maps form a permutation group. It's a simple matter to iterate over the keys in the multimap, printing out each permutation group that meets the size constraint.
The following program is a straightforward implementation of this technique. The only tricky part is the alphabetize method, which returns a string containing the same characters as its argument, in alphabetical order. This routine (which has nothing to do with the Collections Framework) implements a slick bucket sort. It assumes that the word being alphabetized consists entirely of lowercase alphabetic characters.
import java.util.*;
import java.io.*;

public class Perm {
public static void main(String[] args) {
int minGroupSize = Integer.parseInt(args[1]);

// Read words from file and put into simulated multimap
Map m = new HashMap();
try {
BufferedReader in =
new BufferedReader(new FileReader(args[0]));
String word;
while((word = in.readLine()) != null) {
String alpha = alphabetize(word);
List l = (List) m.get(alpha);
if (l==null)
m.put(alpha, l=new ArrayList());
l.add(word);
}
} catch(IOException e) {
System.err.println(e);
System.exit(1);
}

// Print all permutation groups above size threshold
for (Iterator i = m.values().iterator(); i.hasNext(); ) {
List l = (List) i.next();
if (l.size() >= minGroupSize)
System.out.println(l.size() + ": " + l);
}
}

private static String alphabetize(String s) {
int count[] = new int[256];
int len = s.length();
for (int i=0; i<len; i++)
count[s.charAt(i)]++;
StringBuffer result = new StringBuffer(len);
for (char c='a'; c<='z'; c++)
for (int i=0; i<count[c]; i++)
result.append(c);
return result.toString();
}
}
Running the program on an 80,000 word dictionary takes about 16 seconds on an aging UltraSparc 1. With a minimum permutation group size of eight, it produces the following output:
% java Perm dictionary.txt 8

9: [estrin, inerts, insert, inters, niters, nitres, sinter,
triens, trines]
8: [carets, cartes, caster, caters, crates, reacts, recast, traces]
9: [capers, crapes, escarp, pacers, parsec, recaps, scrape, secpar,
spacer]
8: [ates, east, eats, etas, sate, seat, seta, teas]
12: [apers, apres, asper, pares, parse, pears, prase, presa, rapes,
reaps, spare, spear]
9: [anestri, antsier, nastier, ratines, retains, retinas, retsina,
stainer, stearin]
10: [least, setal, slate, stale, steal, stela, taels, tales, teals,
tesla]
8: [arles, earls, lares, laser, lears, rales, reals, seral]
8: [lapse, leaps, pales, peals, pleas, salep, sepal, spale]
8: [aspers, parses, passer, prases, repass, spares, sparse, spears]
8: [earings, erasing, gainers, reagins, regains, reginas, searing,
seringa]
11: [alerts, alters, artels, estral, laster, ratels, salter, slater,
staler, stelar, talers]
9: [palest, palets, pastel, petals, plates, pleats, septal, staple,
tepals]
8: [enters, nester, renest, rentes, resent, tenser, ternes, treens]
8: [peris, piers, pries, prise, ripes, speir, spier, spire]
Many of these words seem a bit bogus, but that's not the program's fault; they're in the dictionary file.

No comments:

Post a Comment