Wednesday, 27 October 2010

Pattern and Matcher

In addition to the regular expression methods that are available in the String class (see String Regular Expressions), there are two classes that are used specifically for regular expression matching.

  • java.util.regex.Pattern precompiles regular expressions so they can be executed more efficiently. It also has a few utility functions. This same pattern can be reused by many Matcher objects.
        Pattern pat = Pattern.compile(regexString);

  • java.util.regex.Matcher objects are created from a Pattern object and a subject string to scan. This class provides a full set of methods to do the scanning.
        Matcher m = pat.matcher(subject);
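As a minimal sketch of how the two classes fit together, the example below compiles one Pattern and creates a fresh Matcher for each subject string. The class name PatternReuseDemo, the method firstNumber, and the sample regex are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PatternReuseDemo {
    // Compile once. A Pattern is immutable, so it can be shared freely.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in subject, or null if there is none.
    static String firstNumber(String subject) {
        Matcher m = DIGITS.matcher(subject); // one Matcher per subject string
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("order 66 shipped")); // prints 66
        System.out.println(firstNumber("no digits here"));   // prints null
    }
}
```

Unlike a Pattern, a Matcher carries state about its current match, so it should not be shared between threads.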

Common methods


The following variables represent elements of their declared type in the prototypes below.

import java.util.regex.*;
. . .
boolean b; // may be used in if statement.
int i; // index or character position.
int g; // group number
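int n; // number of capturing groups in the pattern
int f; // match flags, e.g. Pattern.CASE_INSENSITIVE
CharSequence cs; // any CharSequence, e.g. String, StringBuffer, or StringBuilder
String s; // Subject string.
String regex; // Regular expression string.
String rep; // Replacement string.
StringBuffer sb; // Used to build up string from repeated scans.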
Pattern p = Pattern.compile(regex); // Compiles regular expression into Pattern.
Matcher m = p.matcher(s); // Creates Matcher with subject s and Pattern p.

Result
Method
Description

Creating a Pattern

p =
Pattern.compile(regex);
Creates Pattern based on regex. May throw PatternSyntaxException.

p =
Pattern.compile(regex, f);
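As above, with match flags f, such as Pattern.CASE_INSENSITIVE, Pattern.MULTILINE, or Pattern.DOTALL (among others). Combine several flags with the | operator.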
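The two compile() forms might be used as in the sketch below, which shows a case-insensitive search and a check for PatternSyntaxException. The class CompileFlagsDemo and its method names are assumptions chosen for this example.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names are illustrative only.
class CompileFlagsDemo {
    // True if word occurs anywhere in text, ignoring case.
    static boolean containsIgnoreCase(String text, String word) {
        // Pattern.quote() makes word match literally, even if it
        // contains regex metacharacters.
        Pattern p = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    // True if regex compiles without a PatternSyntaxException.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Hello World", "WORLD")); // prints true
        System.out.println(isValidRegex("(abc"));                       // prints false
    }
}
```

PatternSyntaxException is unchecked, so catching it matters mainly when the regex comes from user input.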

Finding pattern matches

b =
m.find();
True if the pattern is found in the subject string. The first call starts at the beginning of the string. Each subsequent call starts after the last character of the previous match, which suits a while loop.

b =
m.find(i);
Resets the matcher, then returns true if the pattern can match somewhere at or after position i.

b =
m.matches();
True if pattern matches entire subject string.

b =
Pattern.matches(regex, s);
As above, but the regex is recompiled on every call, so it is less efficient if the regex is used more than once.

b =
m.lookingAt();
True if the pattern matches starting at the first character. Unlike matches(), the match need not extend to the end of the subject string.
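The difference between find(), matches(), and lookingAt() can be sketched as follows. The class FindVsMatchesDemo, its method names, and the lowercase-word regex are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FindVsMatchesDemo {
    static final Pattern WORD = Pattern.compile("[a-z]+");

    // find() in a while loop visits every match in turn.
    static List<String> allWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(s);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    // matches() requires the pattern to cover the whole subject.
    static boolean wholeStringIsWord(String s) {
        return WORD.matcher(s).matches();
    }

    // lookingAt() requires a match at the start, but not to the end.
    static boolean startsWithWord(String s) {
        return WORD.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(allWords("one 2 three"));   // prints [one, three]
        System.out.println(wholeStringIsWord("abc1")); // prints false
        System.out.println(startsWithWord("abc1"));    // prints true
    }
}
```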

Getting results of the last pattern match. These methods correspond to group 0, the entire match.

s =
m.group();
String which was matched.

i =
m.start();
Index of first character of match.

i =
m.end();
Index just past the last character of the match (last index plus 1).
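To make the start/end convention concrete, here is a small sketch that reports each match as "start-end:text". The class MatchSpanDemo and the method spans are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatchSpanDemo {
    // Describes every match of regex in s as "start-end:text".
    static List<String> spans(String regex, String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            // end() is exclusive, so s.substring(m.start(), m.end())
            // is the same text as m.group().
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans("a+", "baab a")); // prints [1-3:aa, 5-6:a]
    }
}
```

Calling group(), start(), or end() before a successful match throws IllegalStateException, so check the boolean result of find(), matches(), or lookingAt() first.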

Getting group results of last match

s =
m.group(g);
String which was matched by group g, or null if that group did not take part in the match.

i =
m.start(g);
Index of first character of group g.

i =
m.end(g);
Index just past the last character of group g (last index plus 1).

n =
m.groupCount();
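Number of capturing groups in the pattern (not the number that took part in the last match). Group 0, the entire match, is not counted.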
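The group methods might be combined as in this sketch, which pulls the parts out of a date. The class GroupDemo, its methods, and the date regex are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class GroupDemo {
    // Three capturing groups: year, month, day.
    static final Pattern DATE = Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns {year, month, day} for the first date in s, or null if none.
    static String[] firstDate(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.find()) {
            return null;
        }
        String[] parts = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) { // group 0 is the whole match
            parts[g - 1] = m.group(g);
        }
        return parts;
    }

    // Index in s where the month (group 2) begins, or -1 if there is no date.
    static int monthStart(String s) {
        Matcher m = DATE.matcher(s);
        return m.find() ? m.start(2) : -1;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstDate("Posted 2010-10-27."))); // prints [2010, 10, 27]
        System.out.println(monthStart("Posted 2010-10-27."));                 // prints 12
    }
}
```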

Misc

m =
m.reset();
Resets Matcher m so that next find starts at beginning.

m =
m.reset(cs);
Resets Matcher m to match subject cs.
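One possible use of reset(cs) is to reuse a single Matcher across many subject strings rather than creating a new one each time. The class ResetDemo and the method commaCounts are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ResetDemo {
    static final Pattern COMMA = Pattern.compile(",");

    // Counts the commas in each line, reusing one Matcher via reset(cs).
    static int[] commaCounts(String[] lines) {
        int[] counts = new int[lines.length];
        Matcher m = COMMA.matcher(""); // placeholder subject, replaced below
        for (int k = 0; k < lines.length; k++) {
            m.reset(lines[k]); // new subject, scanning restarts at index 0
            while (m.find()) {
                counts[k]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = commaCounts(new String[] {"a,b,c", "none", ","});
        System.out.println(Arrays.toString(counts)); // prints [2, 0, 1]
    }
}
```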

Replacing text

s =
m.replaceFirst(rep);
Returns string which is subject with first match replaced by rep.

s =
m.replaceAll(rep);
Returns string with all matches replaced by rep.
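A short sketch of the two replacement methods, including a group reference in the replacement string. The class ReplaceDemo and its method names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ReplaceDemo {
    static final Pattern SPACES = Pattern.compile("\\s+");
    static final Pattern LAST_FIRST = Pattern.compile("(\\w+), (\\w+)");

    // Replaces every run of whitespace with a single space.
    static String collapseAll(String s) {
        return SPACES.matcher(s).replaceAll(" ");
    }

    // Replaces only the first run of whitespace.
    static String collapseFirst(String s) {
        return SPACES.matcher(s).replaceFirst(" ");
    }

    // "$1" and "$2" in the replacement refer to groups 1 and 2.
    static String swapNames(String s) {
        return LAST_FIRST.matcher(s).replaceAll("$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(collapseAll("a   b \t c"));  // prints a b c
        System.out.println(collapseFirst("a   b   c")); // prints a b   c
        System.out.println(swapNames("Doe, Jane"));     // prints Jane Doe
    }
}
```

Because $ and \ are special in the replacement string, a literal replacement taken from user input can be passed through Matcher.quoteReplacement() first.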

Building replacement in a StringBuffer

m =
m.appendReplacement(sb, s);
Called after a successful find(): (1) the text skipped between the previous match and this one is appended to the StringBuffer sb, then (2) s is appended, with "$n" replaced by the text of group n.

sb =
m.appendTail(sb);
Appends the remaining text after the last match to sb. Useful to finish up after a loop that calls appendReplacement().
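The usual find / appendReplacement / appendTail loop can be sketched as below, here computing each replacement from the matched text. The class AppendReplacementDemo and the method doubleNumbers are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AppendReplacementDemo {
    static final Pattern NUMBER = Pattern.compile("\\d+");

    // Returns s with every integer in it doubled.
    static String doubleNumbers(String s) {
        Matcher m = NUMBER.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text skipped since the previous match,
            // then the replacement for this match.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        m.appendTail(sb); // appends whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 10 pears")); // prints 6 apples and 20 pears
    }
}
```

This loop is worth the extra code when the replacement depends on the matched text. For a fixed replacement, replaceAll() is simpler.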
