Wednesday, 22 June 2011

Beware of String functions - they may treat plain strings as regexes

Some String methods look simple and usually do what you expect, but every now and then they surprise you. For example, consider the split() method:

public class LineParser {
    private final String[] values;

    public LineParser(String line, String separator) {
        values = line.split(separator);
    }

    public String getValue(int index) {
        return values[index];
    }
}

It’s a simple class that parses a text line and stores the resulting values. Let's try it out:

public static void main(String[] args) {
    LineParser parser1 = new LineParser("A,B,C", ",");
    System.out.println("parser1:" + parser1.getValue(1));

    LineParser parser2 = new LineParser("A B C", " ");
    System.out.println("parser2:" + parser2.getValue(1));

    LineParser parser3 = new LineParser("A|B|C", "|");
    System.out.println("parser3:" + parser3.getValue(1));

    LineParser parser4 = new LineParser("A\\B\\C", "\\");
    System.out.println("parser4:" + parser4.getValue(1));
}

Output
For the first and second parsers there is no surprise: the second value is ‘B’ and that’s exactly what gets printed. The third one, instead of the second value, prints ‘A’, the first one (on Java 8 and later it prints ‘|’, which is no better). If that’s not strange enough, the last parser throws an exception, a PatternSyntaxException. That’s really unexpected!

So where’s the catch? What’s wrong? Some of you already knew it, some probably start to suspect it… It’s all because of the String.split() method: instead of taking a separator String as a parameter (which I tried to silently imply in the code), it takes a regular expression. That is why the last two parsers failed: both the pipe and the backslash have special meaning in Java regexps. The pipe is the alternation operator, so on its own it matches the empty string and the line gets split between every character. The backslash is the escape character, so a lone backslash is not a valid pattern at all.
Mystery solved, so the problem is gone… or is it? Of course you might be tempted just to fix the snippet above by writing the regexps correctly, and that would be fine for this code. Now go home and check your own code: do you pass user-provided values to String.split()? What about String.replaceAll()? If you do, you might be in real trouble… The real lesson is that some String methods take plain Strings as parameters (e.g. String.regionMatches()) while others expect a String containing a regular expression (e.g. String.matches()). Beware and double check!
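
One way to guard against this when the separator comes from outside is to quote it before it reaches the regex engine. The sketch below is only an illustration, not the one true fix: the class LiteralStrings and its methods splitLiteral and replaceLiteral are names made up for this example, while Pattern.quote() and Matcher.quoteReplacement() are the standard java.util.regex helpers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, invented for this example.
public class LiteralStrings {

    // Splits on the separator taken literally, not as a regular expression.
    static String[] splitLiteral(String line, String separator) {
        // Pattern.quote() turns the text into a pattern that matches it verbatim,
        // so characters like '|' or '\' lose their special meaning.
        return line.split(Pattern.quote(separator));
    }

    // Replaces every literal occurrence of target with a literal replacement.
    static String replaceLiteral(String text, String target, String replacement) {
        // In replaceAll() the replacement is special too ('$' and '\'),
        // hence Matcher.quoteReplacement(). String.replace(target, replacement)
        // gives the same result without involving regex syntax at all.
        return text.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(splitLiteral("A|B|C", "|")[1]);         // B
        System.out.println(splitLiteral("A\\B\\C", "\\")[1]);      // B
        System.out.println(replaceLiteral("1.5 + 2.5", ".", "$")); // 1$5 + 2$5
    }
}
```

With quoting in place, the third and fourth parsers from the example would both return ‘B’. Whether to quote inside LineParser or to document that the separator is a regex is a design decision; the important part is to make the choice explicit.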
