Monday, 25 April 2011

Java string format tutorial

The String.format() method provides versatile formatting capabilities. This tutorial tries to present these capabilities in an accessible manner.

  1. The format string
    A format string can contain zero, one, or more format specifiers. The general form of a format specifier is:
    %[argument_index$][flags][width][.precision]conversion

    where things in square brackets are optional, and conversion is a character indicating the conversion to be applied to the corresponding argument. The only required characters in a format specifier are the percent sign % and the conversion character.

    A simple example:
    public static void simpleFormat() {
        // %s takes "Jack"; %5.2f prints 25.0 with two decimals in a field at least 5 characters wide
        System.out.println(
            String.format("Hi %s, you owe me $%5.2f.", "Jack", 25.0)
        );
    }
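    To see how the optional parts of the syntax line up, here is a small sketch that uses every part in a single specifier and compares it with the bare minimum. The class name SpecifierAnatomy is made up for this post, and Locale.US is passed only so the decimal separator is a dot regardless of the default locale.

```java
import java.util.Locale;

// Hypothetical demo class: one specifier using every optional part of
// %[argument_index$][flags][width][.precision]conversion
class SpecifierAnatomy {

    static String allParts(double amount) {
        // 1$  -> argument index: use the first argument
        // -   -> flag: left-justify within the field
        // 10  -> width: pad the result to at least 10 characters
        // .2  -> precision: two digits after the decimal point
        // f   -> conversion: decimal floating point
        return String.format(Locale.US, "[%1$-10.2f]", amount);
    }

    static String minimal(double amount) {
        // only the required parts: the percent sign and the conversion
        // (%f defaults to six digits after the decimal point)
        return String.format(Locale.US, "[%f]", amount);
    }

    public static void main(String[] args) {
        System.out.println(allParts(25.0)); // [25.00     ]
        System.out.println(minimal(25.0));  // [25.000000]
    }
}
```

    The brackets are there only to make the padding visible.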

  2. The argument index
    The argument index is specified by a number terminated by the dollar sign $, and the same argument may be referenced multiple times in a format string. Unindexed format specifiers take their values from the argument list in the order they appear in the format string: the first unindexed specifier takes the first argument, the second takes the second argument, and so on. Explicitly indexed specifiers do not advance this position. The only exceptions are "%n" and "%%", which never consume an argument. The former prints a platform-specific line separator (e.g. "\r\n" on Windows); the latter prints a literal percent sign %. If the number of unindexed format specifiers exceeds the number of available arguments, a MissingFormatArgumentException is thrown.

    Example:
    public static void testArgumentIndex() {
        // %1$ is used twice, so the single argument 10 is printed twice
        System.out.println(
            String.format(
                "A number may be formatted as a string \"%1$s\" "
                    + "or a number %1$d",
                10)
        );

        // note the %n format specifier. It starts a new line
        // but does not consume an argument index from the list
        System.out.println(
            String.format(
                "%nMixing indexed and unindexed arguments: "
                    + "%n%5$s %s %2$s %s %4$s %s %s",
                "one", "two", "three", "four", "five")
        );
    }
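    The rules about "%%", "%n", mixing indexed and unindexed specifiers, and running out of arguments can each be checked in a few lines. This is a sketch; the class and method names are hypothetical and exist only for this post.

```java
import java.util.MissingFormatArgumentException;

// Hypothetical demo class for the argument-index rules
class ArgumentIndexDemo {

    // %% prints a literal percent sign and consumes no argument
    static String percent(int value) {
        return String.format("%d%% done", value);
    }

    // explicit indices do not move the position used by unindexed specifiers:
    // %2$s takes "b", then the first plain %s still takes "a"
    static String mixed() {
        return String.format("%2$s %s %s", "a", "b");
    }

    // true if formatting fails because there are fewer arguments than specifiers
    static boolean tooFewArguments() {
        try {
            String.format("%s and %s", "only one");
            return false;
        } catch (MissingFormatArgumentException e) {
            return true;
        }
    }

    // %n expands to the platform line separator
    static boolean lineSeparatorMatches() {
        return String.format("line%n").equals("line" + System.lineSeparator());
    }

    public static void main(String[] args) {
        System.out.println(percent(75));           // 75% done
        System.out.println(mixed());               // b a b
        System.out.println(tooFewArguments());     // true
        System.out.println(lineSeparatorMatches()); // true
    }
}
```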

  3. Format as string, boolean, hex
    Any object can be formatted as string, boolean or hex. The conversion codes are s for string, b for boolean, h for hex. The uppercase variants (S, B, H) convert the result to uppercase.

    If the argument is null, the output is "null" for string or hex conversion and "false" for boolean conversion. Otherwise, string conversion calls toString() (or formatTo() if the argument implements java.util.Formattable), hex conversion calls Integer.toHexString(arg.hashCode()), and boolean conversion calls String.valueOf() if the argument is a Boolean and yields "true" for anything else.

    Example:
    public static void testGeneral() {
        System.out.println(
            // any object can be formatted as string
            // upper case S converts string to upper case
            String.format("%s, %S, %S, %S, %s", "String", "String", null, (byte) 1, 5.6)
        );

        System.out.println(
            // any object can be formatted as boolean
            // upper case B prints TRUE or FALSE
            String.format("%b, %B, %b, %B", "String", null, (byte) 1, 5.6)
        );

        System.out.println(
            // any object can be formatted as hex (of its hashCode())
            // upper case H prints hex in uppercase
            // Integer.valueOf replaces the deprecated new Integer(...) constructor
            String.format("%h, %H, %H, %h, %H", "161", null, 161, Integer.valueOf(161), 5.6)
        );

        // What's the effect of width.precision on String?
        // width pads the result; precision truncates it to that many characters
        System.out.println(
            String.format("\"%1$s\", \"%1$14s\", \"%1$14.2s\"", "Hello")
        );
    }
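    The null and non-null rules described above can be verified directly. The sketch below (hypothetical class name) compares %h with Integer.toHexString(arg.hashCode()) and shows that %b only looks at the value of an actual Boolean; the String "false" is still formatted as true.

```java
// Hypothetical demo class checking the general conversion rules
class GeneralConversionDemo {

    // %h should equal Integer.toHexString(arg.hashCode()), or "null" for null
    static boolean hexMatchesHashCode(Object arg) {
        String expected = (arg == null) ? "null" : Integer.toHexString(arg.hashCode());
        return String.format("%h", arg).equals(expected);
    }

    // %b: null -> "false", a Boolean -> its own value, any other object -> "true"
    static String booleans() {
        return String.format("%b %b %b %b", null, Boolean.FALSE, "false", 0);
    }

    public static void main(String[] args) {
        System.out.println(hexMatchesHashCode("161")); // true
        System.out.println(hexMatchesHashCode(null));  // true
        System.out.println(booleans());                // false false true true
    }
}
```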

  4. Format as character
    Only char, byte, short, int and their corresponding wrapper types can be formatted as a character; for the numeric types, the value must be a valid Unicode code point. The conversion character is c (C for uppercase, only available since JDK 1.6).

    Example:
    public static void testCharacter() {
        // 97 printed as a string, as the character 'a', and as uppercase 'A'
        System.out.println(
            String.format("'%1$s', '%1$c', '%1$C'", 97)
        );
    }
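    Here is a sketch of what happens at the edges of that rule (class name hypothetical): the accepted types all work, a long is rejected with IllegalFormatConversionException, and an int that is not a valid code point is rejected with IllegalFormatCodePointException.

```java
import java.util.IllegalFormatCodePointException;
import java.util.IllegalFormatConversionException;

// Hypothetical demo class for the %c conversion
class CharacterDemo {

    // char, Byte, Short and Integer arguments are all accepted by %c
    static String fromTypes() {
        return String.format("%c %c %c %c", 'x', (byte) 65, (short) 66, 67);
    }

    // long is not in the accepted list, so %c rejects it
    static boolean rejectsLong() {
        try {
            String.format("%c", 97L);
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    // -1 is not a valid Unicode code point
    static boolean rejectsInvalidCodePoint() {
        try {
            String.format("%c", -1);
            return false;
        } catch (IllegalFormatCodePointException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromTypes());               // x A B C
        System.out.println(rejectsLong());             // true
        System.out.println(rejectsInvalidCodePoint()); // true
    }
}
```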

  5. Format as integer
    Only byte, short, int, long, their corresponding Java object types, and BigInteger can be formatted as integer. The applicable conversion characters are d for decimal, o for octal and x for hex. X prints hex code in uppercase.

    Example:
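    public static void testInteger() {
        // decimal, octal, lowercase hex and uppercase hex
        // (%h/%H would print the hash code instead, which only coincides for Integer)
        System.out.println(
            String.format("%d, %o, %x, %X", 161, 161, 161, 161)
        );

        // try big number with and without group separator
        System.out.println(
            String.format("%1$d, %1$,d", 161161161161L)
        );
    }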
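    A few more integer cases, as a sketch with a hypothetical class name: the same value in all radixes, the difference between a negative int and a negative BigInteger under %x, and grouping with an explicit locale (Locale.US is chosen here so the separator is a comma).

```java
import java.math.BigInteger;
import java.util.Locale;

// Hypothetical demo class for the integer conversions
class IntegerDemo {

    static String radixes(int n) {
        return String.format(Locale.US, "%d %o %x %X", n, n, n, n);
    }

    // for int and long, %x prints the two's-complement bits of a negative value;
    // for BigInteger it prints a signed value instead
    static String negative() {
        return String.format(Locale.US, "%x %x", -1, BigInteger.valueOf(-1));
    }

    // the grouping separator is locale-specific; Locale.US uses a comma
    static String grouped(long n) {
        return String.format(Locale.US, "%,d", n);
    }

    // BigInteger is not limited to the range of long
    static String beyondLong() {
        BigInteger b = BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
        return String.format(Locale.US, "%d", b);
    }

    public static void main(String[] args) {
        System.out.println(radixes(255));            // 255 377 ff FF
        System.out.println(negative());              // ffffffff -1
        System.out.println(grouped(161161161161L));  // 161,161,161,161
        System.out.println(beyondLong());            // 9223372036854775808
    }
}
```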

  6. Format as float
    Only float, double, their corresponding wrapper types and BigDecimal can be formatted as float. The applicable conversion characters are e for scientific notation, f for decimal floating point, g for scientific or decimal (depending on precision and value after rounding), and a for hex floating-point number. E, G and A convert the result to upper case.

    Example:
    public static void testFloat() {
        // the same value in scientific, decimal, general and hex floating-point form
        System.out.println(
            String.format("%1$.2e, %1$.2f, %1$.2g, %1$.2a", 12345678.9999932)
        );
    }
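    The phrase "value after rounding" for %g is easiest to see with numbers near the switch-over point. In the sketch below (hypothetical class name, Locale.US for a dot separator), 999.6 rounds to 1000 at three significant digits, which is no longer below 10^3, so it switches to scientific notation. Note that Java's %g keeps trailing zeros. The BigDecimal case shows a value formatted with more digits than a double can hold.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Hypothetical demo class for the floating-point conversions
class FloatDemo {

    // %g: decimal when the rounded value is in [10^-4, 10^precision),
    // scientific notation otherwise
    static String general() {
        return String.format(Locale.US, "%.3g %.3g %.3g", 1.5, 999.6, 1500.0);
    }

    // BigDecimal keeps all of its digits; .125 is rounded half up to .13
    static String bigDecimal() {
        return String.format(Locale.US, "%.2f", new BigDecimal("12345678901234567890.125"));
    }

    // lowercase and uppercase scientific notation
    static String upper(double x) {
        return String.format(Locale.US, "%.2e %.2E", x, x);
    }

    public static void main(String[] args) {
        System.out.println(general());         // 1.50 1.00e+03 1.50e+03
        System.out.println(bigDecimal());      // 12345678901234567890.13
        System.out.println(upper(12345.678));  // 1.23e+04 1.23E+04
    }
}
```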

  7. Format as date/time
    These types can be formatted as date/time: long, Long, java.util.Calendar and java.util.Date. The conversion character is t or T, where T is the uppercase variant. The t or T must be followed by a suffix character that selects which date or time field to print, as listed in the tables below.

    Example:
    public static void testDate() {
        long currentTime = System.currentTimeMillis();
        java.util.Date date = new java.util.Date(currentTime);

        System.out.println(
            String.format("Current Time: %1$tm/%1$td/%1$tY %1$tH:%1$tM:%1$tS", currentTime)
        );

        // similar to the above but using composite suffixes;
        // note that %tD prints a two-digit year (%ty), not four digits
        System.out.println(
            String.format("Current Time (using composite suffixes): %1$tD %1$tT", currentTime)
        );

        // a java.util.Date argument gives the same result as the long
        System.out.println(
            String.format("Current Time (using Date object): %1$tm/%1$td/%1$tY %1$tH:%1$tM:%1$tS", date)
        );
    }


    Date conversion suffixes
    Character  Meaning
    'B' Locale-specific full month name, e.g. "January", "February".
    'b' Locale-specific abbreviated month name, e.g. "Jan", "Feb".
    'h' Same as 'b'.
    'A' Locale-specific full name of the day of the week, e.g. "Sunday", "Monday".
    'a' Locale-specific short name of the day of the week, e.g. "Sun", "Mon".
    'C' Four-digit year divided by 100, formatted as two digits with leading zero as necessary, i.e. 00 - 99.
    'Y' Year, formatted as at least four digits with leading zeros as necessary, e.g. 0092 equals 92 CE for the Gregorian calendar.
    'y' Last two digits of the year, formatted with leading zeros as necessary, i.e. 00 - 99.
    'j' Day of year, formatted as three digits with leading zeros as necessary, e.g. 001 - 366 for the Gregorian calendar.
    'm' Month, formatted as two digits with leading zeros as necessary, i.e. 01 - 13 (13 covers calendars with a thirteenth month).
    'd' Day of month, formatted as two digits with leading zeros as necessary, i.e. 01 - 31.
    'e' Day of month, formatted without a leading zero, i.e. 1 - 31.
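    Because the output depends on the current time, a fixed date makes the date suffixes easier to compare. The sketch below (hypothetical class name) uses Monday, 4 April 2011 via GregorianCalendar and passes Locale.US explicitly, since month and day names are locale-specific. Note how 'd' pads the day to "04" while 'e' prints "4", and 'j' gives the day of the year (31 + 28 + 31 + 4 = 94).

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class for the date suffixes
class DateSuffixDemo {

    // Monday, 4 April 2011 (months are zero-based, hence Calendar.APRIL)
    static Calendar sample() {
        return new GregorianCalendar(2011, Calendar.APRIL, 4);
    }

    // names depend on the locale, so one is passed explicitly
    static String names(Calendar c) {
        return String.format(Locale.US, "%1$tA %1$ta %1$tB %1$tb %1$th", c);
    }

    // Y, y, C, m, d, e and j suffixes
    static String numbers(Calendar c) {
        return String.format(Locale.US, "%1$tY %1$ty %1$tC %1$tm %1$td %1$te %1$tj", c);
    }

    public static void main(String[] args) {
        Calendar c = sample();
        System.out.println(names(c));   // Monday Mon April Apr Apr
        System.out.println(numbers(c)); // 2011 11 20 04 04 4 094
    }
}
```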


    Time conversion suffixes
    Character  Meaning
    'H' Hour of the day for the 24-hour clock, formatted as two digits with a leading zero as necessary, i.e. 00 - 23.
    'I' Hour for the 12-hour clock, formatted as two digits with a leading zero as necessary, i.e. 01 - 12.
    'k' Hour of the day for the 24-hour clock, i.e. 0 - 23.
    'l' Hour for the 12-hour clock, i.e. 1 - 12.
    'M' Minute within the hour, formatted as two digits with a leading zero as necessary, i.e. 00 - 59.
    'S' Seconds within the minute, formatted as two digits with a leading zero as necessary, i.e. 00 - 60 ("60" is a special value required to support leap seconds).
    'L' Millisecond within the second, formatted as three digits with leading zeros as necessary, i.e. 000 - 999.
    'N' Nanosecond within the second, formatted as nine digits with leading zeros as necessary, i.e. 000000000 - 999999999.
    'p' Locale-specific morning or afternoon marker in lower case, e.g. "am" or "pm". Use of the conversion prefix 'T' forces this output to upper case.
    'z' RFC 822 style numeric time zone offset from GMT, e.g. -0800.
    'Z' A string representing the abbreviation for the time zone. The Formatter's locale will supersede the locale of the argument (if any).
    's' Seconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC, i.e. Long.MIN_VALUE/1000 to Long.MAX_VALUE/1000.
    'Q' Milliseconds since the beginning of the epoch starting at 1 January 1970 00:00:00 UTC, i.e. Long.MIN_VALUE to Long.MAX_VALUE.
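    The same idea works for the time suffixes. This sketch (hypothetical class name) builds 09:07:09.042 on 4 April 2011 in the default time zone, so the padded and unpadded hour suffixes differ visibly. A Calendar only carries milliseconds, so 'N' shows them scaled to nanoseconds. The epoch suffixes are shown with an arbitrary long value.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class for the time suffixes
class TimeSuffixDemo {

    // 4 April 2011, 09:07:09.042 in the default time zone
    static Calendar sample() {
        Calendar c = new GregorianCalendar(2011, Calendar.APRIL, 4, 9, 7, 9);
        c.set(Calendar.MILLISECOND, 42);
        return c;
    }

    // 24-hour clock: H is zero-padded, k is not
    static String clock24(Calendar c) {
        return String.format(Locale.US, "%1$tH %1$tk", c);
    }

    // 12-hour clock: I is zero-padded, l is not; %Tp upper-cases the marker
    static String clock12(Calendar c) {
        return String.format(Locale.US, "%1$tI %1$tl %1$tp %1$Tp", c);
    }

    // minutes, seconds, milliseconds and nanoseconds
    static String fractions(Calendar c) {
        return String.format(Locale.US, "%1$tM %1$tS %1$tL %1$tN", c);
    }

    // Q prints the milliseconds since the epoch, s the whole seconds
    static String epoch(long millis) {
        return String.format(Locale.US, "%1$tQ %1$ts", millis);
    }

    public static void main(String[] args) {
        Calendar c = sample();
        System.out.println(clock24(c));            // 09 9
        System.out.println(clock12(c));            // 09 9 am AM
        System.out.println(fractions(c));          // 07 09 042 042000000
        System.out.println(epoch(1303900000123L)); // 1303900000123 1303900000
    }
}
```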


    Date/Time composite suffixes
    Character  Meaning
    'R' Time formatted for the 24-hour clock as "%tH:%tM".
    'T' Time formatted for the 24-hour clock as "%tH:%tM:%tS".
    'r' Time formatted for the 12-hour clock as "%tI:%tM:%tS %Tp". The location of the morning or afternoon marker ('%Tp') may be locale-dependent.
    'D' Date formatted as "%tm/%td/%ty".
    'F' ISO 8601 complete date formatted as "%tY-%tm-%td".
    'c' Date and time formatted as "%ta %tb %td %tT %tZ %tY", e.g. "Sun Jul 20 16:17:00 EDT 1969".
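    Each composite suffix is shorthand for the longer specifier shown in the table. The sketch below (hypothetical class name, fixed time 09:07:09 on 4 April 2011, Locale.US) prints several composites and checks that %tF matches the spelled-out "%tY-%tm-%td" form.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical demo class for the composite suffixes
class CompositeSuffixDemo {

    // 4 April 2011, 09:07:09 in the default time zone
    static Calendar sample() {
        return new GregorianCalendar(2011, Calendar.APRIL, 4, 9, 7, 9);
    }

    static String composites(Calendar c) {
        return String.format(Locale.US, "%1$tF | %1$tD | %1$tT | %1$tR | %1$tr", c);
    }

    // %tF is the same as %tY-%tm-%td
    static boolean isoMatchesParts(Calendar c) {
        return String.format(Locale.US, "%tF", c)
                .equals(String.format(Locale.US, "%1$tY-%1$tm-%1$td", c));
    }

    public static void main(String[] args) {
        Calendar c = sample();
        System.out.println(composites(c));
        // 2011-04-04 | 04/04/11 | 09:07:09 | 09:07 | 09:07:09 AM
        System.out.println(isoMatchesParts(c)); // true
    }
}
```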

  8. The flags

    Flag  Meaning
    '-'  The result will be left-justified. The default is right-justified, so there is no flag for right justification.
    '#'  The result should use a conversion-dependent alternate form (e.g. a 0x prefix for %x). For %s, the argument must implement java.util.Formattable.
    '+'  The result will always include a sign (for numbers).
    ' '  The result will include a leading space for positive values (for numbers).
    '0'  The result will be zero-padded (for numbers); a width must also be given.
    ','  The result will include locale-specific grouping separators (for numbers).
    '('  The result will enclose negative numbers in parentheses (for numbers).


    Example:
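    public static void testFlags() {
        // '#' on %s requires an argument implementing java.util.Formattable;
        // a plain String would throw FormatFlagsConversionMismatchException,
        // so only width, precision and '-' are shown here
        System.out.println(
            String.format("'%1$s', '%1$-10.8s', '%1$.12s', '%1$-25s'", "Huge Fruit, Inc.")
        );
    }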
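    The example above only exercises flags that apply to strings. Most of the remaining flags are meant for numbers; the sketch below (hypothetical class name, Locale.US so that ',' produces commas) shows each of them once. The brackets only make the padding visible.

```java
import java.util.Locale;

// Hypothetical demo class for the numeric flags
class NumericFlagsDemo {

    // '+' always prints a sign; ' ' prints a space in place of '+'
    static String signs() {
        return String.format(Locale.US, "[%+d] [% d] [%+d] [% d]", 42, 42, -42, -42);
    }

    // '0' pads with zeros up to the width; '-' left-justifies; default is right-justified
    static String padding() {
        return String.format(Locale.US, "[%08.2f] [%-8d] [%8d]", 3.14159, 42, 42);
    }

    // ',' adds grouping separators; '(' wraps negative numbers in parentheses
    static String grouping() {
        return String.format(Locale.US, "%,d %(d %(,.2f", 1234567, -42, -1234.5);
    }

    // '#' selects the alternate form: 0x prefix for hex, leading 0 for octal
    static String alternate() {
        return String.format(Locale.US, "%#x %#o", 255, 8);
    }

    public static void main(String[] args) {
        System.out.println(signs());     // [+42] [ 42] [-42] [-42]
        System.out.println(padding());   // [00003.14] [42      ] [      42]
        System.out.println(grouping());  // 1,234,567 (42) (1,234.50)
        System.out.println(alternate()); // 0xff 010
    }
}
```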
