Thursday, 21 April 2011

Unicode characters in Java

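Java uses the Unicode character set. Unicode was originally designed as a two-byte (16-bit) character code, and Java's char type is still 16 bits wide. It has characters representing almost all characters in almost all human alphabets and writing systems around the world, including English, Arabic, Chinese and more. Unicode has since grown beyond the 65,536 values that fit in two bytes; Java represents the additional characters as pairs of char values (surrogate pairs).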
Unfortunately, many operating systems and web browsers do not handle Unicode fully. For the most part Java will properly handle input in non-Unicode encodings, converting it to Unicode internally. The first 128 characters in the Unicode character set are identical to the common ASCII character set. The second 128 characters are identical to the upper 128 characters of the ISO Latin-1 (ISO 8859-1) extended ASCII character set. It's the next 65,280 characters (the rest of the 65,536 values a two-byte code can hold) that present problems.
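The overlap with ASCII and ISO Latin-1 can be checked directly. The following is a minimal sketch, not a definitive test: the class name CharRanges and the helper matchesCharset are made up for illustration, and it only relies on the standard String and StandardCharsets APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharRanges {
    // Hypothetical helper: returns true if every byte value in [from, to),
    // decoded with the given single-byte charset, yields the char that has
    // the same numeric value.
    static boolean matchesCharset(int from, int to, Charset cs) {
        for (int i = from; i < to; i++) {
            String decoded = new String(new byte[] {(byte) i}, cs);
            if (decoded.length() != 1 || decoded.charAt(0) != i) {
                return false;
            }
        }
        return true;
    }

    // Number of 16-bit char values left over after the first 256.
    static int remainingValues() {
        return Character.MAX_VALUE + 1 - 256;
    }

    public static void main(String[] args) {
        // First 128 values agree with ASCII.
        System.out.println(matchesCharset(0, 128, StandardCharsets.US_ASCII));
        // Second 128 values agree with the upper half of ISO Latin-1.
        System.out.println(matchesCharset(128, 256, StandardCharsets.ISO_8859_1));
        // The remaining values of a 16-bit char: 65536 - 256.
        System.out.println(remainingValues());
    }
}
```

Run as written, this should print true, true and 65280.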
You can refer to a particular Unicode character by using the escape sequence \u followed by a four-digit hexadecimal number. For example:
\u00A9 © The copyright symbol
\u0022 " The double quote
\u00BD ½ The fraction 1/2
\u0394 Δ The capital Greek letter delta
\u00F8 ø A little o with a slash through it
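As a rough illustration of the escapes listed above, the sketch below stores some of them in char variables and prints each one's code. The class name UnicodeEscapes and the helper codeOf are invented for this example. Whether the glyphs themselves display correctly depends on your console's encoding, so only the codes are reliable output.

```java
public class UnicodeEscapes {
    // Hypothetical helper: formats a char as its four-digit
    // hexadecimal code, for example "U+00A9".
    static String codeOf(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char copyright = '\u00A9'; // the copyright symbol
        char half = '\u00BD';      // the fraction 1/2
        char delta = '\u0394';     // capital Greek letter delta
        char oSlash = '\u00F8';    // little o with a slash through it

        char[] samples = {copyright, half, delta, oSlash};
        for (char c : samples) {
            // Prints the code followed by the character itself.
            System.out.println(codeOf(c) + " " + c);
        }
    }
}
```

One thing to watch: the compiler translates these escapes before it reads anything else in the source file, so the double-quote escape placed inside a string literal ends the string early. Inside strings, the ordinary \" escape is the safer choice.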
You can even use the full Unicode character set in the names of your variables. However, chances are your text editor doesn't handle more than basic ASCII very well. You can use Unicode escape sequences instead, like this:


String Mj\u00F8lner = "Hammer of Thor";

 but frankly this is way more trouble than it's worth.
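If you want to try it anyway, here is a small sketch built around the declaration above. The class name UnicodeIdentifiers and the helper isValidIdentifier are hypothetical, and the helper is deliberately simplified: it checks characters only and does not reject reserved words.

```java
public class UnicodeIdentifiers {
    // The escape inside this field name is translated to the letter
    // o-with-slash before the compiler reads the identifier.
    static final String Mj\u00F8lner = "Hammer of Thor";

    // Hypothetical, simplified check: true if the first char may start a
    // Java identifier and every later char may appear inside one.
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Mj\u00F8lner);                        // Hammer of Thor
        System.out.println(isValidIdentifier("Mj\u00F8lner"));   // true
        System.out.println(isValidIdentifier("\u00A9right"));    // false
    }
}
```

Letters such as the slashed o are accepted in identifiers, while symbols such as the copyright sign are not, which is why the last call returns false.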
