Sunday, 13 March 2011

How to return the content in the correct encoding from a servlet?

The doGet() method of a servlet usually returns content to the requester. The most common way to do so is to get a java.io.PrintWriter from the HttpServletResponse, then use its write() method to write the content to the response.
Example I:
protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {

    String xml = "some xml";
    resp.setContentType("text/xml");
    PrintWriter out = resp.getWriter();
    out.write(xml);
}
When the content contains non-ASCII characters, you need to make sure they are sent in the correct encoding; otherwise the requester will see garbled or undisplayable characters. There are two ways to do this. The first is to call setCharacterEncoding(encodingOfYourChoice) on the HttpServletResponse.
Example II:
protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {

    String xml = "some xml";
    // Set the encoding before obtaining the writer.
    resp.setCharacterEncoding("UTF-8");
    resp.setContentType("text/xml");
    PrintWriter out = resp.getWriter();
    out.write(xml);
}
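To see why the declared encoding matters, here is a minimal JDK-only sketch of what happens when the server encodes the body in one charset and the client decodes it in another. It does not use the Servlet API; the class EncodingMismatchDemo and its encode() and decode() helpers are names made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Encode text the way a response writer would, using the given charset.
    public static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    // Decode bytes the way a client would, using the charset it assumes.
    public static String decode(byte[] bytes, Charset assumed) {
        return new String(bytes, assumed);
    }

    public static void main(String[] args) {
        String xml = "<name>Ren\u00e9</name>"; // contains U+00E9 (e with acute accent)
        byte[] body = encode(xml, StandardCharsets.UTF_8);

        // Client uses the charset the server declared: the round trip is lossless.
        System.out.println(decode(body, StandardCharsets.UTF_8).equals(xml)); // true

        // Client assumes ISO-8859-1: the two UTF-8 bytes of U+00E9 turn into two characters.
        String garbled = decode(body, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(xml)); // false
        System.out.println(garbled.length() - xml.length()); // 1
    }
}
```

Declaring the charset on the response is what lets the client pick the matching decoder instead of guessing.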
The other way is to add the charset property in the content type:

Example III:
protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {

    String xml = "some xml";
    // The charset parameter sets the encoding together with the MIME type.
    resp.setContentType("text/xml; charset=UTF-8");
    PrintWriter out = resp.getWriter();
    out.write(xml);
}
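The charset travels as a parameter inside the content type string. The sketch below shows one simplified way to pull it back out. The class ContentTypeCharset and the method charsetOf() are hypothetical helpers written for this post, not part of the Servlet API, and a real container's parsing is more thorough than this.

```java
import java.util.Locale;

class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, or null if there is none.
    // Simplified: assumes parameters are separated by ';' and written as name=value.
    public static String charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        // parts[0] is the MIME type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("text/xml"));                // null
    }
}
```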
Note that the calls in Examples II and III must be made before getWriter(). Once getWriter() has been called, the charset can no longer be changed. When the content type is set without a charset parameter and no character encoding has been set, the writer does not use the OS's default charset: the Servlet specification defines the default response encoding as ISO-8859-1.
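The following JDK-only sketch shows both points of this note, assuming the fallback encoding is ISO-8859-1, the default named by the Servlet specification. It uses a plain PrintWriter over an OutputStreamWriter rather than the container's writer; DefaultCharsetDemo and writeWith() are names invented for the illustration. The charset is fixed when the writer is constructed, and characters that ISO-8859-1 cannot represent are silently replaced with '?'.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DefaultCharsetDemo {

    // Writes text through a PrintWriter bound to the given charset and returns the raw bytes.
    // The charset is chosen at construction time and cannot be changed afterwards.
    public static byte[] writeWith(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charset));
        out.write(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "price: 5 \u20ac"; // U+20AC (euro sign) does not exist in ISO-8859-1

        byte[] latin1 = writeWith(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = writeWith(text, StandardCharsets.UTF_8);

        // The unmappable character was replaced while writing; the data is lost.
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // price: 5 ?

        // With UTF-8 the text survives intact.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```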
