Friday, December 11, 2009

Reading UTF-8 streams

If you have a byte array containing UTF-8 encoded text, you can create a String from it with one of the following constructors:

String(byte[] bytes, int off, int len, String enc) 
String(byte[] bytes, String enc)
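
As a rough illustration of these two constructors, here is a minimal sketch. The class name Utf8BytesExample, the decode helper methods and the sample bytes are made up for this example. Wrapping the checked UnsupportedEncodingException is only one possible way to handle it.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of any library.
class Utf8BytesExample {

    // Decodes the whole byte array as UTF-8.
    static String decode(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Should not happen where UTF-8 is a supported encoding.
            throw new IllegalStateException("UTF-8 is not supported");
        }
    }

    // Decodes only len bytes, starting at offset off.
    static String decode(byte[] bytes, int off, int len) {
        try {
            return new String(bytes, off, len, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is not supported");
        }
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f', then U+00E9 encoded as the two bytes 0xC3 0xA9.
        byte[] data = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9 };

        System.out.println(decode(data).length());  // prints 4: five bytes, four characters
        System.out.println(decode(data, 0, 3));     // prints caf
    }
}
```

Note that the offset and length are counted in bytes, not characters, so a range that cuts a multi-byte sequence in half will not decode cleanly.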


This is useful when reading from local files. But what can you use if you are reading a UTF-8 stream?

You can use the java.io.InputStreamReader class with the following constructor:

InputStreamReader(InputStream is, String enc)


Once you have the InputStreamReader instance, you can create a char array buffer and fill it with the following method, which returns the number of characters read, or -1 at the end of the stream:

public int read(char[] cbuf, int off, int len)


For example:

InputStreamReader in = new InputStreamReader(
        inputConnection.openInputStream(), "UTF-8");
char[] buff = new char[1024];
int len = in.read(buff, 0, buff.length);

// read() returns -1 once the end of the stream is reached
while (len != -1) {
    // use the first len characters of buff, like
    // String s = new String(buff, 0, len);

    len = in.read(buff, 0, buff.length);
}
in.close();


In practice I have used only in.read(buff), which is equivalent to in.read(buff, 0, buff.length), and have not had a problem with it.

This also applies to the Java Standard and Enterprise editions.
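
For Java SE, the same approach might look like the following self-contained sketch. The class name Utf8StreamExample and the readAll and decode helpers are invented for this example, and a ByteArrayInputStream stands in for a real connection stream. The buffer is deliberately tiny to show that the reader still decodes a multi-byte character correctly when it straddles two reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class, not part of any library.
class Utf8StreamExample {

    // Reads the whole stream as UTF-8 text and returns it as a String.
    static String readAll(InputStream stream, int bufferSize) throws IOException {
        Reader in = new InputStreamReader(stream, "UTF-8");
        try {
            StringBuilder text = new StringBuilder();
            char[] buff = new char[bufferSize];
            int len;
            // read(buff) behaves like read(buff, 0, buff.length);
            // it returns -1 once the end of the stream is reached.
            while ((len = in.read(buff)) != -1) {
                text.append(buff, 0, len);
            }
            return text.toString();
        } finally {
            in.close(); // also closes the underlying stream
        }
    }

    // Convenience wrapper for in-memory data.
    static String decode(byte[] data, int bufferSize) {
        try {
            return readAll(new ByteArrayInputStream(data), bufferSize);
        } catch (IOException e) {
            throw new IllegalStateException("unexpected I/O error: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // 'c', 'a', 'f', then U+00E9 as the two bytes 0xC3 0xA9, then '!'.
        byte[] data = { 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, '!' };

        // A 2-char buffer forces several read() calls.
        String text = decode(data, 2);
        System.out.println(text.length()); // prints 5: six bytes, five characters
    }
}
```

The reader keeps any incomplete byte sequence internally until the remaining bytes arrive, which is why decoding each raw byte chunk separately with a String constructor is not a safe substitute when reading from a stream.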
