How to Detect Multibyte Characters in Java
When you work with Java Strings that contain multibyte characters, you need methods that handle those characters correctly and an encoding that can represent them. Multibyte encodings such as UTF-8 let you store characters from languages such as Japanese and Chinese, which the single-byte ASCII character set doesn't cover. A Java String can hold any of these characters, but the String and Character classes have no single method that tells you whether a String contains one. Whether a character counts as "multibyte" also depends on the encoding: Java stores Strings internally as UTF-16, so the check is made against a target encoding, which is UTF-8 in this article. To detect multibyte characters, loop through each character in the String and check whether it encodes to more than a single byte.
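To make "more than a single byte" concrete, the short sketch below prints how many bytes a few sample characters occupy in UTF-8. The class name Utf8ByteCountDemo and the method name utf8Length are made up for illustration, and UTF-8 is assumed as the target encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are for illustration only.
class Utf8ByteCountDemo {

    // Returns how many bytes the given text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1 byte  (ASCII letter)
        System.out.println(utf8Length("\u00e9"));       // 2 bytes (e with acute accent)
        System.out.println(utf8Length("\u65e5"));       // 3 bytes (a CJK ideograph)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 bytes (an emoji, stored as two chars)
    }
}
```

The sample characters are written as Unicode escapes so that the result doesn't depend on the encoding of the source file.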
Instructions
-
1
Open the Java file in an editor or IDE such as NetBeans, Eclipse or JBuilder X.
-
2
Declare the variables needed to detect multibyte characters by adding the following code at the top of your method:
char[] c_array;
String c_string;
byte[] c_byte_array;
boolean result;
String str;
-
3
Initialize the "str" variable with the text you want to check by adding the following code in your method:
str = "sample string text";
-
4
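Loop through each character and check whether it's multibyte by adding the following code in your method:
c_array = str.toCharArray();
result = false;
for (char c : c_array)
{
    // Encode this single char as UTF-8 and count the resulting bytes.
    c_string = Character.toString(c);
    c_byte_array = c_string.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    // A surrogate is one half of a character outside the Basic Multilingual Plane.
    // Encoded on its own it becomes a single "?" byte, so test for it explicitly.
    if (c_byte_array.length > 1 || Character.isSurrogate(c))
    {
        System.out.println("Detected a multibyte character.");
        result = true;
        break;
    }
}
if (!result)
    System.out.println("Didn't detect any multibyte characters.");
The loop converts each char into a String, encodes that String as UTF-8 into a byte array, and checks the length of the array. A length greater than one indicates a multibyte character in the String. Passing StandardCharsets.UTF_8 instead of the name "UTF-8" avoids the checked UnsupportedEncodingException that getBytes(String) declares. Characters outside the Basic Multilingual Plane, such as most emoji, are stored as two surrogate chars, and a surrogate encoded on its own becomes a single "?" byte, so the loop also treats any surrogate as a sign of a multibyte character.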
-
5
Save the Java file, compile and run your program to search a String for multibyte characters.
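The steps above can also be collected into one small class. The sketch below uses a different technique from the loop in step 4: instead of building a byte array for each character, it relies on the fact that UTF-8 encodes only code points U+0000 through U+007F as a single byte. The class name MultibyteDetector and the method name containsMultibyte are hypothetical, and the code assumes Java 8 or later for String.codePoints().

```java
// Hypothetical helper class; the names are illustrative, not part of any library.
class MultibyteDetector {

    // Returns true if any character in the text needs more than one byte in UTF-8.
    // In UTF-8, only code points U+0000..U+007F are encoded as a single byte.
    static boolean containsMultibyte(String text) {
        return text.codePoints().anyMatch(cp -> cp > 0x7F);
    }

    public static void main(String[] args) {
        String str = "sample string text";
        if (containsMultibyte(str)) {
            System.out.println("Detected a multibyte character.");
        } else {
            System.out.println("Didn't detect any multibyte characters.");
        }
    }
}
```

Iterating by code point rather than by char keeps surrogate pairs together, so characters such as emoji are counted once and reported as multibyte.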