How to Detect a CJK Character in Java
Detecting Chinese, Japanese and Korean (CJK) characters in Java comes down to examining each character's Unicode value, because a Java String stores its text as Unicode. Classes such as InputStreamReader and OutputStreamWriter do not detect anything themselves; they translate text into and out of Unicode from local encodings, including Big5 and GB2312, so that the text can be examined once it has been read. Chinese, Japanese and Korean are East Asian languages that are traditionally handled with double-byte character sets (DBCS), in which most characters need two bytes rather than one.
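As a rough illustration of that translation step, the sketch below writes a string out as encoded bytes with OutputStreamWriter and reads it back with InputStreamReader. The class name EncodingRoundTrip and its methods are hypothetical, and it assumes the Big5 charset may be missing from a given Java runtime, so it falls back to UTF-8 in that case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class EncodingRoundTrip {

    // Assumption: Big5 is an optional charset, so fall back to UTF-8 if it is absent.
    static Charset big5OrUtf8() {
        return Charset.isSupported("Big5") ? Charset.forName("Big5") : StandardCharsets.UTF_8;
    }

    // Translates Unicode text into bytes of the given local encoding.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Translates bytes of the given local encoding back into Unicode text.
    static String decode(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

Once the text has been decoded into a String this way, its characters can be checked one by one regardless of the encoding the bytes originally used.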
Instructions
-
1
Open the Java source file in which you want to check strings for the presence of CJK characters. The methods in the following steps rely only on String and char, which belong to the java.lang package and are available in every Java application without extra imports or installation.
-
2
Add the following method to your class to return true if the String "s" contains Chinese characters. It relies on the isJapanese() helper defined in step 3:
public boolean containsChinese(String s) {
    for (int i = 0; i < s.length(); i++) {
        // Delegate the per-character test to the helper from step 3.
        if (isJapanese(s.charAt(i))) {
            return true;
        }
    }
    return false;
}
-
3
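Add the following helper method, which returns true if the char "c" is a double-byte character, meaning any character above the single-byte Latin-1 range. Despite its name, the test is not specific to Japanese: it also accepts non-CJK characters such as Greek or Cyrillic letters:
public boolean isJapanese(char c) {
    // Any char above U+00FF does not fit in a single Latin-1 byte.
    return c > '\u00ff';
}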
-
4
Use the method below to return true if the String "s" contains any double-byte characters, CJK characters included. It reuses the isJapanese() helper from step 3:
public boolean containsDoubleByte(String s) {
    for (int i = 0; i < s.length(); i++) {
        // isJapanese() accepts every char above U+00FF.
        if (isJapanese(s.charAt(i))) {
            return true;
        }
    }
    return false;
}
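The range test in the steps above accepts every character above U+00FF, so it cannot tell CJK text apart from other non-Latin scripts. A narrower check is possible with the standard Character.UnicodeScript enum. The sketch below is one way to do it, not the only one: the class name CjkDetector and the choice of scripts counted as CJK (Han, Hiragana, Katakana, Hangul and Bopomofo) are assumptions made for illustration.

```java
// Hypothetical helper class, for illustration only.
final class CjkDetector {

    // True when the code point belongs to a script used to write
    // Chinese, Japanese or Korean (assumed list, adjust as needed).
    static boolean isCjk(int codePoint) {
        switch (Character.UnicodeScript.of(codePoint)) {
            case HAN:
            case HIRAGANA:
            case KATAKANA:
            case HANGUL:
            case BOPOMOFO:
                return true;
            default:
                return false;
        }
    }

    // True when the string contains at least one CJK code point.
    // Iterating over code points instead of chars also covers
    // characters outside the 16-bit range, which use two chars.
    static boolean containsCjk(String s) {
        return s.codePoints().anyMatch(CjkDetector::isCjk);
    }
}
```

With this approach, CjkDetector.containsCjk("\u03b1\u03b2") returns false for the Greek letters alpha and beta, which the simple range test would accept. Punctuation shared between scripts, such as the ideographic full stop, is classified as a common script and is not counted as CJK here.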
Tips & Warnings
There are many tutorials for Java applications that you can try for free. Join various discussion forums to seek advice and guidance from experienced Java users.
To display the characters of your target language properly, make sure that your browser has the required fonts installed. Your browser must be HTML 4.0-compliant and support the Basic Multilingual Plane, the first 65,536 Unicode code points (U+0000 to U+FFFF), which covers most of the languages actively used in the world.
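The 16-bit limit matters in Java code too, because a char holds only 16 bits and a character outside that range occupies two chars in a String. The short sketch below, using a hypothetical BmpDemo class, shows how to count whole characters and how to test whether a string stays within the 16-bit range.

```java
// Hypothetical helper class, for illustration only.
final class BmpDemo {

    // Counts Unicode code points rather than 16-bit char units.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // True when every code point of s fits in a single 16-bit char.
    static boolean isAllBmp(String s) {
        return s.codePoints().allMatch(Character::isBmpCodePoint);
    }
}
```

For example, the rare Han character U+20000 is written in Java source as the pair "\ud840\udc00": its length() is 2, while BmpDemo.codePointLength() reports 1.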