What Is XML Encoding?
Computers can store the characters of a text file in several different ways. A character encoding defines how each character is translated to and from the numbers (ultimately bytes) that represent it inside the computer. XML encoding is the character encoding used to save the data in an XML document. Most of the time you won't need to think about it as a user of XML, but a mismatch between how a file was saved and how it is read can occasionally cause errors.
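To make the idea concrete, here is a small Python sketch (the language is our choice for illustration; the article names none) that turns the same word into bytes under three encodings. The helper name `show_bytes` is hypothetical. The last line shows what happens when bytes are read back with the wrong encoding.

```python
def show_bytes(text: str, encoding: str) -> str:
    """Return the bytes that `text` becomes under `encoding`, as space-separated hex."""
    return text.encode(encoding).hex(" ")  # bytes.hex(sep) needs Python 3.8+

word = "caf\u00e9"  # "café"

for enc in ("utf-8", "utf-16-le", "latin-1"):
    print(f"{enc:>9}: {show_bytes(word, enc)}")
# Expected output:
#     utf-8: 63 61 66 c3 a9
# utf-16-le: 63 00 61 00 66 00 e9 00
#   latin-1: 63 61 66 e9

# Decoding UTF-8 bytes as if they were Latin-1 garbles the accented letter.
print(word.encode("utf-8").decode("latin-1"))  # cafÃ©
```

The plain letters map to the same numbers in UTF-8 and Latin-1; only the accented "é" differs, which is why encoding problems tend to surface only on such characters.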
ASCII
The American Standard Code for Information Interchange (ASCII) is one of the oldest and most widely used character encodings in computing. It defines just 128 characters: the unaccented English letters, the digits, common punctuation and a set of control codes. XML data, however, can contain characters that are not in the ASCII set; XML is designed to be flexible enough for many different contexts and languages, which is partly why it allows such a wide range of characters. Many text editors have historically saved files in ASCII or a similar single-byte encoding by default, so an XML file containing non-ASCII characters that is saved this way may lose those characters or fail to parse.
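A quick way to see the limits of ASCII is to ask Python to encode some XML text as ASCII. The helper `fits_in_ascii` is a hypothetical name used for illustration, and the `errors="replace"` line only approximates what an editor might do when it cannot store a character; real editors vary.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in `text` is one of the 128 ASCII characters."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

print(fits_in_ascii("<name>Cafe</name>"))       # True
print(fits_in_ascii("<name>Caf\u00e9</name>"))  # False: "é" is not in ASCII

# Forcing the text into ASCII anyway loses information: "é" becomes "?".
print("Caf\u00e9".encode("ascii", errors="replace"))  # b'Caf?'

# Pure ASCII bytes are also valid UTF-8, the default encoding for XML.
print("Cafe".encode("ascii") == "Cafe".encode("utf-8"))  # True
```

The last check explains why ASCII-only XML files rarely cause trouble: they are byte-for-byte identical to their UTF-8 form.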
Unicode
Unicode is a character standard that covers far more characters than ASCII, including the scripts of many different languages. While ASCII is based on English, Unicode aims to support the world's languages, alphabets and symbols, which makes it well suited to XML. Unicode text can be stored using several encoding forms, chiefly UTF-8, UTF-16 and UTF-32. These differ in how many bytes they use for each character, not in which characters they can represent: each of them can encode every Unicode character.
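The difference between the Unicode encoding forms can be seen by counting bytes per character. This Python sketch uses the little-endian (`-le`) codec variants so that no byte order mark is included in the count; `encoded_sizes` and the sample characters are illustrative choices, not anything defined by Unicode itself.

```python
from typing import Dict

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(ch: str) -> Dict[str, int]:
    """Number of bytes the character `ch` occupies in each Unicode encoding form."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

samples = {
    "A": "Latin letter",
    "\u00e9": "accented letter (é)",
    "\u0639": "Arabic letter (ع)",
    "\u4e2d": "Chinese character (中)",
    "\U0001F600": "emoji (U+1F600)",
}

for ch, label in samples.items():
    print(f"{label:<24} {encoded_sizes(ch)}")

# UTF-8 uses 1 to 4 bytes, UTF-16 uses 2 or 4, UTF-32 always uses 4,
# yet all three encode every character in the table above.
```

UTF-8 is compact for Latin-script text, while UTF-16 can be smaller for text that is mostly in scripts such as Chinese.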
Errors
The most common errors caused by incorrectly encoded XML involve characters outside basic English, such as the accented letters used in French or letters from non-Latin alphabets such as Arabic. In most cases the solution is to change the encoding: save the XML file again with a suitable encoding selected, such as UTF-8, and optionally add an encoding attribute that names that encoding.
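The following Python sketch reproduces a typical error with the standard library's `xml.etree.ElementTree` parser, assuming a hypothetical file that an editor saved in Latin-1 (ISO-8859-1) without an encoding declaration. It then tries the two fixes described above: declaring the encoding the bytes really use, or re-saving the text as UTF-8. `try_parse` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def try_parse(data: bytes) -> str:
    """Parse XML bytes and return the root element's text, or a description of the error."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"ParseError: {err}"

# Hypothetical file saved as Latin-1: "é" is stored as the single byte 0xE9.
latin1_bytes = "<city>Montr\u00e9al</city>".encode("latin-1")

# No declaration, so the parser assumes UTF-8 and rejects byte 0xE9.
print(try_parse(latin1_bytes))

# Fix 1: declare the encoding the bytes actually use.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes
print(try_parse(declared))  # Montréal

# Fix 2: re-save (transcode) the text as UTF-8, the default, so no declaration is needed.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(try_parse(utf8_bytes))  # Montréal
```

Either fix works; re-saving as UTF-8 is usually the more robust choice because other tools will also assume UTF-8 when no declaration is present.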
Encoding Attribute
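The encoding of XML data is stated with an encoding attribute in the XML declaration, the line that can appear at the very start of an XML document. The following declaration states that the document is encoded in UTF-16, a Unicode encoding that uses two bytes for most characters: <?xml version="1.0" encoding="UTF-16"?>. The encoding attribute can name many different encoding standards, including UTF-8 and ISO standards such as ISO-8859-1. If the attribute is omitted, an XML parser assumes UTF-8 (or UTF-16, if the file begins with a UTF-16 byte order mark). If XML data contains accented letters or characters from alphabets other than the Latin one, a Unicode encoding such as UTF-8 or UTF-16 is advisable; UTF-8 is the XML default and the most widely used.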
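As a minimal sketch, the Python snippet below builds a document carrying the UTF-16 declaration from the example, encodes it as UTF-16 and parses it with `xml.etree.ElementTree`. The Arabic sample word and the variable names are illustrative assumptions. It then parses a UTF-8 version with no declaration at all, relying on the UTF-8 default.

```python
import xml.etree.ElementTree as ET

MARHABA = "\u0645\u0631\u062d\u0628\u0627"  # Arabic "marhaba" (hello), an illustrative sample

# Document text that declares UTF-16, matching the example declaration.
doc = '<?xml version="1.0" encoding="UTF-16"?><greeting>' + MARHABA + "</greeting>"

# Python's "utf-16" codec writes a byte order mark first, which lets the parser
# detect the byte order before it reaches the declaration.
utf16_bytes = doc.encode("utf-16")
print(ET.fromstring(utf16_bytes).text == MARHABA)

# The same element saved as UTF-8 needs no declaration, since UTF-8 is the default.
utf8_bytes = ("<greeting>" + MARHABA + "</greeting>").encode("utf-8")
print(ET.fromstring(utf8_bytes).text == MARHABA)  # True
```

The snippet passes bytes rather than a `str` to the parser, because the declaration describes how the bytes are encoded; once text has been decoded into a Python string, that information no longer applies.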
Text Editors
XML documents can be created, viewed and edited in most standard text editors, and there are also editors designed specifically for XML. Some text editors, including older versions of Notepad for Windows, save files by default in a legacy single-byte encoding (labeled "ANSI" in Notepad) rather than a Unicode encoding, which causes problems for XML documents containing characters outside that encoding. In such cases, developers can choose a Unicode encoding such as UTF-8 from the encoding option in the "Save As" dialog, and include a matching encoding attribute in the XML declaration. The encoding attribute and the encoding actually used to save the file must agree; otherwise the parser will misread the data or reject the document.
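The rule that the declaration and the saved bytes must agree can be sketched in Python with `xml.etree.ElementTree`, whose `write` method emits the declaration and encodes the bytes in one step, so the two cannot drift apart. `save_xml` and the file name `city.xml` are hypothetical. The last part feeds the parser a document whose declaration claims UTF-16 while its bytes are UTF-8, which the parser rejects.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def save_xml(root: ET.Element, path: str, encoding: str = "utf-8") -> None:
    """Write `root` to `path`; the declaration names the same encoding used for the bytes."""
    ET.ElementTree(root).write(path, encoding=encoding, xml_declaration=True)

root = ET.Element("city")
root.text = "Montr\u00e9al"  # "Montréal"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "city.xml")
    save_xml(root, path)
    with open(path, "rb") as f:
        written_bytes = f.read()
    round_trip = ET.parse(path).getroot().text

print(written_bytes.decode("utf-8"))  # declaration line, then the <city> element
print(round_trip)                     # Montréal

# A mismatch: the declaration claims UTF-16, but the bytes are UTF-8.
mismatched = b'<?xml version="1.0" encoding="UTF-16"?><city>Montr\xc3\xa9al</city>'
try:
    ET.fromstring(mismatched)
    mismatch_error = None
except ET.ParseError as err:
    mismatch_error = err
print("mismatch error:", mismatch_error)
```

Letting the library write the declaration is a design choice that avoids the manual step of keeping the editor's "Save As" setting and the typed declaration in sync.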