One of the fundamental challenges to software localization and Web site globalization is the chaotic character of character encoding. A graphic artist once gave me an explanation for the chaos that pre-dates computers. In the beginning, she said, lettering and then typefaces were the province of artists gifted in the art of calligraphy and jealous of the unique character of their characters. Craft, artistry and originality counted, not replication, which requires organization, classification, standardization, and communication and adoption of the standard. Every technological innovation since the printing press has encouraged the latter. Some atavistic chaos endures.
Dr. Peter L. Noerr, Technical Director, Information Management and Engineering Ltd., offers a succinct and understandable overview of character set development. In his history of character encoding, Noerr describes how characters were first encoded in 4, 6, and 7 bits and how this encoding was standardized as ASCII (American Standard Code for Information Interchange), a 128-character set. The advent of 8-bit encoding doubled the number of available characters in the set to 256, now termed extended ASCII or ANSI (American National Standards Institute). It is interesting, and germane, to note that the word "American" figures prominently in the names of these standards. Beyond certain accents and a few non-English characters, these standards could not easily accommodate languages other than English.

Refinements to extended ASCII made it possible to "customize" the set to accommodate other languages. Even with refinements, however, extended ASCII has its limitations when it comes to data transfer. Duplicative encoding, i.e., assigning the same code to two different characters or two codes to the same character, can wreak havoc in a multilingual setting. For example, the encodings of a Roman-alphabet character set and a Cyrillic character set may "overlap," creating conflicts. This drawback limits ANSI to a single-language environment and makes transferring data between such environments problematic. Beyond the transfer issue, documents that contain more than one language can be a veritable nightmare to manage.

ANSI has other limitations. In purely mathematical terms, a 256-character set cannot accommodate ideographic languages like Chinese, which contain thousands of discrete "characters." 16-bit, or double-byte, encoding expanded capacity but offered only a partial answer to the computational challenge of ideographic languages. Again in purely mathematical terms, multilingual computing would still require numerous character sets.
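The "overlap" problem and the capacity arithmetic can be illustrated with a minimal Python sketch. The two code pages chosen here, ISO 8859-1 (`latin-1`) and Windows-1251 (`cp1251`), are assumptions picked for illustration; they are not named in Noerr's history. The sketch shows one byte value being read as two different characters, depending on which 8-bit set the reader assumes.

```python
# One byte value, interpreted under two different 8-bit character sets.
# The choice of code pages is an illustrative assumption.
raw = bytes([0xE9])

as_western = raw.decode("latin-1")   # ISO 8859-1, Western European
as_cyrillic = raw.decode("cp1251")   # Windows-1251, Cyrillic

# Same byte, two different characters: a Latin "e with acute" versus
# the Cyrillic "short i".
print(repr(as_western), repr(as_cyrillic))

# Number of distinct codes available at each encoding width.
capacity = {bits: 2 ** bits for bits in (7, 8, 16)}
print(capacity)
```

A file written under one of these sets and opened under the other displays without any error message, which is why the conflict is so easy to miss in data transfer.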
The next step in the evolution of encoding is the Unicode standard. Unicode offers a solution to the duplicative encoding dilemma, and to the need for multiple character sets, by assigning one unique code number to each discrete character. The beauty of Unicode is that it is not language-dependent.
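As a rough illustration of "one unique code number per character," the sketch below uses Python's standard `unicodedata` module to look up the code number and official name of three characters from different scripts. The helper name `describe` and the sample characters are hypothetical choices made for this example.

```python
import unicodedata

def describe(ch):
    """Return the code number and standard name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

latin = describe("\u00e9")      # Latin e with acute accent
cyrillic = describe("\u0439")   # Cyrillic short i
han = describe("\u4e2d")        # a Chinese ideograph

for line in (latin, cyrillic, han):
    print(line)

# All three scripts can share one document without conflict.
mixed = "\u00e9\u0439\u4e2d"
round_trip = mixed.encode("utf-8").decode("utf-8")
```

Because each character has its own number regardless of language, the mixed string survives encoding and decoding intact, with no need to know which language-specific set produced it.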
The copyright of the article An International Cast of Characters in Export Marketing is owned by . Permission to republish An International Cast of Characters in print or online must be granted by the author in writing.