More on 'repertoire fences'

by Michael S. Kaplan, published on 2005/05/10 17:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/10/416129.aspx


In my earlier post about Raymond's Chinese dictionary, I mentioned how in some cases code pages can be treated less as an encoding technique than as a 'repertoire fence'.

The idea is not all that novel. In this age where most major software products are inherently Unicode, it is quite common to look at latter-day encodings like the Hong Kong Supplementary Character Set (HKSCS) or the latest version of JIS (JIS X 0213) as a list of characters (and sometimes even preferred appearances for those characters) to use within a language or set of languages.

Obviously as a technique this is not ideal, since code pages, especially single-byte ones, often do not contain all of the characters that are needed. But it is hard to have a model whereby a language publishes a standard set of characters in any form other than a code page, so sometimes it is the only way to develop a starting point. Perhaps you could use the errata of the code page, where they list missing items, as a way to extend the set further? :-)

It is an interesting model, though it is only useful when such a code page is available.
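To make the idea concrete, here is a minimal sketch in Python of using a code page purely as a character list rather than as an encoding. The function name `outside_fence` and the `extra` parameter are hypothetical, made up for this illustration; the only real machinery is Python's built-in codec support. The text stays in Unicode the whole time, and the code page is consulted only to ask whether a character is "inside the fence".

```python
def outside_fence(text, codec, extra=frozenset()):
    """Return the characters of `text` that are not in the repertoire
    defined by `codec`, after allowing the optional `extra` characters.

    The codec is used only as a membership test; the encoded bytes
    are thrown away.
    """
    outside = []
    for ch in text:
        if ch in extra:
            # Characters added to the fence by hand (for example, from
            # a list of items known to be missing from the code page).
            continue
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            outside.append(ch)
    return outside


# Windows-1252 covers Western European text but not Polish.
print(outside_fence("naïve café", "cp1252"))   # []
print(outside_fence("Łódź", "cp1252"))         # ['Ł', 'ź']
# Windows-1250 (Central European) does cover it.
print(outside_fence("Łódź", "cp1250"))         # []
# Extending the fence with the missing characters.
print(outside_fence("Łódź", "cp1252", extra={"Ł", "ź"}))  # []
```

Python also ships codecs named `big5hkscs` and `shift_jis_2004`, so the same check can be pointed at those repertoires. Testing one code point at a time is a simplification: a code page entry that maps to a sequence of Unicode code points would not be recognized by this sketch.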

Another situation in which code pages have been a blessing (or curse, depending on how you look at it) is when a script is first encoded into Unicode. Certainly the Thai (TIS-620) and Indic (ISCII) code pages had a lot to do with how Unicode moved to encode these scripts. But some people would claim that this convenient shortcut can often work against the interests of a language. This has to do with at least the following three issues:

  1. Sometimes it is not the language owners who are the creators of the code page
  2. The encoding model of the code page may not be consistent with Unicode's model
  3. Occasionally the code page comes with political issues that negatively impact perception of Unicode for using it

In these cases, people wonder in retrospect, after everything is done, whether they should have just worked without the code page. At that point the best thing to do is to de-emphasize the code page and the fact that it was used, and to focus on the positive aspects of the encoding....



