by Michael S. Kaplan, published on 2008/03/23 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/03/23/8331877.aspx

A few years back, I was chatting with Thomas Milo of DecoType, I believe at an Internationalization and Unicode Conference.

Now Tom has a rather unique background in language, and our conversation was significantly enhanced by that fact.

We were talking in particular about Uyghur (of Uighur or Uyghur? fame). I was in fact talking about the whole Uighur vs. Uyghur thing, and he pointed out a more interesting issue that was kind of buried and not discussed.

The text about the language and its script history has been quoted countless times (including in my Uighur or Uyghur? blog above):

Uyghur was originally written with the Orkhon alphabet, a runiform script derived from or inspired by the Sogdian script, which was ultimately derived from the Aramaic script.

From the 16th century until the early 20th century, Uyghur was written with a version of the Arabic alphabet known as 'Chagatai'. During the 20th century a number of versions of the Latin and Cyrillic alphabets were adopted to write Uyghur in different Uyghur-speaking regions. However the Latin alphabet was unpopular and in 1987 the Arabic script was reinstated as the official script for Uyghur in China.

The text kind of implies that from the 16th century to today it was a simple Arabic-Latin-Arabic script move, and the word reinstated clearly implies the software development equivalent of treating the Latin change as a bad checkin that was reverted. But Tom was pretty insistent that the move was not a reversion -- that the Arabic script Uyghur from prior centuries was not the same as the one used today, the one that was "reinstated". His essential claim in the conversation was that the new Arabic script Uyghur had something of a Latin script Uighur orthographic overlay/influence.

This fact is perhaps implied in the way that this text talks about the 16th-early 20th century as using Chagatai while the later one does not. So the differences are kind of hidden in plain sight -- a Poe-ian Purloined Alphabet kind of move.

The difference? Well, they added additional diacritics to better differentiate Uyghur vowels, something that the Latin script Uighur was already kind of doing. The argument that this changes orthography is an interesting one -- clearly it does, though without actual side-by-side contrasts it is quite difficult to compare. The contrasting of some (though not all) of these different forms is something I mentioned, with a table, in Will the real orthography please rise? (HINT: they should all stand up), but notice that you do not see two different Arabic script columns there or in articles like the Uyghur Ereb Yéziqi one in Wikipedia.

In the end, the new diacritics reportedly add some of the notions of vowel harmony that exist in Turkic languages as expressed in the Latin script to the written Arabic script for Uyghur (by contrast, the Turkish language has vowel harmony even though not all of the vowel differentiation is seen in the Arabic script).

At least for historical reasons one has to understand the differences to read older texts properly. And I have no problem finding Old English and Middle English differences, for example, if I am trying to understand those texts. This lack of side-by-side comparison information between the old and new Arabic script orthography for Uyghur seems to be mostly a political issue -- a way for this to not look like script reform or language reform. But I wonder whether the lack of real information and examples is really such a good thing here.

After all, without the information, how can the differences that need to be captured in fonts or keyboards or collation be fully understood? Perhaps the older orthography is a 100% subset and supporting the new orthography means supporting the old ones in terms of coverage of the alphabet for those purposes. But that would be much easier to evaluate if the information was made more easily available....
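If both letter inventories were actually available, the "is the old one a 100% subset?" question would be a simple set comparison. Here is a minimal sketch of that check in Python. The function names are my own, and the two sample inventories are made-up placeholders for illustration only, not the real old or new Uyghur alphabets:

```python
import unicodedata


def coverage_gap(old_letters, new_letters):
    """Return the letters the old orthography uses that the new one lacks."""
    return set(old_letters) - set(new_letters)


def describe(chars):
    """Format characters as 'U+XXXX NAME' strings, sorted by code point."""
    return sorted(f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in chars)


# Placeholder inventories (assumptions for illustration, NOT real alphabets).
old_sample = {"\u0645", "\u062D", "\u062F"}            # meem, hah, dal
new_sample = {"\u0645", "\u06BE", "\u062F", "\u06D5"}  # meem, heh doachashmee, dal, ae

gap = coverage_gap(old_sample, new_sample)
is_subset = not gap  # True only if every old letter is also a new letter

print("old is a subset of new:", is_subset)
for line in describe(gap):
    print("missing from new:", line)
```

An empty gap would mean that a font, keyboard or collation table covering the new orthography also covers the old one, at least at the level of individual letters. It says nothing about spelling rules, which is a separate question.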


Thomas Milo on 14 Jun 2008 3:18 AM:

Modern Uygur (transliterating from modern pseudo-Arabic Uighur) is a one-to-one transliteration of the Latin orthography, including the hyphenation. For instance, the christian name Mohammed is written "mohammed" with 8 letters using a laryngeal /h/, whereas the traditional spelling is mHmd with four letters using the letter for pharyngeal /H/.


Michael S. Kaplan on 14 Jun 2008 3:29 AM:

So why is it that the older Uyghur orthography is so hard to get information about?

