by Michael S. Kaplan, published on 2012/06/11 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/06/11/10314191.aspx
GB 18030 is an important standard in the People's Republic of China.
Important because not complying with the provisions in it when you ship a version of your software means you can't ship the software in China....
Now the latest provisions usually refer to minority language support, like support of Tai Le.
Or New Tai Leu.
Or Mongolian.
Or Tibetan.
Or Yi.
Or Uighur.
And when they come up with requirements for those languages, they don't do it in a vacuum -- they talk to experts in those languages so that when they say what it means to support a language they are able to do so with specific details.
In the case of Uighur, one of the issues communicated was that some of the characters in the Arabic Presentation Forms A block in Unicode that were not in the font had to be supported:
Code point | Character | Decomposition | Name
U+FBEA | ﯪ | <isolated> 0626 0627 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM
U+FBEB | ﯫ | <final> 0626 0627 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF FINAL FORM
U+FBEC | ﯬ | <isolated> 0626 06D5 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE ISOLATED FORM
U+FBED | ﯭ | <final> 0626 06D5 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE FINAL FORM
U+FBEE | ﯮ | <isolated> 0626 0648 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW ISOLATED FORM
U+FBEF | ﯯ | <final> 0626 0648 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW FINAL FORM
U+FBF0 | ﯰ | <isolated> 0626 06C7 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U ISOLATED FORM
U+FBF1 | ﯱ | <final> 0626 06C7 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U FINAL FORM
U+FBF2 | ﯲ | <isolated> 0626 06C6 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE ISOLATED FORM
U+FBF3 | ﯳ | <final> 0626 06C6 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE FINAL FORM
U+FBF4 | ﯴ | <isolated> 0626 06C8 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU ISOLATED FORM
U+FBF5 | ﯵ | <final> 0626 06C8 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU FINAL FORM
U+FBF6 | ﯶ | <isolated> 0626 06D0 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E ISOLATED FORM
U+FBF7 | ﯷ | <final> 0626 06D0 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E FINAL FORM
U+FBF8 | ﯸ | <initial> 0626 06D0 | ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E INITIAL FORM
U+FBF9 | ﯹ | <isolated> 0626 0649 | ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM
U+FBFA | ﯺ | <final> 0626 0649 | ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA FINAL FORM
U+FBFB | ﯻ | <initial> 0626 0649 | ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM
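If you want to check the table against the Unicode Character Database yourself, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `describe` and `fold` are just made up for illustration; nothing here reflects how the font or the shaping engine handles these characters, it only reads back the character properties and shows what compatibility normalization does to them.

```python
import unicodedata

# Code points listed in the table: U+FBEA through U+FBFB.
FIRST, LAST = 0xFBEA, 0xFBFB

def describe(cp):
    """Return (label, compatibility tag, decomposition code points, name)."""
    ch = chr(cp)
    # decomposition() returns a string such as "<isolated> 0626 0627".
    tag, *parts = unicodedata.decomposition(ch).split()
    return f"U+{cp:04X}", tag, parts, unicodedata.name(ch)

def fold(text):
    """Replace compatibility characters with their decomposed sequences (NFKC)."""
    return unicodedata.normalize("NFKC", text)

rows = [describe(cp) for cp in range(FIRST, LAST + 1)]

for label, tag, parts, name in rows:
    print(label, tag, " ".join(parts), name)
```

Every row decomposes to U+0626 followed by one more letter, which is the sequence a shaping engine would normally start from; `fold("\ufbea")` gives back exactly that two-character sequence.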
Now maybe you wonder why this is important.
Perhaps some implementations don't do the shaping correctly.
Perhaps some of the tests they do have to succeed even if no shaping is done at all.
I'm not sure, exactly.
But we can certainly do this -- the harder part with the shaping and the Unicode support is there, so a few compatibility characters? That's easy.
In fact, it's done, even though you can't see it just yet for all the characters. It will be there soon enough.
We want to do right by the language.
And the script.
And China....