Uighur, GB18030, and Arabic compatibility characters

by Michael S. Kaplan, published on 2012/06/11 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/06/11/10314191.aspx


 GB 18030 is an important standard in the People's Republic of China.

Important because not complying with the provisions in it when you ship a version of your software means you can't ship the software in China....

Now the latest provisions usually refer to minority language support, like support of Tai Le.

Or New Tai Lue.

Or Mongolian.

Or Tibetan.

Or Yi.

Or Uighur.

And when they come up with requirements for those languages, they don't do it in a vacuum -- they talk to experts in those languages, so that when they say what it means to support a language they are able to do so with specific details.

In the case of Uighur, one of the issues communicated was that some characters in the Arabic Presentation Forms-A block of Unicode that were not in the font had to be supported:

 

Code point  Decomposition         Name
----------  --------------------  ----
U+FBEA      <isolated> 0626 0627  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF ISOLATED FORM
U+FBEB      <final> 0626 0627     ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH ALEF FINAL FORM
U+FBEC      <isolated> 0626 06D5  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE ISOLATED FORM
U+FBED      <final> 0626 06D5     ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH AE FINAL FORM
U+FBEE      <isolated> 0626 0648  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW ISOLATED FORM
U+FBEF      <final> 0626 0648     ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH WAW FINAL FORM
U+FBF0      <isolated> 0626 06C7  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U ISOLATED FORM
U+FBF1      <final> 0626 06C7     ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH U FINAL FORM
U+FBF2      <isolated> 0626 06C6  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE ISOLATED FORM
U+FBF3      <final> 0626 06C6     ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH OE FINAL FORM
U+FBF4      <isolated> 0626 06C8  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU ISOLATED FORM
U+FBF5      <final> 0626 06C8     ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH YU FINAL FORM
U+FBF6      <isolated> 0626 06D0  ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E ISOLATED FORM
U+FBF7      <final> 0626 06D0     ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E FINAL FORM
U+FBF8      <initial> 0626 06D0   ARABIC LIGATURE YEH WITH HAMZA ABOVE WITH E INITIAL FORM
U+FBF9      <isolated> 0626 0649  ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM
U+FBFA      <final> 0626 0649     ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA FINAL FORM
U+FBFB      <initial> 0626 0649   ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA INITIAL FORM
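
The decomposition and name data for these characters come straight from the Unicode Character Database, so they can be looked up rather than typed in by hand. Here is a minimal sketch using Python's standard unicodedata module. The function name presentation_form_rows is made up for illustration, and the default range simply covers the eighteen code points discussed here.

```python
import unicodedata


def presentation_form_rows(first=0xFBEA, last=0xFBFB):
    """Return (code point, decomposition, name) tuples for a code point range.

    The default range is an assumption: it covers the eighteen
    Arabic Presentation Forms-A characters discussed in this post.
    """
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        rows.append((
            f"U+{cp:04X}",
            # One-level compatibility mapping, e.g. "<isolated> 0626 0627"
            unicodedata.decomposition(ch),
            unicodedata.name(ch),
        ))
    return rows


if __name__ == "__main__":
    for code_point, decomposition, name in presentation_form_rows():
        print(f"{code_point}  {decomposition:<22}{name}")
```

Note that unicodedata.decomposition returns the single-level mapping stored for the character, not a fully normalized string.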

 

Now maybe you wonder why this is important.

Perhaps some implementations don't do the shaping correctly.

Perhaps some of the tests they do have to succeed even if no shaping is done at all.

I'm not sure, exactly.
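
Whatever the reason, each compatibility character is supposed to stand for the same text as its ordinary letter sequence. A small Python sketch of that equivalence follows. It assumes NFKC normalization is an acceptable way to compare the two, which is an assumption made here and not something the GB 18030 requirements are known to specify. The helper name fold_compatibility_forms is hypothetical.

```python
import unicodedata


def fold_compatibility_forms(text):
    """Fold compatibility characters to their ordinary equivalents via NFKC.

    Assumption: NFKC is good enough for comparison purposes. It also folds
    many other compatibility characters, not only Arabic presentation forms.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    ligature = "\uFBEA"            # YEH WITH HAMZA ABOVE WITH ALEF, isolated form
    base_sequence = "\u0626\u0627"  # the same text as two ordinary letters
    print(fold_compatibility_forms(ligature) == base_sequence)
```

Text that is already made of ordinary letters passes through unchanged, so the same comparison works whether or not the input used the presentation forms.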

But we can certainly do this -- the harder part with the shaping and the Unicode support is there, so a few compatibility characters? That's easy.

In fact, it's done, even though you can't see it just yet for all the characters. It will be there soon enough.

We want to do right by the language.

And the script.

And China....

