You call it GB18030, I call it UTF-GBK...

by Michael S. Kaplan, published on 2013/03/28 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2013/03/28/10405914.aspx


'Michael, how do I support GB 18030 ?!?'

This is a question that has played out over email to me many times since 2005 (when the current edition, GB 18030-2005, was released; the standard itself first appeared in 2000).

Someone (in this case a developer named Carla) has been given the assignment to support the standard, due to plans to start shipping to China.

The name is easy. It starts with GB. GB stands for Guobiao (simplified Chinese: 国标; traditional Chinese: 國標; pinyin: Guóbiāo), Chinese for national standard.

You can see a bunch of them in the GB standards Wikipedia page.

There are many more than this, believe me.

Some of them are simply Chinese translations, with no substantive changes, of standards created elsewhere that China wants people to use.

GB18030 is not one of those standards.

You can see a pretty good description of it in the Wikipedia article here. They cover many of the problems people found trying to work with it since 2005.

And there is now a formal process that allows China to review "GB18030 compliance".

This is important because officially, GB18030 does not have to be followed in its entirety -- only a subset is mandatory.

One does not have to assume that one must support every language Unicode does just because GB18030 maps to all of Unicode.

That is because the testing that is done centers on support for languages and scripts that are relevant, interesting, and/or important to China.

Currently, the tests principally focus on the way text is input, displayed, stored, sent, and retrieved for:

and so on. Microsoft has font support for most of them (and for all of the required ones), and keyboard support for most of them as well.

Now one really important aspect of how standards work in China is how they refer to each other.

So for example Uyghur support and its bidirectional requirements don't have to be spelled out in detail in GB18030; it can point to another standard that includes the information.

There isn't always a uniform effort to make sure that standards written before GB18030 was born explicitly spell out what GB18030 support means for them.

But they are not shy about explaining how it all has to work.

Ken Whistler once famously explained how things would be okay after China locked in 1-to-1 mappings between GB18030 and Unicode:

Think of it as UTF-GBK.

Which kind of sums it up. :-)
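To make the "UTF-GBK" idea concrete, here is a minimal sketch in Python (my choice for illustration, not something the post itself uses), assuming the standard library's built-in "gb18030" codec is a fair stand-in for whatever conversion your platform provides. The helper name gb18030_round_trip is hypothetical. It encodes a string, decodes it again, and reports whether the text survived, which is what a 1-to-1 mapping with Unicode should give you.

```python
def gb18030_round_trip(text):
    """Encode text as GB18030 and decode it back.

    Returns the encoded bytes and whether the round trip preserved the text.
    Assumes Python's built-in "gb18030" codec (an illustration, not the
    only way to do the conversion).
    """
    encoded = text.encode("gb18030")
    return encoded, encoded.decode("gb18030") == text


# Like a UTF, the encoding is variable width: 1, 2, or 4 bytes per character.
samples = [
    "A",            # ASCII
    "\u4e2d",       # a common Han character, inherited from GBK
    "\U0001F600",   # a supplementary-plane character
]

for sample in samples:
    encoded, ok = gb18030_round_trip(sample)
    print(f"U+{ord(sample):04X}: {len(encoded)} byte(s), round trip ok: {ok}")
```

The point of the sketch is only that the same Unicode text comes back out; it says nothing about fonts, input, or rendering, which are separate parts of the compliance story.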

Mostly, you just need two things to support it:

And that's really it!

Now note that platform pieces do most of the work here; if you run on Windows (for example), you don't have to make every font and keyboard and rendering engine.

A lot of the time, we (and others) have you covered when you use our (or their) stuff.

You can just focus on the hardest part: making sure you are using it correctly!
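As a rough sketch of what "using it correctly" can look like in code, the Python example below leans on the runtime's own codec at the I/O boundary instead of hand-rolling any conversion. It is an assumption-laden illustration: the function name write_then_read is hypothetical, an in-memory buffer stands in for a real file or socket, and strict error handling is my choice so that bad bytes fail loudly rather than being silently replaced.

```python
import io


def write_then_read(text):
    """Write text as GB18030 bytes, then read it back as text.

    An in-memory buffer stands in for a file (an assumption for this sketch).
    errors="strict" makes malformed data raise instead of being replaced.
    """
    buffer = io.BytesIO()
    writer = io.TextIOWrapper(buffer, encoding="gb18030",
                              errors="strict", newline="")
    writer.write(text)
    writer.flush()
    raw = buffer.getvalue()
    writer.detach()  # release the buffer without closing it

    reader = io.TextIOWrapper(io.BytesIO(raw), encoding="gb18030",
                              errors="strict", newline="")
    return raw, reader.read()


raw, text_back = write_then_read("\u56fd\u6807 GB18030\n")
print(len(raw), "bytes on the wire; text preserved:",
      text_back == "\u56fd\u6807 GB18030\n")
```

Internally the program only ever sees Unicode text; GB18030 appears only where bytes enter or leave, which is the part the platform already handles for you.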

Now for the rest -- the registration and such -- that's for the lawyers to jump on, not you....

