by Michael S. Kaplan, published on 2011/02/16 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/02/16/10130119.aspx
I had someone ask me:
I don’t have a big picture on the whole process of enabling a new language, even though I know a little here or there. For my learning purpose, do you have something that I can read to teach myself?
Now there really isn't an explicit single place that I know of where such info is kept, so I took a similar item on my "blog request list" and promoted it to right now so you can read the response right here. :-)
Now these steps are going to be described in a narrative since that is how my blogs often work, but the actual process is done by different people and the order often reflects the built-in multitasking that any multi-person project can bring to the mix. So don't think of this blog as providing an ordered recipe or directions.
Here we go!
STEP ONE: The Reading
The most basic level of enablement is the display of text, which means a font that has the glyphs for the language's letters in it.
I wouldn't really claim that the language was fully enabled or anything, but if I can read documents in it when I explicitly choose that font then it is a good first step.
And this step enables the reading, which is great. But the next step is another crucial one on the way to full enablement:
STEP TWO: The Writing
Put simply, there needs to be a keyboard or an IME.
Other methods exist like handwriting recognition and speech recognition, but those tend to show up much later in the lifetime of a language's support in computers. So for present purposes we can assume a keyboard or an IME.
The quality of an input method is something I would usually judge on a more global basis (since if it is a part of Windows it is available for use to a frightening number of people), but for the purposes of this blog on language enablement, I'll say that the perceived quality of an input method to an individual customer is directly proportional to how easy they find it to use.
I'll get into that issue further another day; I just mention it here so people can keep in mind how much the fundamental process of language enablement is sabotaged at its root if any of these first three steps is messed up.
Of course it is also worth noting that these first two steps can be done by anyone, without even getting real help from Windows. Microsoft and many third parties have provided tools to help with both fonts and keyboards.
But in the context of Microsoft being the one doing the enabling, we should start talking about the things Microsoft can do for enabling a language that go beyond these things.
STEP THREE: Underlying Rendering Support
Even that basic display needs a lot more behind it to do seemingly simple scenarios automatically like
So proper rendering support via Uniscribe and DWrite, and in some cases GDI font linking, is important. The only thing cooler for language display than having a good font is not having to choose that font explicitly, so skipping this third step is ill-advised.
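To make the "not having to choose that font explicitly" idea a little more concrete, here is a toy sketch in Python of a fallback chain: for each character, walk a list of fonts and take the first one that has a glyph for it. The font names, the coverage tables, and the chain itself are all made up for illustration; this is not how Uniscribe, DWrite, or GDI font linking are actually implemented or configured.

```python
# Hypothetical coverage tables: font name -> set of code points it has glyphs for.
FONT_COVERAGE = {
    "BaseFont": set(range(0x0020, 0x007F)),  # printable Basic Latin
    "ThaiFont": set(range(0x0E01, 0x0E5C)),  # Thai block (rough range)
}

# Hypothetical link chain: fonts to try, in order, for each character.
LINK_CHAIN = ["BaseFont", "ThaiFont"]


def pick_font(ch, chain=LINK_CHAIN, coverage=FONT_COVERAGE):
    """Return the first font in the chain that covers ch, or None."""
    for name in chain:
        if ord(ch) in coverage[name]:
            return name
    return None


def split_into_runs(text):
    """Group consecutive characters that resolve to the same font."""
    runs = []
    for ch in text:
        font = pick_font(ch)
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs
```

Calling `split_into_runs` on a string that mixes Latin and Thai letters would give one run per font, and a character that no font in the chain covers comes back with `None`, which is roughly the point where a real system would show a missing-glyph box.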
Obviously there are other little items in this step like adding the Unicode character names to Character Map that don't affect rendering per se but definitely make working with fonts and characters easier.
Additionally, once the next two steps are done, additional rendering support can be expanded to handle features like digit substitution or font linking based on system locale or writing system differences implicit in different locales, and so on. All of the various pieces have to be in place for these last few fancy items to work properly, no matter how much of the work is done earlier in preparation for when the step appears to people using the system.
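As a rough picture of what digit substitution means, here is a small Python sketch that swaps ASCII digits for a locale's native digits. The locale names and the little table are assumptions chosen for illustration; they are not the data Windows stores or the API it exposes, and whether a given locale substitutes digits by default is a separate question this sketch does not answer.

```python
# Toy table (an assumption for illustration):
# locale name -> code point of that locale's native digit zero.
NATIVE_ZERO = {
    "ar-SA": 0x0660,  # ARABIC-INDIC DIGIT ZERO
    "hi-IN": 0x0966,  # DEVANAGARI DIGIT ZERO
}


def substitute_digits(text, locale_name):
    """Replace ASCII digits with the native digits listed for locale_name.

    Locales missing from the toy table leave the text unchanged.
    """
    zero = NATIVE_ZERO.get(locale_name)
    if zero is None:
        return text
    # Map each ASCII digit to the native digit with the same value.
    table = {ord("0") + i: chr(zero + i) for i in range(10)}
    return text.translate(table)
```

Note that the lookup is keyed on the locale, which is why this kind of feature has to wait until the locale data is in place.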
STEP FOUR: Underlying Script Support in NLS
This step may already have been done ages ago if the script was already supported and all the requisite characters have their properties in the OS tables, but often new languages that Windows has never supported require whole scripts, or specific individual characters that have just been added to Unicode, to be added to the system as well.
It is easy, but important, to have this support.
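To show what "characters having their properties in the tables" looks like from the outside, here is a short Python sketch using the standard `unicodedata` module. Python ships its own copy of the Unicode character database, so this is only an analogy for the OS tables and not a view into NLS itself. A code point the table does not know about comes back as category `Cn` (unassigned) with no name.

```python
import unicodedata


def describe(ch):
    """Return (general category, character name) from Python's Unicode table."""
    category = unicodedata.category(ch)
    name = unicodedata.name(ch, "<no name in this table>")
    return category, name


def is_known(ch):
    """True if the table assigns ch a category other than Cn (unassigned)."""
    return unicodedata.category(ch) != "Cn"
```

A character that was added to Unicode after the table was built would look just like an unassigned one here, which is the situation this step exists to fix.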
STEP FIVE: Underlying locale support in NLS
I have a colleague who is grimacing at the way I have added "locales" to a discussion of "language enablement", but the next part of the enablement process involves sorting and date formats and calendars and language names, and so on. And all of these items are stored on a per-locale basis. So that person is likely just going to have to suck it up and get over it.
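For a sense of why sorting has to live with the locale rather than the script, here is a toy Python sketch in which the same three words sort differently under two locales that share the Latin script. The locale names, the tiny tables, and the sample words are assumptions for illustration only; real collation data is far richer than this.

```python
# Toy per-locale collation data (illustrative only).
LOCALE_DATA = {
    # Swedish: the letters å, ä, ö come after z.
    "sv-SE": {"alphabet": "abcdefghijklmnopqrstuvwxyzåäö", "fold": {}},
    # German (dictionary style): ä, ö, ü sort with a, o, u.
    "de-DE": {
        "alphabet": "abcdefghijklmnopqrstuvwxyz",
        "fold": {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"},
    },
}


def sort_key(word, locale_name):
    """Build a list of letter positions according to the locale's toy table."""
    data = LOCALE_DATA[locale_name]
    folded = "".join(data["fold"].get(ch, ch) for ch in word.lower())
    alphabet = data["alphabet"]
    # Characters outside the toy alphabet sort after it, by code point.
    return [
        alphabet.index(ch) if ch in alphabet else len(alphabet) + ord(ch)
        for ch in folded
    ]


def sort_words(words, locale_name):
    """Sort words using the toy collation data for locale_name."""
    return sorted(words, key=lambda w: sort_key(w, locale_name))
```

With the words "zon", "ärlig", and "arm", the Swedish table puts "ärlig" last while the German one puts it first, even though nothing about the characters themselves has changed.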
STEP SIX: The Localization
Now this step has many substeps within it, but for the moment we'll treat it as one big chunk.
Once all of this support is there, people can start seeing the user interface itself making use of the enabled language!
Ok, so there we go.
Now looking at the steps I gave:
At which point would you declare a language to be enabled?
There are several teams that consider enablement to happen once their work is done, especially when other steps aren't planned.
In most cases Microsoft in general won't claim to have enabled a language unless a supportable chunk of steps 1-5 is present.
But how much support is relative: not every language is intended to go through every step.
There are even languages that were originally intended to go through the whole series of steps that ran into problems along the way. At that point all of the support can be yanked out, but in many cases the partial support will be left in and shored up so that proper support is what will be seen.
If you look at Windows you can probably find languages essentially stopped at each of these steps.
In fact, anyone who can name one language that only goes so far as each step will win the prize today!
Andrew C on 16 Feb 2011 3:37 PM:
I'd place part of Step 3 into a Step 0 ... if there isn't any "proper rendering support" for a complex script, then even step 1 doesn't get you anywhere in particular.
Michael S. Kaplan on 16 Feb 2011 4:06 PM:
Remember, they are assessed in tandem, and the majority of scripts that are supported are *not* complex in the shaping sense. :-)
Andrew C. on 16 Feb 2011 4:13 PM:
Very true, making them somewhat more straightforward.
But the real fun is supporting the languages using the missing complex scripts ;)
Yuhong Bao on 16 Feb 2011 11:58 PM:
BTW, classic Mac OS had support for WorldScript and made it easy for third-parties to plug in new script systems, which Evertype used to add support for languages such as Inuktitut long before Windows had support for the language. If you take a look at www.unicode.org/.../APPLE you will see the credits for Michael Everson of Evertype in some of them.
Dong Wang on 24 Mar 2011 11:38 AM:
Thank you very much, Michael.
I really appreciated you outlining the picture for me. :)
Dong