More on our non-Unicode heritage

by Michael S. Kaplan, published on 2006/12/31 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/12/31/1383831.aspx

As the year 2006 draws to a close and as I set my sights on what applications I need to try to get on the ball about Unicode support over the next two to five years, I realized a major impediment to this goal.

I'll blame a fellow Technical Lead who shall remain nameless, for pointing out (in response to an unrelated question) the logo requirements documentation for XP and for Vista.

With a sense of dread I read through both of them, and neither one contains references to either requirements or recommendations that have anything to do with international support of software on Windows.

(ref: Our non-Unicode heritage, with the George Carlin riff that could act as a call to arms if we could get the full bit written and recorded!).

I guess getting international features to start getting at least optional bullet item status might be the first step to being taken seriously in this space....

This post brought to you by ඤ (U+0da4, a.k.a. SINHALA LETTER TAALUJA NAASIKYAYA)

<<set my sights on what applications I need to try to get on the ball about Unicode support over the next two to five years>>

1. The CAB and MSI file formats

Right now it is a "bunch of bytes + code page tag". Even if they only start accepting UTF-8, it is already a step forward.

2. The VS IDE & MFC

Now that the MUI API is public, it would be nice to have MFC using it, and some support in the VS IDE itself, similar to what we have in .NET projects (add languages to the project and have it build the resource-only DLLs in the proper places).

But not quite identical to the .NET way, because tagging each dialog/form as localizable and adding languages for each one is a pain (just imagine adding 20 languages for an application with 300 dialogs).

Down here in the trenches, I keep throwing my hands up in the air.

We need more like your Converting to Unicode series. We also need way more on living with code pages and moving between that world and Unicode (e.g., between native code-paged Windows and .NET, between native code-paged Windows and Java, between native code-paged applications/resources and any Unicode representations.)

I watched a high-schooler in Japan struggle with getting Shift-JIS through his keyboard into an application developed in pure C++ using VC++ 2005 Express Edition, and the whole thing was not pretty.
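As a small illustration of that kind of fragility, here is a minimal sketch in Python (not the C++ from the anecdote). The sample text and the choice of cp1252 as the "wrong" code page are assumptions made for the example. It shows the same Shift-JIS bytes turning into different text depending on which code page the receiving side assumes:

```python
# The same bytes mean different text under different code pages.
# Sample text and the "wrong" code page are assumptions for illustration.
text = "日本語"
raw = text.encode("shift_jis")  # what a Shift-JIS application hands over

# Receiver assumes the correct code page: the text survives.
right = raw.decode("shift_jis")

# Receiver assumes a Western European code page: no error, just garbage.
wrong = raw.decode("cp1252", errors="replace")

print(raw.hex(" "))
print(right)
print(wrong)
```

Nothing in the bytes themselves says which reading is intended, which is why the code page has to travel with the data or be agreed on out of band.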

This is way short of localization and internationalization of an application -- it is about interchange and communication within the desktop environment.

At the moment, it all appears to be magic, chewing gum, and lots of duct tape, with plenty of fragility and failure cases.

Let's make it a New Year. Happy, Happy.

Hi Dennis,

Well, I do plan to keep blogging, so I guess that is a start. :-)

But in this post I was actually thinking more about the products themselves, and how to make sure they are doing the right thing to support the languages and scripts that are already in the "supported" column from our point of view. Because non-Unicode apps are still so widespread, those languages/scripts aren't actually supported in those products.

In other words, the guidelines (and the fact that they mention none of this) are a symptom of the problem I was thinking about here.

Making internationalization easier is an important goal that I think we have to live up to in the long run (hell, it is why I blog and why I agree that features have to be easier!). But telling people it is important to do is the first step; if we are not telling them that, then it does not matter how easy we make it....

Happy New Year, Michael!

I agree about emphasis on the products and especially the guidelines and qualification requirements (e.g., Works on Vista, designed for Vista, etc.)

I certainly want to see more about all of this from you (and maybe a guide to older posts too, because it can be difficult to find older posts by search).

Thinking about the general problem of codes, characters, languages, etc., -- not nearly so deeply as you do -- it strikes me that one of the most difficult challenges is making what is done in this area explainable. It also seems that product teams forget that sometimes too (or have an implementation explanation that has a really contorted use case at a practical level). Some of the disconnects you report strike me like that.

I guess I wouldn't be so grouchy about this except I'm working on some bridge code that needs to produce Unicode from interfaces that deliver non-UTF8 char[] data. I am anxious about being able to mind-read the implementations of those interfaces with regard to the code page I should assume. I don't know if the obvious choice will work because I have my doubts about the variety of implementations that deliver the char[]. I get to try it and pray that it works most of the time. When it doesn't, users will see it and have no way to remedy it. Arghh.
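One way to picture that bridge problem is the sketch below, written in Python for brevity. The helper name `decode_assumed` and the sample inputs are hypothetical and are not taken from the actual bridge code. It decodes under the assumed code page and reports whether any byte failed to map, so the caller can at least detect the obvious failures instead of passing them on silently:

```python
from typing import Tuple


def decode_assumed(data: bytes, code_page: str) -> Tuple[str, bool]:
    """Decode bytes under an assumed code page.

    Returns (text, clean). clean is False when some byte had no mapping
    in the assumed code page and was replaced with U+FFFD.
    """
    try:
        # Strict decode: raises if any byte is invalid in this code page.
        return data.decode(code_page), True
    except UnicodeDecodeError:
        # Fall back to a lossy decode, but tell the caller about it.
        return data.decode(code_page, errors="replace"), False


# Hypothetical inputs for illustration.
print(decode_assumed(b"caf\xe9", "cp1252"))
print(decode_assumed(b"\x81", "cp1252"))
```

A clean result only means the bytes were valid in the assumed code page, not that the assumption was right: many byte sequences decode without error under several code pages. So the flag catches some wrong guesses but cannot confirm a correct one.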