Converting a project to Unicode: Part 1 (Business before pleasure)

by Michael S. Kaplan, published on 2006/12/28 06:01 -05:00, original URI:

Previous posts in this series (including today's!):

Before we get to jump in and work on code, there are a few items to take care of.

Any time the idea of converting an existing project to Unicode comes up, one really has to stop and consider both the benefits and the drawbacks and decide whether it makes sense to do it.

In this case (with MSKLC), the current behavior of the product (in version 1.3) is that you double-click an MSI and it just installs. Thus the new requirement that we add a bootstrap setup.exe actually leads to a regression in functionality: running setup.exe when the path contains a character that is not on the default system code page leads to the reported error.
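
To make the failure mode concrete, here is a minimal sketch (in Python rather than the Win32 code this series works on) of the underlying question: can every character of a path be represented in a given code page? The function name `survives_code_page`, the sample paths, and the use of cp1252 as a stand-in for the default system code page are all assumptions for illustration, not anything taken from setup.exe.

```python
def survives_code_page(path: str, code_page: str = "cp1252") -> bool:
    """Return True if every character in `path` can be encoded in `code_page`."""
    try:
        path.encode(code_page)
    except UnicodeEncodeError:
        # At least one character has no representation in this code page.
        return False
    return True


# Hypothetical install locations; only the second one contains a
# character (U+0B16, ORIYA LETTER KHA) that cp1252 cannot represent.
ascii_path = "C:\\Temp\\MyLayout\\setup.exe"
oriya_path = "C:\\Temp\\\u0b16\\setup.exe"

print(survives_code_page(ascii_path))  # True
print(survives_code_page(oriya_path))  # False
```

A non-Unicode binary only ever sees the code page version of its own path, so a path that fails this kind of check is one it may not be able to open.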

Given our team's firm public stance on the importance of supporting Unicode, there are clear strategic disadvantages to not fixing the problem, which is the main reason that looking for specific mitigations like using short file names (which won't work on some systems anyway) may not be the best approach.

But obviously not every software development team will have those same pressures, so considering mitigations to the problem (or just documenting the problem as a limitation) is always an important option.

In some cases, another team can have the same set (or at least a similar set) of pressures if they are trying to support a particular market and would otherwise have to admit that in specific cases they cannot support the language of that market.

As a tool that has both UI and engine elements, we will be dealing with many interesting scenarios here with setup.exe.

Now one obvious strategic reason to not support Unicode (or to not only support Unicode) would be if Win9x support is also important.

I'll talk more about the Win9x issues later in the series (as even now before I have started, Dean Harding has suggested talking about MSLU integration).

But for now I'll stick to our current project (MSKLC 1.4), which creates keyboard layouts that can only be installed on NT-based platforms. That means the only thing Win9x support in SETUP.EXE gives a person is the nice friendly error message telling whoever tried to install anyway to go get bent. So the preliminary triage assumption is that telling people who create a keyboard layout that allows users to type in a particular language that in some cases no character in the path can be a letter in that language is MUCH worse than not being able to tell people who can't follow directions (in a friendly way) to go home until they learn how to follow them.

Each triage will be an individual thing, of course. In this case there are also other non-fatal but still glaring problems, such as dialogs put up by SETUP.EXE showing question marks in non-blocking situations such as displaying the layout description.
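
Where those question marks come from can be sketched in a few lines. This is a rough illustration, again in Python rather than Win32 code, with a hypothetical helper name and a made-up layout description. It assumes cp1252 as the code page, and Python's "replace" handler only approximates what a Windows conversion does (Windows may also apply best-fit mappings).

```python
def as_shown_by_ansi_dialog(text: str, code_page: str = "cp1252") -> str:
    """Push `text` through `code_page` and back, as a non-Unicode UI would."""
    # errors="replace" substitutes '?' for each character that the
    # code page cannot represent, instead of raising an exception.
    return text.encode(code_page, errors="replace").decode(code_page)


# Made-up description containing two Oriya letters (U+0B16, U+0B15).
description = "My Layout \u0b16\u0b15"

print(as_shown_by_ansi_dialog(description))  # My Layout ??
```

Nothing fails here, which is exactly why this kind of problem is non-blocking: the dialog still comes up, it just shows the wrong text.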

In fact, on a tangentially related note, it is actually support of Unicode strings in keyboard layout descriptions, company names, and copyright strings that led me to originally convert kbdtool.exe to kbdutool.exe on a morning several years ago in a Unicode Technical Committee meeting, where a discussion centering on WG-20 issues that was otherwise threatening to put me to sleep led me to wonder how long such a conversion might take (in fact, it took just over 90 minutes).

I'll talk more about triage issues in later posts after the actual work has been done.

For now, I'll explain the basic decisions planned for the conversion itself. Starting off, I have six of them:

Now, where to get the source from?

I got it from the Windows® Server 2003 R2 Platform SDK Full Download, which I already had installed, and I found the source in this directory:

C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Samples\SysMgmt\Msi\setup.exe\


I believe the part of the directory in red should be in any Platform SDK download.

My current plan is to post the code (14 files including the readme) at the beginning of each post before I go to work on it, so that people following along can treat it like a crossword puzzle and check their answers the next day with a simple tool like WinDiff. The project will not successfully compile as a Unicode binary for most of these posts (I'll warn ahead of time whenever that is the case and tell you how many errors I am getting), so you can consider the fact that I am posting the code to be a good way to follow along without doing any actual work. :-)

Ok, enough blather, let's get to the fun part -- starting tomorrow we're going to be jumping in and handling the code....


This post brought to you by ଖ (U+0b16, a.k.a. ORIYA LETTER KHA)

# Mike on 28 Dec 2006 1:11 PM:

I used to make 'T' binaries also.  But eventually came to the conclusion that all the extra macros just made the code ugly to no real purpose.  It just doesn't make sense to have two versions of your app, one of which is crippled.  Inevitably the Unicode version becomes dominant and all the special case hacks you end up having to convert back & forth or not depending on what UNICODE is defined as break because they're not tested properly.

There are exceptions of course.  If you're making a library intended for third-party use the effort might be worth it.  Or if you need a single codebase that will compile on both Windows and Linux or something.

# Michael S. Kaplan on 28 Dec 2006 2:41 PM:

In this case I think it will work satisfactorily, and not be too cumbersome. But this issue will be discussed as we go along, since as you point out there are trade-offs here to consider....


referenced by

2007/12/24 VS just got served!, aka The ??? Shift, aka 'Converting a project to Unicode???' No, it's 'Converting a project??? ToUnicode!!!'

2007/01/05 Converting a project to Unicode: Part 9 (The project's postpartum postmortem)

2007/01/04 Converting a project to Unicode: Part 8 (Fitting MSLU into the mix)

2007/01/03 Converting a Project to Unicode: Part 7 (What does it mean to fit things to a 'T', anyway?)

2007/01/02 Converting a Project to Unicode: Part 6 (Upon the road not traveled)

2007/01/01 Converting a Project to Unicode: Part 5 (Are we there yet? Well, not *just* yet)

2006/12/31 Converting a Project to Unicode: Part 4 (It's /Delightful, it's /Delicious, it's /DUnicode!)

2006/12/30 Converting a project to Unicode: Part 3 (Can I quote you on that?)

2006/12/29 Converting a project to Unicode: Part 2 ('Sorry, you're not my type.' 'Um, maybe I could change that?')
