Converting a project to Unicode: Part 1 (Business before pleasure)

by Michael S. Kaplan, published on 2006/12/28 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/12/28/1372747.aspx

Before we get to jump in and work on code, there are a few items to take care of.

Any time the idea of converting an existing project to Unicode comes up, one really has to stop and consider both the benefits and the drawbacks and decide whether it makes sense to do it.

In this case (with MSKLC), the current behavior of the product (in version 1.3) is that you double-click an MSI and it just installs. Thus the new requirement that we add a bootstrap setup.exe actually leads to a regression in functionality: if the path contains a character that is not in the default system code page, clicking on setup.exe leads to the reported error.

Given our team's firm public stance on the importance of supporting Unicode, there are clear strategic disadvantages to not fixing the problem. That is the main reason why looking for specific mitigations, like using short file names (which won't work on some systems anyway), may not be the best approach.

But obviously not every software development team will have those same pressures, so considering mitigations to the problem (or just documenting the problem as a limitation) is always an important option.

In some cases, another team can have the same set (or at least a similar set) of pressures, if they are trying to support a particular market and would otherwise have to admit that in specific cases they cannot support the language of that market.

As a tool that has both UI and engine elements, we will be dealing with many interesting scenarios here with setup.exe.

Now one obvious strategic reason to not support Unicode (or to not only support Unicode) would be if Win9x support is also important.

I'll talk more about the Win9x issues later in the series (as even now before I have started, Dean Harding has suggested talking about MSLU integration).

But for now I'll stick to our current project (MSKLC 1.4), which creates keyboard layouts that can only be installed on NT-based platforms. That means the only thing Win9x support in SETUP.EXE gives a person is the nice friendly error message telling the person who tried to install anyway to go get bent. So the preliminary triage assumption is this: telling people who create a keyboard layout that allows users to type in a particular language that in some cases the path cannot contain letters in that language is MUCH worse than not being able to tell people who can't follow directions (in a friendly way) to go home until they learn how to follow them.

Each triage will be an individual thing, of course. In this case there are also other non-fatal but still glaring problems, such as dialogs put up by SETUP.EXE showing question marks in non-blocking ways, such as when displaying the layout description.
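To make the failure mode concrete, here is a minimal sketch of what happens when a wide string is squeezed into a single-byte code page. It is not the setup.exe code, and it does not call the real Windows conversion (WideCharToMultiByte); the helper name `narrow_to_latin1` is hypothetical, and Latin-1 is only an assumed stand-in for "the default system code page" so that the sketch compiles anywhere. The point is simply that anything the code page cannot hold comes out as a question mark, which is harmless in a dialog and fatal in a path.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from the SDK sample): narrows a wide string to
 * a single-byte buffer, treating Latin-1 as a stand-in for the system code
 * page. Any character above 0xFF cannot be represented and is replaced
 * with '?'. Returns the number of characters that were replaced.
 * Input that does not fit in dst is truncated.
 */
size_t narrow_to_latin1(const wchar_t *src, char *dst, size_t dst_size)
{
    size_t lost = 0;
    size_t i = 0;

    if (dst_size == 0) {
        return 0;
    }
    for (; src[i] != L'\0' && i + 1 < dst_size; ++i) {
        if ((unsigned long)src[i] <= 0xFFul) {
            dst[i] = (char)(unsigned char)src[i];  /* representable: copy */
        } else {
            dst[i] = '?';                          /* not representable */
            ++lost;
        }
    }
    dst[i] = '\0';
    return lost;
}
```

Feeding it a path such as `L"C:\\\u0B16\\setup.exe"` yields `C:\?\setup.exe` with one character lost, and no directory by that name exists to open.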

In fact, on a tangentially related note, it is actually support of Unicode strings in keyboard layout descriptions, company names, and copyright strings that led me to originally convert kbdtool.exe to kbdutool.exe. That happened on a morning several years ago in a Unicode Technical Committee meeting, where a discussion centering on WG-20 issues that was otherwise threatening to put me to sleep led me to wonder how long such a conversion might take (in fact, it took just over 90 minutes).

I'll talk more about triage issues in later posts after the actual work has been done.

For now, I'll explain the basic decisions planned for the conversion itself. Starting off, I have six of them:

First and foremost, I will not be turning an "A" binary into a "W" binary; the plan is to turn it into a "T" binary a la TCHAR.H and so on. When I am done I want something that I could compile either way without requiring further code change, if for no other reason than that if it did prove to be too difficult and the idea got postponed, I wouldn't have to throw away my changes. :-)
Second, I am not going to try to add any other new features while I am in there -- that will be a separate effort done at another time, so I can be sure to isolate any bug I might introduce from bugs that a feature might introduce.
Third, I am going to keep it compiling the same way it does now (running NMAKE in an SDK or VS command line), and not require anyone to build special project files (for the actual setup.exe that ships I will violate this principle to integrate it with the regular build process, but for the project I am working on for the series I will be doing no such thing).
Fourth, I am not going to be afraid of errors, even if I am seeing hundreds or even thousands of them, because I know in the end there will be none, and the nature of this project is such that changes to a single commonly called function could add many on its own.
Fifth, for most of the time I will be working on resolving those errors in reverse order, so that I can compile, work on the errors at the bottom of the list, and then compile again.
Sixth, I'll be using an editor like VS that will let me load all files and more importantly do find/replace operations on all files while still letting me look at each one in case anything special has to be done.
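
For anyone who has not seen the "T" approach from the first decision, here is a rough sketch of the idea. The real names live in TCHAR.H (TCHAR, _T(), _tcslen, switched by _UNICODE, with UNICODE doing the same job for the Windows headers); the `demo_` names and the `DEMO_UNICODE` switch below are hypothetical stand-ins I made up so that the sketch compiles with any C compiler, not something taken from the setup.exe source.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Stand-ins for what <tchar.h> provides on Windows. All names here are
 * hypothetical; define DEMO_UNICODE to get the "W" flavor, leave it
 * undefined to get the "A" flavor. The code below does not change.
 */
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_T(s) L##s          /* wide string literal */
#define demo_tcslen wcslen
#else
typedef char demo_tchar;
#define DEMO_T(s) s             /* narrow string literal */
#define demo_tcslen strlen
#endif

/* Length in characters, whichever character type is in effect. */
size_t demo_char_count(const demo_tchar *s)
{
    return demo_tcslen(s);
}

/*
 * Size in bytes of a buffer that can hold the string plus its terminator.
 * Characters and bytes are only the same thing in the "A" build, so the
 * multiplication by sizeof(demo_tchar) is the part that is easy to forget.
 */
size_t demo_buffer_bytes(const demo_tchar *s)
{
    return (demo_tcslen(s) + 1) * sizeof(demo_tchar);
}
```

Calling `demo_char_count(DEMO_T("setup.exe"))` gives 9 in both builds, while `demo_buffer_bytes` gives 10 times the size of whichever character type was selected. Code written this way is the sort of thing that can be compiled either way without further changes.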

Now, where to get the source from?

I got it from the Windows® Server 2003 R2 Platform SDK Full Download, which I already had installed, and I found the source in the

C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Samples\SysMgmt\Msi\setup.exe\

directory.

I believe the part of the directory in red should be in any Platform SDK download.

My current plan is to post the code (14 files including the readme) at the beginning of each post before I go to work on it, so that people following along can treat it like a crossword puzzle and check their answers the next day with a simple tool like WinDiff. The project will not successfully compile as a Unicode binary for most of these posts (I'll warn ahead of time whenever that is the case and tell you how many errors I am getting), so you can consider the fact that I am posting the code to be a good way to follow along without doing any actual work. :-)

Ok, enough blather, let's get to the fun part -- starting tomorrow we're going to be jumping in and handling the code....

This post brought to you by ଖ (U+0b16, a.k.a. ORIYA LETTER KHA)

# Mike on 28 Dec 2006 1:11 PM:

I used to make 'T' binaries also. But eventually came to the conclusion that all the extra macros just made the code ugly to no real purpose. It just doesn't make sense to have two versions of your app, one of which is crippled. Inevitably the Unicode version becomes dominant and all the special case hacks you end up having to convert back & forth or not depending on what UNICODE is defined as break because they're not tested properly.

There are exceptions of course. If you're making a library intended for third-party use the effort might be worth it. Or if you need a single codebase that will compile on both Windows and Linux or something.

# Michael S. Kaplan on 28 Dec 2006 2:41 PM:

In this case I think it will work satisfactorily, and not be too cumbersome. But this issue will be discussed as we go along, since as you point out there are trade-offs here to consider....
