The Unicode Lame List

by Michael S. Kaplan, published on 2006/04/16 06:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/04/16/577108.aspx


It might take you back to Almost Live and the Lame List, and if so then I was able to inspire the right memories.

It is not a weekly posting, so it's not 'What's weak, this week'. But it will likely just be me periodically posting something that I think shows a certain ignorance of the importance of Unicode/internationalization in software, so perhaps you can think of it as something more like a 'Who put Unicode in the commode' (you have to work a little harder to get the scansion right, but it is possible!).

Anyway, I am looking over at Heath Stewart's blog and his recent post, "Opening Patch Files when Compiled for Unicode". Now I am not calling Heath lame here at all -- he isn't, and this is really good information, and I am glad he is putting it out there:

If you want to open a .msp file with the Windows Installer APIs, you must pass MSIDBOPEN_PATCHFILE to the MsiOpenDatabase function, or ERROR_OPEN_FAILED (110) is returned. Below is the definition of both MSIDBOPEN_PATCHFILE and MSIDBOPEN_READONLY from msiquery.h in the Windows Installer SDK.

#define MSIDBOPEN_READONLY (LPCTSTR)0
#define MSIDBOPEN_PATCHFILE 32/sizeof(*MSIDBOPEN_READONLY)

LPCTSTR is defined as LPCWSTR when UNICODE is defined, which in turn is defined as const wchar_t*. Since sizeof(wchar_t) is 2 on Windows, the value of MSIDBOPEN_PATCHFILE is 16 when UNICODE is defined. If you pass this to either the MsiOpenDatabaseA function or the MsiOpenDatabaseW function, ERROR_OPEN_FAILED is still returned. The value must always be defined as 32.

For the automation method Installer.OpenDatabase the second parameter must be set to msiOpenDatabaseModePatchFile to open a patch, which is always defined as 32.

The lame part is a general approach that people take when they think about Unicode, due to specific attitudes both inside and outside of Microsoft:

Most developers probably haven't run into this problem yet because of support for Windows 95, 98, and Me, where Unicode is not natively supported and it's typically undesirable to have to ship and support two bootstrap applications. Since Windows NT, 2000, XP, 2003, and future platforms support both ANSI and Unicode it makes sense to compile bootstrap applications for ANSI or MBCS.

It is actually this particular prevailing attitude that finally inspired MSLU as a project -- there had to be a way to get people supporting Unicode, even if they did have to support Win9x.

The problem now is that no one wants to support Win9x in their platforms, but if there is any kind of downlevel story involved (whether it is the Windows Installer folks not supporting a Win9x Unicode version of their support or the C++ folks not wanting to ship an MSLU-ized version of MFC, or whatever) it amounts to a passive-aggressive, whiny "it's too late to support your Unicode solution in our product, but we have to keep encouraging the customer ANSI solution of our product."

I wonder how many more years these teams will try to wish the problem away (as people did on the Windows side for so many years before approval was given to do MSLU) before they give up and finally do something about it. Because until it is easy to do, until there is a good backcompat story, and until it is the default setting, most people will not choose Unicode for their solution....

 

This post brought to you by "Ề" (U+1ec0, a.k.a. LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND GRAVE)


# mpz on 16 Apr 2006 8:37 PM:

Like Joelonsoftware writes (see URL), a lot of programmers think it's too much work and wish the problem would just go away (which it won't).

Even in 2006, there's way too much software that is simply oblivious to the whole issue. FTP servers and clients.. argh.

We need to be more vocal about getting UTF-8 supported everywhere and universally.

# Dean Harding on 18 Apr 2006 3:41 AM:

Actually, looking at the comments to Heath's post, the problem isn't that no one has been using Unicode in their installers yet, it's because of the strange semantics of the function call - the fact that they're overloading the parameter as both a string and an integer!!

The idea is supposed to be that you either pass in an LPCTSTR which is the name of a new database file, or you can pass in these "constants" which direct the API to do something special.

But to combine the constants, you don't OR (|) them, you ADD (+) them, so if you've got UNICODE defined, the pointer arithmetic that results from (MSIDBOPEN_READONLY + MSIDBOPEN_PATCHFILE) is the correct value. On the other hand, if UNICODE is not defined then (MSIDBOPEN_READONLY + MSIDBOPEN_PATCHFILE) is exactly equivalent to (MSIDBOPEN_READONLY | MSIDBOPEN_PATCHFILE) anyway.

So as long as you're doing EXACTLY what the documentation says (and not what your intuition would tell you) then it should work with both UNICODE defined and not defined...

# Michael S. Kaplan on 18 Apr 2006 3:48 AM:

But Dean, most people avoid shipping both installers if they can, the same way they don't build apps two ways....

# Dean Harding on 18 Apr 2006 7:21 PM:

Yeah, I know. What I mean is that if you have UNICODE defined, then you have to combine the flags by ADDing them, rather than ORing them. If you don't have UNICODE defined, then it just so happens that ADDing and ORing would do the same thing.

# Michael S. Kaplan on 18 Apr 2006 7:54 PM:

That can't be the *design* though -- it makes no sense that a code change would be needed when going to Unicode. I think they just have a bug in the definition....

# Dean Harding on 18 Apr 2006 9:27 PM:

But if you look at the documentation, it actually says "*Add* this flag to indicate a patch file" which I'll admit goes a little against the usual custom, and I'll also admit it's a silly design.

The problem is that they're overloading this one parameter to accept either a string or an integer. So it's defined as an LPCTSTR type, and the flags are defined as "(LPCTSTR)x". This means that you CAN'T OR two of the flags together (the | operator is not defined for pointer types) so you HAVE to + them.

Like I said, it just so happens that | and + are the same when you have a pointer to a 'char', but | and + are different when your pointer is to a Unicode character.
