What is the freaking point of LOCALE_USE_CP_ACP?

by Michael S. Kaplan, published on 2008/10/15 11:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/10/15/9000666.aspx


The question I got the other day was a "slightly less suitable for families" version of this blog's title.

It came out of "It used to be right, dammit!", and reader Jon was kind of confused as to why this flag and so many other *_USE_CP_ACP flags are used in the various NLS API functions that also take a locale.

I may have unintentionally encouraged the "off" language by my own blog's use of the word dammit so I'm not speaking ill of the word used; I'm just trying to show a little control. :-)

Now the reason for having LOCALE_USE_CP_ACP and related flags is simple enough.

You see, if you call the ANSI versions of functions like GetDateFormat (i.e. your call resolves one way or another to GetDateFormatA) then by default the code page that will be used to convert from the Unicode data in Windows is the default ANSI code page of the locale you pass in, rather than the default system code page (CP_ACP) of the machine.
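To make that default concrete, here is a minimal sketch in Python that only simulates the conversion step; it does not call any Windows API. The sample string, the choice of a Russian locale, and the variable names are all assumptions made for illustration.

```python
# Hypothetical stand-in for text Windows holds internally as Unicode:
# the Russian month name for October, as it appears in a long date.
unicode_text = "октября"

# Assumption: the locale passed in is Russian, whose default ANSI
# code page is 1251.
locale_acp = "cp1251"

# Simulates what an ANSI ("A") call hands back by default: bytes encoded
# in the code page of the locale, whatever the machine's CP_ACP is.
ansi_bytes = unicode_text.encode(locale_acp)
print(ansi_bytes.hex())

# A caller that knows which code page was used can recover the text.
round_trip = ansi_bytes.decode(locale_acp)
print(round_trip)
```

The bytes only make sense to a caller that interprets them in the locale's code page, which is exactly what a double secret ANSI application does.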

This functionality is the hallmark of the double secret ANSI applications I first mentioned in Double Secret ANSI, part 1 (Somewhere between ANSI and Unicode).

This is a great feature -- when combined with support within GDI and the USER subsystem it allows non-Unicode applications even on Win9x to support characters off the default system code page.

However, most non-Unicode applications don't bother with such things, and stay firmly within the CP_ACP of the machine they are running on.

For them, the use of double secret ANSI-type features is not only unhelpful but also kind of dangerous, since the data is treated as already having been converted from Unicode, even though it was converted incorrectly from the application's point of view.

In such cases, it is usually safer to get back a question mark than to get back the wrong character -- which is what you would otherwise get for any non-ASCII ( > 127) piece of the ANSI string.

And that is where LOCALE_USE_CP_ACP comes in. It gives you that question mark in such cases!
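The difference can be sketched with a small Python simulation. This models the behavior described here rather than the actual NLS implementation: to_ansi is a hypothetical helper, the code pages are assumed (a machine on 1252, a Russian locale on 1251), and the best-fit mappings that a real Windows conversion can apply are ignored.

```python
def to_ansi(text, locale_acp, system_acp, use_cp_acp=False):
    """Hypothetical model of the Unicode-to-ANSI step inside an "A" function."""
    # With the flag, convert with the machine's code page; without it,
    # convert with the code page of the locale that was passed in.
    code_page = system_acp if use_cp_acp else locale_acp
    # errors="replace" substitutes "?" for anything the code page lacks.
    return text.encode(code_page, errors="replace")


month = "октября"        # sample Russian month name (assumption)
system_acp = "cp1252"    # assumed default system code page of the machine
locale_acp = "cp1251"    # assumed ANSI code page of the Russian locale

# An ordinary non-Unicode application reads every string as CP_ACP.
seen_by_default = to_ansi(month, locale_acp, system_acp).decode(system_acp)
seen_with_flag = to_ansi(month, locale_acp, system_acp,
                         use_cp_acp=True).decode(system_acp)

print(seen_by_default)   # wrong characters, silently
print(seen_with_flag)    # question marks
```

In this model the default path produces plausible-looking but wrong Latin letters, while the flagged path produces question marks that at least make the data loss obvious.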

Of course, since most applications are not double secret ANSI applications, it does seem like a strange choice that the default is to support the double secret ANSI stuff while not supporting it requires an opt-in flag. But keep in mind that keyboards don't really even give you the choice in the first place, and the flag itself was added later as a way to help mitigate a reported bug -- it is too late to change a default behavior when you've already shipped it for several versions....

When the issue was mitigated, it was a simple bug fix for the NLS subset of the Win32 API, not for all of Windows.

Thus LOCALE_USE_CP_ACP and its related cousins are always there and available, even though other parts of the double secret ANSI world don't have such a mechanism available....

One could perhaps argue that if such a mechanism existed that bugs like that mentioned in Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!) would be just as debilitating, though perhaps not as confusing!

 

This blog brought to you by ? (U+003f, aka QUESTION MARK)


Yuhong Bao on 18 Nov 2010 10:11 AM:

For example, people call these functions using VB(A) Declare statements, which always use the default system ANSI codepage for the conversions from/to Unicode. If they forget to pass LOCALE_USE_CP_ACP, and the default system ANSI codepage does not match the one used by the locale, the conversion done by VB(A) will result in garbled characters. An alternative is to use a Byte array, and then use StrConv with vbUnicode and pass in the LCID to convert to Unicode using the correct codepage.

Michael S. Kaplan on 18 Nov 2010 1:06 PM:

Better still -- StrPtr and byVal Long parameters....

Yuhong Bao on 18 Nov 2010 2:54 PM:

Not available on Win9x, but yes I'd try that first.

Yuhong Bao on 18 Nov 2010 2:56 PM:

Note BTW that the same thing applies to .NET developers calling this using P/Invoke and CharSet.Ansi or Auto.

Michael S. Kaplan on 18 Nov 2010 4:44 PM:

I don't respect those people though. :-)

