What code page does MSLU convert with?

by Michael S. Kaplan, published on 2005/05/13 02:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/13/417060.aspx

The Microsoft Layer for Unicode is usually just a very simple wrapper over the non-Unicode APIs on Windows 95, 98, and Me, one that uses the default system code page to do the conversions.

There are of course exceptions to this; there are times when Win9x will do just fine with the original Unicode strings, and when that is the case, MSLU works hard to honor that support. I will talk about those cases another day.

(A lot of the information in this post has already been published in the article written for MSDN Magazine entitled MSLU: Develop Unicode Applications for Windows 9x Platforms with the Microsoft Layer for Unicode. I just did not have the same space restrictions here!)

But the question of which code page to use does not always have an obvious answer.

I would say there are six different buckets that the various functions fall into:

1. Some use the default system code page (CP_ACP), the one returned by the GetACP function (e.g. functions like CreateDC, and most of the other functions that MSLU wrapped).
2. Some use the default system OEM code page (CP_OEMCP), the one returned by the GetOEMCP function (e.g. functions like CharToOemBuff).
3. Some could use either of the above two code pages, depending on the return from the AreFileApisANSI function (e.g. file management functions like FindFirstFile).
4. Some use the ACP of a particular LCID, as returned by a call to GetLocaleInfo with the LOCALE_IDEFAULTANSICODEPAGE flag (e.g. all of the NLS APIs that take LCID parameters, like GetNumberFormat).
5. Some use the ACP based on a particular device context handle (HDC), which has a charset associated with it (e.g. most of the GDI functions that take an HDC parameter, like GetTextExtentExPoint).
6. A very few use other mechanisms entirely; we'll call this the miscellaneous bucket (e.g. GetClipboardData, which can use all manner of code pages depending on whether CF_LOCALE is specified on the clipboard or synthetic formats are being supported).
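The first five buckets can be modeled as a small lookup. What follows is only an illustrative sketch in Python, not MSLU's actual code: `SystemState`, `pick_code_page`, and the bucket names are hypothetical, nothing here calls the real Win32 APIs, and the fallback to the ACP for an unlisted charset is an assumption. The charset values and their code pages are the standard GDI ones.

```python
from dataclasses import dataclass
from typing import Optional

# GDI charset values (from wingdi.h) and the Windows code page each implies.
CHARSET_TO_CODE_PAGE = {
    0: 1252,    # ANSI_CHARSET
    128: 932,   # SHIFTJIS_CHARSET
    129: 949,   # HANGUL_CHARSET
    134: 936,   # GB2312_CHARSET
    136: 950,   # CHINESEBIG5_CHARSET
    161: 1253,  # GREEK_CHARSET
    162: 1254,  # TURKISH_CHARSET
    177: 1255,  # HEBREW_CHARSET
    178: 1256,  # ARABIC_CHARSET
    186: 1257,  # BALTIC_CHARSET
    204: 1251,  # RUSSIAN_CHARSET
    222: 874,   # THAI_CHARSET
    238: 1250,  # EASTEUROPE_CHARSET
}


@dataclass
class SystemState:
    """Hypothetical stand-in for values the real system would report."""
    acp: int                     # what GetACP would return
    oemcp: int                   # what GetOEMCP would return
    file_apis_ansi: bool = True  # what AreFileApisANSI would return


def pick_code_page(bucket: str, state: SystemState,
                   locale_acp: Optional[int] = None,
                   hdc_charset: Optional[int] = None) -> int:
    """Return the code page a wrapper in the given bucket would convert with."""
    if bucket == "acp":      # bucket 1: e.g. CreateDC
        return state.acp
    if bucket == "oem":      # bucket 2: e.g. CharToOemBuff
        return state.oemcp
    if bucket == "file":     # bucket 3: e.g. FindFirstFile
        return state.acp if state.file_apis_ansi else state.oemcp
    if bucket == "lcid":     # bucket 4: e.g. GetNumberFormat
        if locale_acp is None:
            raise ValueError("the locale's default ANSI code page is required")
        return locale_acp
    if bucket == "hdc":      # bucket 5: e.g. GetTextExtentExPoint
        # Assumption: fall back to the ACP when the charset is not in the table.
        return CHARSET_TO_CODE_PAGE.get(hdc_charset, state.acp)
    # Bucket 6 (miscellaneous) has no single rule, so it is not modeled here.
    raise ValueError(f"no single rule for bucket {bucket!r}")


state = SystemState(acp=1252, oemcp=437)
print(pick_code_page("file", state))                  # file APIs in ANSI mode
print(pick_code_page("hdc", state, hdc_charset=128))  # a Shift-JIS font in the DC
```

The point of the model is that the same Unicode string may need a different code page depending on which function it is headed for, and in the file and HDC cases on state that was set up by earlier calls.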

The decisions over what to use for the conversion were not arbitrary.

Believe it or not, many internal pieces of Win9x actually do support Unicode. This is a fact that Chris Wendt reminded me of years ago, when he pointed to TrueType fonts, large pieces of GDI that use those fonts, the NLS data, all of the COM interfaces in the Shell, and more. And he was right; Unicode support is sometimes more than just the interface.

Thus in cases where conversion was already being done by the operating system, the goal was to match the conversion that the OS itself was doing, because any time we did not match the OS behavior, every part of the string outside of ASCII would probably be corrupted.
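A small sketch of why a mismatch matters, using Python's built-in codecs rather than anything from MSLU. The helper name `through_code_page` and the choice of code pages 1252 and 437 are just for illustration: the string is encoded with one code page (standing in for the wrapper) and decoded with another (standing in for the OS).

```python
def through_code_page(text: str, encode_cp: str, decode_cp: str) -> str:
    """Encode text with one code page, then decode the bytes with another."""
    return text.encode(encode_cp).decode(decode_cp, errors="replace")


original = "café"

# Wrapper and OS agree on the code page: the string survives the round trip.
matched = through_code_page(original, "cp1252", "cp1252")

# Wrapper converts with the ANSI code page, OS interprets the bytes as OEM.
mismatched = through_code_page(original, "cp1252", "cp437")

print(matched == original)     # the matched round trip is lossless
print(mismatched == original)  # the mismatched one is not
print(mismatched[:3])          # the ASCII prefix is unaffected either way
```

Only the non-ASCII character changes, since the two code pages agree on the ASCII range and disagree above it.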

So the answer to the question first posed in the title of this post (What code page does MSLU convert with?) is that it really depends: on the function you are calling, sometimes on how you are calling it, and sometimes on the way other functions have set the stage....

This post brought to you by "ܜ" (U+071C, a.k.a. SYRIAC LETTER TETH GARSHUNI)

# Mike Dimmick on 13 May 2005 10:42 AM:

How much consideration did you give to the implementation of NT's A version of the APIs?

# Michael S. Kaplan on 13 May 2005 10:57 AM:

Good question. Not very much, to tell the truth. The goal has always been to encourage movement to Unicode-based platforms, not to make non-Unicode platforms more appealing....

# Qflash on 15 May 2005 2:33 AM:

RePost:
http://www.yeyan.cn/SoftwareEngineering/MSLUconvert.aspx

# Yuhong Bao on 30 Aug 2010 7:29 PM:

What is unfortunate is that the only choices in .NET P/Invokes are CharSet.Ansi, CharSet.Unicode, and CharSet.Auto! There are no CharSet.Oem or CharSet.AutoOem options, which is a problem when calling console functions. And classic VB always uses the ANSI code page, of course. You have to be careful when you call functions using any of these to ensure the function really does use the ANSI code page. Usually that is quite easy, but often when calling console functions you have no other choice!

# Yuhong Bao on 30 Aug 2010 7:32 PM:

Some workarounds if there is no other choice include manually converting the string, which is easy in .NET, but in classic VB, I think you have to manually invoke WideCharToMultiByte and MultiByteToWideChar!


