by Michael S. Kaplan, published on 2005/01/06 13:09 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/01/06/347834.aspx

A few days ago, PEK asked in the newsgroups about WideCharToMultiByte and MultiByteToWideChar:

I'm a bit confused about the first parameter in MultiByteToWideChar. It
tells which code page to use. You could use the value CP_ACP
("ANSI code page"), but what does it mean? Is it the value returned by
GetACP() or is it code page 1252?

What I want to know is whether you get the same result if you call:

MultiByteToWideChar(CP_ACP, ...)

as when you call:

MultiByteToWideChar(GetACP(), ...)

The answer is that they give the same results. CP_ACP is a constant that says "I do not care what the system default code page is; just use it," while GetACP() asks "what is the system default code page?"

The natural question to ask any time there are two or more ways to do the same thing is which one is faster: caching the return value of GetACP() and passing that, or passing CP_ACP?

Well, on NT-based platforms, CP_ACP is slightly faster, and by slightly I mean it would take 1,000,000 iterations or more to see even the slightest difference. On 9x-based platforms, it does look like passing the code page value itself may be ever-so-slightly faster, but again not by enough to matter. I prefer CP_ACP because, to me, it just seems easier to keep track of. It should likely never matter to your code.

ADDENDUM 1:00pm -- OK, I had a chance to look at the Win9x code, and I have changed my mind. I could easily see it making a difference there: CP_ACP is going to be slower. Not by a lot, but it's just not the smartest code in the world, and I remain committed to supporting smart code paths. I also understand why the same code is not used in the NT codebase. For what it's worth, MSLU was actually written using the cached code page method, based on someone's comment that this was a little faster on Win9x. Well, the comment is right, and I'm glad MSLU does it this way.

I can probably say for a fact, though, that the slowest option of all would be calling GetLocaleInfo with LOCALE_IDEFAULTANSICODEPAGE and the LCID returned from GetSystemDefaultLCID (getting the value back as a string and converting it to a number), and making these calls every single time. But if you have to be told that explicitly, then your code is likely too slow anyway. :-)

Now if you are using the .NET Framework, your question is probably which is faster: using the built-in Encoding.Default, or calling Encoding.GetEncoding() to create the Encoding object yourself. The quick answer is to use Encoding.Default, since it is always faster.

But the better answer is the one Shawn Steele (the owner of encodings and code pages) gives: use Unicode and avoid all of this extra work. That is much easier in .NET, since all of the string data is already Unicode.


This post brought to you by "﹌" (U+FE4C, a.k.a. DOUBLE WAVY OVERLINE)

# Jochen Kalmbach on 6 Jan 2005 1:38 PM:

Just a small word about Unicode support in .NET:
.NET only supports UTF-16 and not the full Unicode range. If you want to use the full Unicode range, you have to use the System.Text namespace and do ugly work with surrogates.

# Michael Kaplan on 6 Jan 2005 4:21 PM:

This is not really true.

Anyone who assumes that working with surrogate pairs is evil and that therefore UTF-32 is somehow better has not spent much time dealing with European languages that have many combining characters required for proper support.

The fact is that if you are using the .NET Framework, you can support ALL OF UNICODE in your code via the intrinsic System.String datatype. There is no part of Unicode that cannot be accessed this way.

If you are interoperating with folks who have requirements for other Unicode encoding forms or schemes, then the Encoding class opens them all up for you.

And the work to deal with supplementary characters is no worse than the text-element requirements that apply across all of Unicode.

# Jochen Kalmbach on 7 Jan 2005 3:03 AM:

But the default encoding for System.String is UTF-16, right?

# Michael Kaplan on 7 Jan 2005 8:58 AM:

Strings in .NET *are* UTF-16. Encoding.Default represents the default system codepage, which will not be UTF-16 but the encoding represented by CP_ACP.
