The difference between USE X and SPECIFY CUSTOM X can be measured by the # of functions that support each

by Michael S. Kaplan, published on 2008/03/28 03:01 -04:00, original URI:

Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)! 

Over in the Suggestion Box, Marvin asked:

Puzzled by MSDN docs. IMLangConvertCharset is recommended for speed, but there is no way to tell it what default (fallback) characters to use. Most maddeningly, its Initialize function accepts a dwProperty param that can be MLCONVCHARF_USEDEFCHAR.

IMultiLanguage2::ConvertStringFromUnicodeEx has this ability, but it is supposed to be slower. ConvertStringToUnicodeEx, on the other hand, has the relevant parameters, but the docs say they are ignored.

What should a developer do (other than using a 3rd party library)?

Background: I need MLang rather than the MBtoWC/WCtoMB C API because I need continuable conversion, i.e. being able to give parts of the input (possibly broken in the middle of a char) to the conversion routine. It appears that the only MS technology capable of doing it is MLang.
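MLang is a Windows COM API, so what follows is an analogy, not MLang code. It is a minimal Python sketch of what "continuable conversion" means, using the standard library's incremental decoders. The helper name `decode_in_chunks` is hypothetical. The decoder holds on to the leading bytes of a character that was split across chunks and finishes it once the remaining bytes arrive, which is the property Marvin is asking for.

```python
import codecs


def decode_in_chunks(chunks, encoding="utf-8"):
    """Hypothetical helper: decode byte chunks that may split a character."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = []
    for chunk in chunks:
        # Incomplete trailing bytes stay buffered inside the decoder
        # until a later call supplies the rest of the character.
        parts.append(decoder.decode(chunk))
    # final=True flushes the decoder; a leftover partial character raises here.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# U+30F2 is three bytes in UTF-8; feed it one byte at a time.
data = "\u30f2 abc".encode("utf-8")
print(ascii(decode_in_chunks([data[:1], data[1:2], data[2:]])))
```

Passing `final=True` on the last call matters. Without it, a truncated trailing character would be silently left in the decoder's buffer. With it, the truncation is reported as an error.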

The IMLangConvertCharset interface does indeed claim to be pretty damn fast, but you have to keep in mind something that has been true since the earliest days of WideCharToMultiByte:

The WideCharToMultiByte function operates most efficiently when both lpDefaultChar and lpUsedDefaultChar are set to null pointers. The following table shows the behavior of the function for the four possible combinations of these parameters.

lpDefaultChar   lpUsedDefaultChar   Result
-------------   -----------------   ------
NULL            NULL                No default checking. These parameter settings are the most efficient ones for use with this function.
non-NULL        NULL                Uses the specified default character, but does not set lpUsedDefaultChar.
NULL            non-NULL            Uses the system default character and sets lpUsedDefaultChar if necessary.
non-NULL        non-NULL            Uses the specified default character and sets lpUsedDefaultChar if necessary.
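To make the four combinations concrete, here is a hypothetical Python model of the documented contract. It is not the Win32 implementation. `wide_char_to_ascii`, `default_char`, `want_used_default`, and `SYSTEM_DEFAULT_CHAR` are illustrative stand-ins for the function, lpDefaultChar, lpUsedDefaultChar, and the code page's default character. The target is plain ASCII to keep it short.

In the model, the NULL/NULL case can hand the whole string to one bulk call, while every other combination needs per-character bookkeeping. That is one plausible way to picture why the documentation calls that case the most efficient.

```python
# Assumption: "?" stands in for the code page's system default character.
SYSTEM_DEFAULT_CHAR = b"?"


def wide_char_to_ascii(text, default_char=None, want_used_default=False):
    """Hypothetical model of the lpDefaultChar / lpUsedDefaultChar table.

    Returns (encoded_bytes, used_default). used_default is None when the
    caller did not ask for it, mirroring lpUsedDefaultChar == NULL.
    """
    if default_char is None and not want_used_default:
        # NULL / NULL: no default checking, so one bulk call does all the work.
        return text.encode("ascii", errors="replace"), None

    # Other cases: pick the replacement and track whether it was ever used.
    replacement = SYSTEM_DEFAULT_CHAR if default_char is None else default_char
    out = bytearray()
    used = False
    for ch in text:
        if ord(ch) < 128:
            out.append(ord(ch))
        else:
            out += replacement
            used = True
    return bytes(out), (used if want_used_default else None)


print(wide_char_to_ascii("a\u30f2b", want_used_default=True))
```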

So it should not be surprising that the fastest interface does not expose these ways to slow down the operation, even though the MLang equivalent of WC_DEFAULTCHAR exists. Replacing unconvertible characters with the default character and letting the caller specify a custom one are two very different operations, and they aren't going to be available in every method, or even in every code page.

But many of the other conversion methods within MLang will support specifying a custom default character and continuable conversions in buffers or streams (as you noticed). Some of them even work!
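Python's codecs machinery offers a rough analogy for combining the two features: an incremental (continuable) encoder paired with a caller-supplied default character. This is a sketch, not MLang. The error-handler name `custom_default_star` and both helper functions are hypothetical, and `*` plays the role of a custom default character when encoding to code page 1252.

```python
import codecs


def make_default_char_handler(default_char):
    """Hypothetical helper: build an error handler that substitutes default_char."""
    def handler(err):
        if not isinstance(err, UnicodeEncodeError):
            raise err
        # Emit one default character per unencodable character,
        # then resume encoding just past the failing span.
        return default_char * (err.end - err.start), err.end
    return handler


# Hypothetical handler name; "*" stands in for a caller-chosen default character.
codecs.register_error("custom_default_star", make_default_char_handler("*"))


def encode_in_chunks(chunks, encoding="cp1252", errors="custom_default_star"):
    """Encode text that arrives in pieces, using the custom default character."""
    encoder = codecs.getincrementalencoder(encoding)(errors=errors)
    out = b"".join(encoder.encode(chunk) for chunk in chunks)
    # final=True flushes any state the encoder is still holding.
    return out + encoder.encode("", final=True)


print(encode_in_chunks(["caf\u00e9 ", "\u30f2!"]))
```

In this sketch the custom character must itself be representable in the target code page, or encoding fails. That is a small echo of why such an option can't be offered uniformly everywhere.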

I don't think the lack of a feature on one particular interface is a reason to decide there is nothing provided by Microsoft, though. :-)

As a general rule, the desire for ultra-fast does not mix terribly well with the desire for lots of options, and this case is definitely not an exception to the rule.


This blog brought to you by ヲ (U+30f2, aka KATAKANA LETTER WO)
