Providing developers a Unicolonic in unmanaged code?

by Michael S. Kaplan, published on 2007/04/18 13:51 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/04/18/2178209.aspx


Over in the Suggestion Box, zvi k asks:

Hello,

  I tried to use the MultiByteToWideChar() function to convert a UTF-16 encoded string and it failed.

  Is it designed to accept only ANSI / UTF-8 input, or can it accept UTF-16 with special parameters?

  (I emphasize: MultiByteToWideChar(), and not WideCharToMultiByte(). The latter function can do it.)

Thanks a lot,

 Zvi.

Now both WideCharToMultiByte and MultiByteToWideChar accept/emit UTF-16 text on one end (the "WideChar" end!), but neither one of them will accept/emit UTF-16 text on the other.

Put simply, there is no code page value you can specify that will allow such a conversion to happen.

There are several points to consider though, especially given that in the managed world this kind of operation is in fact supported, in both directions.

The first point is that for well-formed UTF-16 text, you would be converting text into what it already is -- in other words, you would be doing nothing. This is the kind of thing a developer can accomplish much more simply with a cast.

And it would be faster, too!

The second point is the only real benefit that such a method would give (and does give in .NET): in text that is not well-formed (e.g. text containing unpaired surrogate values), the offending values are cleaned up by being replaced with the default replacement character, or the conversion errors out if the flag requesting that behavior is passed.
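To make that "cleanup" concrete, here is a minimal, portable sketch of what it amounts to. The helper name sanitize_utf16 is hypothetical (it is not a Win32 or .NET API), and the choice of U+FFFD as the replacement is an assumption modeled on the usual Unicode replacement character:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any Windows or .NET API.
   Copies len UTF-16 code units from src to dst, replacing every
   unpaired surrogate with U+FFFD (assumed replacement character).
   dst must have room for len units. Returns the replacement count. */
size_t sanitize_utf16(const uint16_t *src, size_t len, uint16_t *dst)
{
    size_t replaced = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t u = src[i];
        if (u >= 0xD800 && u <= 0xDBFF) {            /* high (lead) surrogate */
            if (i + 1 < len &&
                src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                dst[i] = u;                          /* valid pair: copy both */
                dst[i + 1] = src[i + 1];
                i++;                                 /* skip the trail unit */
                continue;
            }
            dst[i] = 0xFFFD;                         /* lead with no trail */
            replaced++;
        } else if (u >= 0xDC00 && u <= 0xDFFF) {     /* trail with no lead */
            dst[i] = 0xFFFD;
            replaced++;
        } else {
            dst[i] = u;                              /* ordinary BMP code unit */
        }
    }
    return replaced;
}
```

For well-formed input the function returns 0 and dst is an exact copy of src, which is the "very expensive cast" case.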

Somehow the image of "MultiByteToWideChar as Colon Blow" is one that does not appeal to me that much (and not just because I miss Phil Hartman), so it turns into a very expensive cast operation....

But if you really want to do this in unmanaged code, as a workaround you can call WideCharToMultiByte and then MultiByteToWideChar, pivoting through UTF-8 (code page 65001).

This will give you the same effect as this "Unicolonic" approach to text validation....
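A rough sketch of that round trip follows. The helper name RoundTripUtf16 and its strict parameter are hypothetical, chosen only for illustration. It also assumes Windows Vista or later, where the documentation says the UTF-8 conversion replaces ill-formed sequences with U+FFFD by default and fails when WC_ERR_INVALID_CHARS is passed; earlier versions of Windows treat lone surrogates differently.

```c
#ifdef _WIN32   /* Win32-only sketch; compiles to nothing elsewhere */
#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper: round-trips a null-terminated UTF-16 string
   through UTF-8 (CP_UTF8 == 65001). Returns a malloc'ed copy that the
   caller must free, or NULL on failure. If strict is nonzero,
   ill-formed input makes the conversion fail instead of being replaced. */
wchar_t *RoundTripUtf16(const wchar_t *src, int strict)
{
    DWORD flags = strict ? WC_ERR_INVALID_CHARS : 0;
    char *utf8;
    wchar_t *dst = NULL;
    int cb, cch;

    /* A NULL output buffer asks for the required size in bytes;
       a length of -1 means the terminating null is included. */
    cb = WideCharToMultiByte(CP_UTF8, flags, src, -1, NULL, 0, NULL, NULL);
    if (cb == 0) return NULL;
    utf8 = (char *)malloc((size_t)cb);
    if (utf8 == NULL) return NULL;
    if (WideCharToMultiByte(CP_UTF8, flags, src, -1, utf8, cb, NULL, NULL) == 0) {
        free(utf8);
        return NULL;
    }

    /* Second leg: back to UTF-16, again sizing the buffer first. */
    cch = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (cch != 0) {
        dst = (wchar_t *)malloc((size_t)cch * sizeof(wchar_t));
        if (dst != NULL &&
            MultiByteToWideChar(CP_UTF8, 0, utf8, -1, dst, cch) == 0) {
            free(dst);
            dst = NULL;
        }
    }
    free(utf8);
    return dst;
}
#endif
```

Calling RoundTripUtf16(text, 0) would give the "clean up by replacement" behavior, and RoundTripUtf16(text, 1) the "error out" behavior; in the latter case GetLastError() is documented to report ERROR_NO_UNICODE_TRANSLATION.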

 

This post brought to you by ＝ (U+ff1d, a.k.a. FULLWIDTH EQUALS SIGN)

