There and Back Again (aka ACP --> UTF-8 --> ACP)

by Michael S. Kaplan, published on 2011/06/22 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/06/22/10177704.aspx


Shortly after Raymond's post "How do I convert an ANSI string directly to UTF-8?", someone going by the handle Otis (I'm assuming it is not his real name) asked me to weigh in on the issue.

I chose to hold back; it seemed to me that the blog post and the comments on it were proceeding appropriately.

But Otis would periodically ping me on it, thinking there was perhaps more to say.

I did respond to the other question Otis asked me, about whether I was jealous of the fact that more people comment on Raymond's blog than on mine -- I'm not, and you only have to look at many of the comments over there to see why. I'd probably stop reading the comments or turn them off if I had to deal with all that!

But the big question:

Is there a way to convert an ANSI string directly to a UTF-8 string? I have an ANSI string which was converted from Unicode based on the current code page. I need to convert this string to UTF-8.

Currently I am converting the string from ANSI to Unicode (MultiByteToWideChar(CP_ACP)) and then converting the Unicode to UTF-8 (WideCharToMultiByte(CP_UTF8)). Is there a way to do the conversion without the redundant conversion back to Unicode?

has already been answered as well as I would have answered it -- there isn't one. You should use Unicode as your pivot encoding between the ACP and UTF-8.
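To make the pivot concrete, here is a minimal sketch in Python rather than Win32. It assumes Windows-1252 ("cp1252") stands in for whatever the ACP happens to be, and the function names are made up for illustration. Python's str plays the role the UTF-16 buffer plays between MultiByteToWideChar(CP_ACP) and WideCharToMultiByte(CP_UTF8).

```python
def acp_to_utf8(data: bytes, acp: str = "cp1252") -> bytes:
    """Re-encode legacy code page bytes as UTF-8 via a Unicode pivot.

    `acp` is an assumption standing in for the system's active code page.
    """
    text = data.decode(acp)        # step 1: code page bytes -> Unicode (the pivot)
    return text.encode("utf-8")    # step 2: Unicode -> UTF-8 bytes


def utf8_to_acp(data: bytes, acp: str = "cp1252") -> bytes:
    """The trip back: UTF-8 -> Unicode -> code page bytes.

    Raises UnicodeEncodeError if a character has no slot in the code page.
    """
    text = data.decode("utf-8")
    return text.encode(acp)


ansi = b"caf\xe9"            # "café" as Windows-1252 bytes
utf8 = acp_to_utf8(ansi)     # same text as UTF-8 bytes
back = utf8_to_acp(utf8)     # and back again
```

There is no shortcut that skips the middle step: the two byte encodings have nothing in common except the Unicode characters they both describe.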

And bemoan the fact that the ACP can't be UTF-8, since that would make this question much easier to answer.

I could post many links to blogs that continually tell the story about how that isn't gonna happen, but I'll just be lazy and point to one of them: UTF-8 and GB18030 are both 'NT' code pages, they just aren't 'ANSI' code pages. It points to some of the others.

It still ain't gonna happen.

If you are using the ACP anywhere, then you are lossy -- you should stop doing that. Keep the old interface around if you must, but tell people not to use it.
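A small Python sketch of what "lossy" means here, again assuming Windows-1252 as a stand-in for the ACP and using a hypothetical helper name. Python's errors="replace" substitutes "?" for anything the code page cannot hold, which is similar in spirit to (but not the same as) the default-character and best-fit behavior of the Win32 conversion.

```python
def survives_code_page(text: str, code_page: str = "cp1252") -> bool:
    """Return True if `text` round-trips through `code_page` unchanged.

    `code_page` is an assumption standing in for the active code page.
    """
    encoded = text.encode(code_page, errors="replace")  # unmappable chars become "?"
    return encoded.decode(code_page) == text


# "naïve" followed by three Japanese characters (U+65E5 U+672C U+8A9E)
original = "na\u00efve \u65e5\u672c\u8a9e"
lossy = original.encode("cp1252", errors="replace")

print(lossy)                         # the Japanese characters are gone for good
print(survives_code_page(original))  # False
print(survives_code_page("na\u00efve"))  # True: everything fits in cp1252
```

Once the text has passed through the code page, no later conversion to UTF-8 can recover the characters that were replaced.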

I'll have to talk more about the UELNT project (as I called it in that post) aka the MSL8 project (as I call it here), one of these days.


Raymond Chen - MSFT on 22 Jun 2011 9:49 AM:

There are many days I wish I had very few commenters.

Michael S. Kaplan on 22 Jun 2011 10:15 AM:

Exactly!

Random User 288534 on 22 Jun 2011 12:10 PM:

I enjoy Raymond's blog, as well as yours, Michael. You both frequently post interesting and/or entertaining information. I will agree that (proportionally) your (Michael's) commenters seem to "try" more, with regard to being on-topic, or at least intelligible.

Sometimes I imagine a hypothetical post, in which Raymond talks about (say), "NOT slamming someone's hand in the door of a car." Maybe, by chance, the car in the story is a Nissan Leaf. In this case, aside from the handful of comments that actually add to the discussion, there would invariably be comments about:

* "Slamming hands in doors is stupid! Why is this post even needed?"

* "Slamming other people's hands in doors is mean! Why would you do that?"

* "What's so great about Nissans? Toyotas are clearly superior for door-slamming."

* "Obviously those Leafs will crash more often, running Micro$oft."

* "An open-source car wouldn't have this problem [somehow...]"

* "Fords invented car-door-slamming first. How dare Nissan copy them?!"

And so on... Even so, I always hope neither of you will give up writing about topics you find interesting.

Peter Krefting on 23 Jun 2011 1:05 AM:

“You should use UTF-16 as your pivot encoding between the ACP and UTF-8” is a bit more technically correct. UTF-8 is an encoding of Unicode, after all.

Yuhong Bao on 25 Jun 2011 4:01 PM:

AFAIK UTF-8 was invented in 1992, and by that time the Win32 API was already defined as using A/W functions.

