'A' and 'W' are sometimes living in two different worlds

by Michael S. Kaplan, published on 2006/07/17 17:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/07/17/668787.aspx

When you think about the 'A' and 'W' decorated versions of functions, in most cases the 'A' version is a simple wrapper that converts the strings, calls the 'W' version, and then when needed converts the string back.
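
To make the "convert and call" pattern concrete, here is a minimal sketch in portable C. The names my_upper_w and my_upper_a are hypothetical, and the ASCII-only widening is a stand-in I am assuming for illustration; a real 'A' wrapper would convert using the relevant code page (for example via MultiByteToWideChar and WideCharToMultiByte).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical 'W' worker: upper-cases a wide string in place. */
static void my_upper_w(wchar_t *s) {
    assert(s != NULL);
    for (; *s != L'\0'; ++s) {
        *s = (wchar_t)towupper((wint_t)*s);
    }
}

/* Hypothetical 'A' wrapper: convert, call the 'W' version, convert back.
 * Assumption: the input is pure ASCII, so widening is a plain copy.
 * Returns 0 on success, -1 if the string is too long or not ASCII. */
static int my_upper_a(char *s) {
    wchar_t wide[256];
    size_t n, i;

    assert(s != NULL);
    n = strlen(s);
    if (n >= sizeof wide / sizeof wide[0]) {
        return -1;
    }
    for (i = 0; i < n; ++i) {              /* convert the string */
        unsigned char b = (unsigned char)s[i];
        if (b > 0x7F) {
            return -1;                     /* outside this sketch's scope */
        }
        wide[i] = (wchar_t)b;
    }
    wide[n] = L'\0';

    my_upper_w(wide);                      /* call the 'W' version */

    for (i = 0; i < n; ++i) {              /* convert the result back */
        s[i] = (char)wide[i];
    }
    return 0;
}
```

All of the real work lives in the 'W' function; the 'A' function only handles the conversion on the way in and on the way out.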

However, there are exceptions to this general principle....

The other day, when I posted How long is that non-Unicode string?, Mihai commented at one point (in relation to potentially using CharNextExA in the solution):

"CharNextExA would work here"
Unless you run it on Win XP, where is broken :-)

He was referring of course to what was posted way back in January of last year in We broke CharNext/CharPrev (or, bugs found through blogging?). The bug that made me decide I enjoyed having a technical blog....

And the issue here seemed worthy of a wee little post of its own -- the post you are reading right now, in fact!

Now if you look at the documentation for CharNext and CharPrev, there are no separate topics for CharNextW/CharNextA and CharPrevW/CharPrevA, because for both functions the text description of what they do is about the same.

But if you look at what each function has to do:

The 'A' versions have to go one byte at a time, skipping an extra byte any time the two bytes together make up a double-byte CJK ideograph. This never has to happen in the 'W' versions.
The 'W' versions have to go one WCHAR at a time, continuing past any situation where one is dealing with either a surrogate pair (as of Vista) or a combining/nonspacing character (all versions of Windows except when the aforementioned bug was going on). This never happens in the 'A' versions.
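
The two kinds of stepping described above can be sketched side by side. This is not the actual CharNext implementation, only an illustration under stated assumptions: next_char_a and next_char_w are hypothetical names, the lead-byte ranges are hard-coded for code page 932 (Shift-JIS) where a real implementation would consult the code page's lead-byte table, and "combining character" is narrowed to the U+0300..U+036F block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: lead-byte ranges of code page 932, hard-coded for the sketch. */
static int is_lead_byte_932(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* 'A'-style step: one byte, or two when a lead byte starts a pair. */
static const char *next_char_a(const char *p) {
    assert(p != NULL);
    if (*p == '\0') {
        return p;                          /* stay on the terminator */
    }
    if (is_lead_byte_932((unsigned char)p[0]) && p[1] != '\0') {
        return p + 2;
    }
    return p + 1;
}

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Assumption: only the Combining Diacritical Marks block counts here. */
static int is_combining(uint16_t u) { return u >= 0x0300 && u <= 0x036F; }

/* 'W'-style step: one UTF-16 unit (or a surrogate pair), then any
 * combining marks that follow. */
static const uint16_t *next_char_w(const uint16_t *p) {
    assert(p != NULL);
    if (*p == 0) {
        return p;                          /* stay on the terminator */
    }
    if (is_high_surrogate(p[0]) && is_low_surrogate(p[1])) {
        p += 2;
    } else {
        p += 1;
    }
    while (is_combining(*p)) {
        ++p;
    }
    return p;
}
```

Note that the two functions share no logic at all: the 'A' step only ever asks about lead bytes, and the 'W' step only ever asks about surrogates and combining marks.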

(Of course there is the fact that the Vietnamese code page also has some combining characters on it, as I mentioned previously, but CharNext and CharPrev have never handled this case properly. I suppose one could call this a bug, though I am unaware of plans to fix this.)

In any case, since there is no real overlap of the functionality needed in the 'A' and 'W' versions, the functions are kept entirely separate. There is no "convert and call" logic, and it would not be useful if there were.

Funny how the descriptions are the same though, huh?

You can think of it as the difference between riding a bicycle and riding a unicycle -- similar principles, but a very different set of skills.

The bonus trivia question -- do you know why the "convert and call" logic would actually be hard to do here if it had been a good idea?

And the bonus hard trivia question -- can you think of a function where it is a good idea even with that difficulty?

This post brought to you by À (U+00c0, a.k.a. LATIN CAPITAL LETTER A WITH GRAVE)

# Sebastian Redl on 18 Jul 2006 5:12 AM:

The problem seems fairly obvious to me: to convert the string to Unicode, you still need to know how to get from one character to the next, thus you have to implement the functionality anyway.

# Michael S. Kaplan on 18 Jul 2006 5:24 AM:

Hi Sebastian --

Exactly! You got the bonus trivia question....

Now do you know the answer to the bonus hard trivia question? A function that actually has to do that conversion and return the right answer related to position(s) within that string?
