IsCharSomethingOrOther?

by Michael S. Kaplan, published on 2007/06/19 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/06/19/3396819.aspx


Reader/colleague Aldo Donetti asked:

HI Michael, would you know what Unicode categories are supported in the IsCharAlpha function? (the Unicode version, of course) The MSDN doc doesn’t really specify that…

Thanks,
Aldo

True, the IsCharAlpha topic in the SDK (here) doesn't get into any specifics. :-)

As for which version of Unicode the data comes from, that varies with the OS version, and MSDN has generally not made version claims in the past.

If you must know, then it was 3.0 in Win2000, 3.1 in XP/Server 2003, 5.0 in Vista/Server 2008 -- but the properties behind the functions are where it really gets interesting, in the end....

Now I did hint at that issue in my post from September of 2005 (Is Kana 'alphabetic' ? Depends on who you ask....) but a more explicit answer with a bit of backstory is probably something that is just about due....

The rules are simple enough. Both IsCharAlphaW and IsCharAlphaNumericW depend on GetStringTypeW's CT_CTYPE1 and CT_CTYPE3, as follows:

There is a comment in the code that discusses the Kana issue in a manner that baffles me a bit:

/*
 * We don't want to return TRUE for halfwidth katakana.
 * Katakana is linguistic character (C1_ALPHA), but it is not
 * alphabet character.
*/

What is interesting is that the actual check includes full width Katakana and also Hiragana. I guess the comment never got updated.
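
To make the shape of that check concrete, here is a minimal sketch of what "alphabetic, but not kana" looks like when expressed against the two type words. This is not the actual Windows source. The function name and the SKETCH_ prefixed constants are hypothetical, and the flag values are assumed to match C1_ALPHA, C3_KATAKANA and C3_HIRAGANA as defined in winnls.h:

```c
#include <stdbool.h>

/* Assumed to mirror the winnls.h values of C1_ALPHA, C3_KATAKANA and
 * C3_HIRAGANA; repeated here so the sketch compiles on any platform. */
enum {
    SKETCH_C1_ALPHA    = 0x0100,
    SKETCH_C3_KATAKANA = 0x0010,
    SKETCH_C3_HIRAGANA = 0x0020
};

/* Hypothetical helper: takes the CT_CTYPE1 and CT_CTYPE3 words for one
 * character and reports whether it is C1_ALPHA without being kana.
 * Testing the kana bits alone excludes halfwidth and fullwidth forms alike. */
static bool sketch_is_alpha_not_kana(unsigned short ctype1, unsigned short ctype3)
{
    if (!(ctype1 & SKETCH_C1_ALPHA))
        return false;   /* not a linguistic character at all */
    return (ctype3 & (SKETCH_C3_KATAKANA | SKETCH_C3_HIRAGANA)) == 0;
}
```

Because the test keys on the kana bits rather than on C3_HALFWIDTH, it lines up with the behavior described above (all Katakana plus Hiragana) rather than with what the comment claims.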

And then there was another interesting one too, in IsCharAlphaNumericW:

//
// LATER 20 Feb 92
//    We may need to check ctype 3 info if we want to check for
//    digits other than ASCII '0'-'9' (such as Lao digits or
//    Tibetan digits, etc.).
//

Interesting, since it isn't true: the other digits already count under the C1_DIGIT type, so there is no need to look at the C3_* info....

Now of course, as I said back in Is Kana 'alphabetic' ? Depends on who you ask...., the special treatment of Hiragana and Katakana is the kind of thing that a lot of people would consider just as wrong as the digit behavior in the other comment would be, if that comment weren't itself wrong.

I guess a couple of these files could use a little comment cleanup. :-)

Though on the whole I don't really find either of these functions all that useful. Perhaps that is just me, but if not then you can probably write your own using GetStringTypeW....
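
If you do go the roll-your-own route, a sketch might look something like the following. The names my_is_alnum_from_types and my_is_char_alnum and the kana_is_alpha switch are hypothetical, the policy is just one you might pick rather than what the system functions do, and the wrapper looks at a single UTF-16 code unit (so no supplementary characters). The #else branch only mirrors the winnls.h flag values so the policy function still compiles off Windows:

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Off Windows, mirror the winnls.h values so the policy function compiles. */
#define C1_DIGIT    0x0004
#define C1_ALPHA    0x0100
#define C3_KATAKANA 0x0010
#define C3_HIRAGANA 0x0020
#endif

/* Hypothetical policy: any C1_DIGIT or C1_ALPHA character counts, and the
 * caller decides whether kana should be treated as alphabetic. */
static bool my_is_alnum_from_types(unsigned short c1, unsigned short c3,
                                   bool kana_is_alpha)
{
    if (c1 & C1_DIGIT)
        return true;    /* non-ASCII digits are covered by C1_DIGIT too */
    if (!(c1 & C1_ALPHA))
        return false;
    if (kana_is_alpha)
        return true;
    return (c3 & (C3_KATAKANA | C3_HIRAGANA)) == 0;
}

#ifdef _WIN32
/* Fetches both type words for one UTF-16 code unit and applies the policy. */
static bool my_is_char_alnum(WCHAR ch, bool kana_is_alpha)
{
    WORD c1 = 0, c3 = 0;
    if (!GetStringTypeW(CT_CTYPE1, &ch, 1, &c1) ||
        !GetStringTypeW(CT_CTYPE3, &ch, 1, &c3))
        return false;   /* treat a failed lookup as "not alphanumeric" */
    return my_is_alnum_from_types(c1, c3, kana_is_alpha);
}
#endif
```

Keeping the policy separate from the GetStringTypeW calls means the "is Kana alphabetic?" decision sits in one place where you can see it, instead of being buried in a comment that may or may not match the code.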

 

This post brought to you by ﾀ (U+ff80, a.k.a. HALFWIDTH KATAKANA LETTER TA)


Aldo.NET on 19 Jun 2007 12:35 PM:

Excellent, thanks Michael! :-)


