by Michael S. Kaplan, published on 2005/09/12 10:20 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/09/12/463991.aspx
In the microsoft.public.win32.programmer.international newsgroup, Christian Kaiser asked:
Given the appended small program, I can test whether a Unicode character is AlphaNumerical or not.
If I call it using Half-Width Katakana (arg "0xff66" for example), IsCharAlphaNumericW() returns '0' - but according to the Unicode specs (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), this is a letter:
FF66;HALFWIDTH KATAKANA LETTER WO;Lo;0;L;<narrow> 30F2;;;;N;;;;;
Do I make a mistake, or are the internal Unicode chartype pages in Windows wrong? We have a very important customer who has problems because of this...
System: Windows XP SP2, newest patches applied
BTW: GetStringType() does return the correct information:
IsCharAlphaNumeric(0xff66) -> 0
GetStringTypeW(1) -> 0x300
GetStringTypeW(3) -> 0x8050
Strange? I think yes.
Christian
----
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
    WCHAR n;
    WORD n1 = 0;
    if (argc < 2)
        return 1;
    /* strtol with base 0 accepts hex input such as "0xff66" */
    n = (WCHAR)strtol(argv[1], NULL, 0);
    printf("IsCharAlphaNumeric(0x%04x) -> %d\n", n, IsCharAlphaNumericW(n));
    if (GetStringTypeW(CT_CTYPE1, &n, 1, &n1))
        printf("GetStringTypeW(1) -> 0x%x\n", n1);
    if (GetStringTypeW(CT_CTYPE3, &n, 1, &n1))
        printf("GetStringTypeW(3) -> 0x%x\n", n1);
    return 0;
}
Christian is right about the difference between the NLS function and IsCharAlphaNumeric.
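According to the NLS function:
CT_CTYPE1 0x300 = C1_ALPHA (0x0100) | C1_DEFINED (0x0200)
CT_CTYPE3 0x8050 = C3_ALPHA (0x8000) | C3_HALFWIDTH (0x0040) | C3_KATAKANA (0x0010)
So according to NLS, this character is a halfwidth katakana character and it is alphabetic.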
However, the logic in IsCharAlphaNumeric explicitly checks to make sure it is either C1_ALPHA or C3_ALPHA and not either C3_KATAKANA or C3_HIRAGANA. So clearly, according to user32.dll neither Hiragana nor Katakana is alphabetic.
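The check described above can be sketched as a pure function over the two flag words. This only illustrates the logic as stated in this post; it is not the actual user32.dll source. The name user32_style_is_alpha is hypothetical, the post does not describe how digits are handled so that part is left out, and the constants are redefined with their winnls.h values only so the snippet compiles without windows.h.

```c
#include <stdint.h>

/* Values as defined in winnls.h; redefined here only so the sketch
   compiles without windows.h. */
#ifndef C1_ALPHA
#define C1_ALPHA    0x0100
#define C3_KATAKANA 0x0010
#define C3_HIRAGANA 0x0020
#define C3_ALPHA    0x8000
#endif

/* Hypothetical helper: alphabetic by CT_CTYPE1 or CT_CTYPE3,
   but rejected if CT_CTYPE3 marks it as Katakana or Hiragana. */
static int user32_style_is_alpha(uint16_t ctype1, uint16_t ctype3)
{
    int alpha = (ctype1 & C1_ALPHA) != 0 || (ctype3 & C3_ALPHA) != 0;
    int kana  = (ctype3 & (C3_KATAKANA | C3_HIRAGANA)) != 0;
    return alpha && !kana;
}
```

Feeding it the values Christian reported for U+FF66 (0x300 and 0x8050) gives 0, because the C3_KATAKANA bit vetoes the two alpha bits.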
Now whether Christian is correct about Unicode's take on the situation is a little less clear: a general category of Lo (Letter, Other) does not necessarily mean Alphabetic. There is no specific rule relating general category to whether a character is alphabetic, although Mark Davis of Unicode and others are trying to write up guidelines that map Unicode character data to POSIX-style categories such as Alphabetic for implementations.
So the answer to the question is that it depends on who you ask. Perhaps the best answer is to call GetStringType yourself and decide rather than using the user32.dll Is* function wrappers. Because it seems like every time someone tries to wrap our functions to make it easier, something becomes more complicated....
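A minimal sketch of "call GetStringType yourself and decide", assuming the policy you want is that kana counts as alphabetic. The names flags_are_alnum and my_is_char_alnum are hypothetical, and the policy itself is only one possible choice. The decision is kept in a function over the raw flag words so it can be read and exercised separately from the Windows call; outside Windows the constants are defined with their winnls.h values.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
/* winnls.h values, defined here only for non-Windows builds. */
#define C1_DIGIT 0x0004
#define C1_ALPHA 0x0100
#define C3_ALPHA 0x8000
#endif

/* Hypothetical policy: anything NLS flags as a digit or as alphabetic
   counts as alphanumeric, with no exclusion for Katakana or Hiragana. */
static int flags_are_alnum(uint16_t ctype1, uint16_t ctype3)
{
    return (ctype1 & (C1_ALPHA | C1_DIGIT)) != 0 ||
           (ctype3 & C3_ALPHA) != 0;
}

#ifdef _WIN32
/* Hypothetical replacement for IsCharAlphaNumericW. */
static int my_is_char_alnum(WCHAR ch)
{
    WORD t1 = 0, t3 = 0;
    if (!GetStringTypeW(CT_CTYPE1, &ch, 1, &t1) ||
        !GetStringTypeW(CT_CTYPE3, &ch, 1, &t3))
        return 0; /* treat a failed lookup as "not alphanumeric" */
    return flags_are_alnum(t1, t3);
}
#endif
```

With the flags reported above for U+FF66 (0x300 and 0x8050), flags_are_alnum returns 1, which is the answer Christian expected.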