The difference between C1_SPACE-ing out and drawing a C1_BLANK

by Michael S. Kaplan, published on 2007/06/11 11:39 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/06/11/3230072.aspx


Over in the microsoft.public.win32.programmer.international newsgroup, PLS asked the following question:

Can someone please explain the difference between C1_SPACE and
C1_BLANK in the character types returned from GetStringTypeEx?
What characters fall in either catagory?

  Thanks,
    ++PLS

Microsoft can't really take credit for the meaning of the C1_CTYPE flags that GetStringTypeW returns (remember not to call GetStringTypeEx, as I pointed out in To Ex or not to Ex? THAT is the question!).
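For readers who want to poke at these flags themselves, here is a minimal sketch of decoding the CT_CTYPE1 word that GetStringTypeW hands back for each UTF-16 code unit. The bit values are the C1_* constants from the Windows SDK header winnls.h. The names `C1`, `describe`, and `ctype1` are hypothetical helpers invented for this illustration, and `ctype1` can only run on Windows.

```python
import sys
from enum import IntFlag


class C1(IntFlag):
    """CT_CTYPE1 bits, using the C1_* values from winnls.h."""
    UPPER = 0x0001
    LOWER = 0x0002
    DIGIT = 0x0004
    SPACE = 0x0008
    PUNCT = 0x0010
    CNTRL = 0x0020
    BLANK = 0x0040
    XDIGIT = 0x0080
    ALPHA = 0x0100
    DEFINED = 0x0200


def describe(flags):
    """Return the C1_* names of the bits set in one CT_CTYPE1 flag word."""
    return ["C1_" + member.name for member in C1 if flags & member.value]


def ctype1(text):
    """Hypothetical wrapper: one CT_CTYPE1 word per UTF-16 code unit (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetStringTypeW is only available on Windows")
    import ctypes

    CT_CTYPE1 = 0x00000001
    # GetStringTypeW counts UTF-16 code units, not Python code points.
    units = len(text.encode("utf-16-le")) // 2
    out = (ctypes.c_ushort * units)()

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.GetStringTypeW
    fn.argtypes = [ctypes.c_uint32, ctypes.c_wchar_p, ctypes.c_int,
                   ctypes.POINTER(ctypes.c_ushort)]
    fn.restype = ctypes.c_int
    if not fn(CT_CTYPE1, text, units, out):
        raise ctypes.WinError(ctypes.get_last_error())
    return list(out)
```

On a Windows machine, something like `describe(ctype1("\t")[0])` should list the bits reported for TAB. The exact result depends on the NLS data of the Windows version in use, so no expected output is shown here.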

The original meaning comes from that whole POSIX internationalization world, where the actual definition you can find if you really look for it is:

Now given these slightly odd sorts of definitions, the script that Microsoft uses to figure out what to do with its implementation is:

Here is where most of the relevant characters fall (note the C1_DEFINED, which was added in XP and convinced us as a team to be much more cautious about adding ctype values!):

I even got to learn something when I built this table -- I always assumed that the implementation of char.IsWhiteSpace that added some other random characters (ref: here) was due to silly VB backward compatibility issues.

Which it is.

However, it is clear to me now that the original VB silliness was due to an attempt to support POSIX (probably because internally it used the CRT isspace function, which in turn is dependent on the NLS data returned by GetStringTypeW, above).
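To make the space-versus-blank distinction concrete, here is a small sketch based on how the C standard describes isspace and isblank in the plain "C" locale. This is an assumption-laden illustration only: it is not the POSIX text or the Microsoft table referred to above, it says nothing about what the NLS data assigns to characters outside ASCII, and the names `C_LOCALE_SPACE`, `C_LOCALE_BLANK`, and `classify` are made up for this example.

```python
# Characters for which isspace() is true in the "C" locale:
# space, horizontal tab, newline, vertical tab, form feed, carriage return.
C_LOCALE_SPACE = set(" \t\n\v\f\r")

# Characters for which isblank() is true in the "C" locale:
# only the ones that separate words within a line.
C_LOCALE_BLANK = set(" \t")


def classify(ch):
    """Classify one character using the "C" locale sets above."""
    if ch in C_LOCALE_BLANK:
        return "space+blank"
    if ch in C_LOCALE_SPACE:
        return "space"
    return "other"


if __name__ == "__main__":
    for ch in " \t\n\v\f\ra":
        print(repr(ch), classify(ch))
```

The point of the sketch is simply that, in this model, every blank is also a space, while line-breaking characters such as newline and form feed are spaces without being blanks.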

It is the down side of assuming anything is silly -- it usually turns out to be the fault of code you used to own at one point!

 

This post brought to you by   (U+1680, a.k.a. OGHAM SPACE MARK)



referenced by

2008/07/26 Don't sneak a BOM in on someone who promises to ignore free space
