New in Vista Beta 1: more use of the word 'linguistic'

by Michael S. Kaplan, published on 2005/08/02 05:10 -04:00, original URI:

Yet another 'New in Vista Beta 1' post!

Now I answer a lot of questions in this blog: some that people ask directly, many that people have asked in the past, and a lot of others about things that no one has asked but that I personally find interesting.

Back in November of last year, I answered one of my favorite questions, pointing out that people really do ask me "why is my Korean text in random order?"

This was back in the early days when I was a bit more popular, so that even though none of the people commenting on the post had personally witnessed the issue, they found it kind of interesting (I have since mostly slipped into something a bit more obscure except when I have art that has women falling head first off of mechanical bulls!).


I am not complaining, mind you. But it is hard to not notice the fact that my posts about any topic other than internationalization seem to draw about 2-20 times the interest!

On top of which, earlier today I looked at an advance copy of a book on internationalization that is coming out soon. It is probably going to be well over 500 pages, only 5-6 of which are about my favorite topic (collation). Those pages managed to split collation and string comparison into two different topics and spend two pages talking about alternate sorts, and they covered almost none of the topics I go on about here as real concerns for internationalization in Windows and the .NET Framework. And I think the author might read this blog!

Some days it does not pay to get up in the morning.

And that was just the technical stuff; the non-technical stuff was just as helpful (more on this in another post I'll do later).

Nevertheless, I carry on. Someone likes what I am doing here, I'm sure of it. And I get to say it all my way, too. Sometimes people point out bugs, and other times I find bugs myself while posting. Which is undeniably cool. Maybe a year from now a Google search will dig up an answer to a question that helps someone save the day or whatever. And all of that is really good enough for me.

But every once in a while (and here is where I pop the stack a bit to the original purpose of the post) I get to post about something obscure but fun that no one out there in the world knows about yet.

There are a whole bunch of people who read about the odd use of the word linguistic when I answered that other question (What does "linguistic casing" mean?). At the time I proposed that we could have called the LCMAP_LINGUISTIC_CASING flag LCMAP_UNICODE_SIMPLE_CASING and been just as close to what was actually going on with the flag. But this is selling the flag a little short; after all, it handles Turkic casing, and it takes many of the lookalike symbols in Unicode that are identically shaped to Greek and other letters and converts them to those letters. These are all operations that are sensible linguistically, even if they are not a good idea in filesystems and other less linguistic operations.

And I guess that is kind of linguistic.... well, more "language-like" rather than "computer-like".
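To make the Turkic casing point concrete, here is a minimal sketch in Python rather than Win32. It is only an analogy: Python's built-in str.upper() and str.lower() are locale-independent, so the Turkish/Azerbaijani pairing of i with İ and ı with I has to be applied by hand. The helper names turkic_upper and turkic_lower are made up for this illustration and are not part of any Windows or Python API.

```python
# Illustration only: this is not how LCMapString implements linguistic casing.
# Turkic languages pair dotted i with dotted capital I (U+0130),
# and dotless i (U+0131) with the plain capital I.
_TURKIC_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TURKIC_LOWER = str.maketrans({"\u0130": "i", "I": "\u0131"})


def turkic_upper(text: str) -> str:
    """Uppercase text with the Turkic i-rules applied first (hypothetical helper)."""
    return text.translate(_TURKIC_UPPER).upper()


def turkic_lower(text: str) -> str:
    """Lowercase text with the Turkic i-rules applied first (hypothetical helper)."""
    return text.translate(_TURKIC_LOWER).lower()


if __name__ == "__main__":
    print("istanbul".upper())        # locale-independent casing keeps a plain I
    print(turkic_upper("istanbul"))  # Turkic casing produces a dotted capital
    print(turkic_lower("ISTANBUL"))  # Turkic casing produces a dotless small i
```

The gap between the two uppercase results is exactly why this kind of casing is sensible for text in Turkish and a bad idea for filesystem names, where the same string must case-fold identically on every machine.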

And in Windows Vista, this use of the word "linguistic" will continue, with several new flags. All of them can be used alongside the other flags in CompareString and LCMapString, all of them are available in Vista Beta 1, and all of them will be documented in the Longhorn SDK as soon as the part that covers Win32 is available:

LINGUISTIC_IGNORECASE -- Could have been called NORM_THEREALANDACTUALIGNORECASE, as it does what NORM_IGNORECASE ought to do and only masks the case information in scripts that actually have a notion of case.

LINGUISTIC_IGNOREDIACRITIC -- Could have been called NORM_WHATWEREALLYMEANTFORIGNORENONSPACE, as it does what NORM_IGNORENONSPACE does but only masks the diacritic weight for that small range of scripts where an actual, European-style notion of diacritics is used (which is a great way to address that issue I mentioned earlier that affects Korean and other languages!).

NORM_LINGUISTIC_CASING -- The flag that will allow comparisons on Win32 to handle Turkic case properly, whether or not either NORM_IGNORECASE or the new LINGUISTIC_IGNORECASE is specified (something that could not be done with the existing flags due to the breaking effect that would have on existing code involving the filesystem and other situations).
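The difference between ignoring every nonspacing mark and ignoring only European-style diacritics can be sketched roughly as follows. This is a loose analogy in Python, not the Win32 implementation: the real flags mask weights inside the collation data, while this sketch works on characters. It also assumes, purely for illustration, that "European style" means Latin, Greek and Cyrillic base letters. Both function names are hypothetical.

```python
import unicodedata

# Assumption for this sketch only: scripts with a European-style notion of diacritics.
_DIACRITIC_SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def _uses_european_diacritics(base: str) -> bool:
    """Guess the script of a base character from the start of its Unicode name."""
    return unicodedata.name(base, "").startswith(_DIACRITIC_SCRIPTS)


def strip_all_marks(text: str) -> str:
    """Blunt approach: drop every nonspacing mark, whatever the script."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = (ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", "".join(kept))


def strip_european_diacritics(text: str) -> str:
    """Drop nonspacing marks only when they follow a Latin/Greek/Cyrillic base."""
    out = []
    base_is_european = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            if base_is_european:
                continue  # a European-style diacritic: ignore it
        else:
            base_is_european = _uses_european_diacritics(ch)
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    latin = "r\u00e9sum\u00e9"     # Latin letters with acute accents
    devanagari = "\u0915\u0941"    # DEVANAGARI LETTER KA + VOWEL SIGN U
    print(strip_all_marks(latin), strip_european_diacritics(latin))
    print(strip_all_marks(devanagari) == devanagari)            # False: vowel sign lost
    print(strip_european_diacritics(devanagari) == devanagari)  # True: left intact
```

In the blunt version the Devanagari vowel sign is thrown away along with the accents, which changes the word; the script-aware version only touches marks that behave like diacritics in the European sense.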

Now none of these three operations are what a linguist would actually call linguistic. But all three of these operations certainly provide collation support on Win32 with behavior that is more linguistically appropriate than the existing support does on Windows. Which is pretty awesome....

And it is pretty darn linguistic if you accept the tap-dancing act above. Which is a lot better of a dance than the one done about the naming of the ANSI code page, or the Visual InterDev product! :-)


This post brought to you by "İ" (U+0130, a.k.a. LATIN CAPITAL LETTER I WITH DOT ABOVE)

# Rob on 2 Aug 2005 6:17 AM:

Hi Michael,

I just wanted to say that I read your blog everyday, but I don't comment because I don't think I have the necessary knowledge around internationalization to comment sensibly. I suspect there may be quite a few other people that are doing the same.

I have one request though. You have mentioned font linking in previous posts, but haven't described how it actually works. Could you do a post on this?


# Michael S. Kaplan on 2 Aug 2005 6:28 AM:

Hi Rob -- Well, there are the posts and which do talk about different aspects of font linking. That second post especially talks about the difference in how Uniscribe and MLang each handle font linking.... I'll be doing more to cover how Uniscribe and GDI+ do it, and eventually I'll even talk about how Avalon does it, too. :-)

# Sjoerd Verweij on 2 Aug 2005 12:46 PM:

Don't you get a hit count report? I follow your blog religiously, and I doubt I am alone. Much enjoyed and appreciated.

# Mihai on 2 Aug 2005 5:43 PM:

Hey, don’t give up!

If you get no comments, it might mean you are doing a good job explaining.

I know I read you daily, but I do not comment until I have something to say.

So, if you want to see how many people actually read you, just post something technically incorrect :-)

# Michael S. Kaplan on 2 Aug 2005 8:13 PM:

Hi Mihai --

They also might be asleep. :-)

No worries, I have no plans to give up doing this.

# Tanveer Badar on 24 Dec 2007 5:08 AM:

Which is undeniably cool. Maybe a year from now a Google search will dig up an answer to a question that helps someone save the day or whatever.

<Evil Conspiracy>What would one be without Google?<g/></Evil Conspiracy>

referenced by

2013/04/04 You need to dot every İ, not dot any I, dot every i, not dot any ı, and cross every t in Turkish

2010/02/17 Knock knock! Who's there? Kana! Kana Who? I Kana got something wrong!

2008/11/14 When features collide (aka Your LCID sucks, but sometimes the bug sucks more)

2008/06/25 Seeing the tears, my heart went out to her as I asked her "Why the Long S?"
