"POSIX" style locale support on Windows?

by Michael S. Kaplan, published on 2005/03/05 21:03 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/03/05/386021.aspx


Greg asked, in the suggestion box:

The POSIX locale interface seems awkward in that it has a single global locale state that you have to change and then change back if you want to temporarily use a different locale.

It seems like you are actually thinking of the C Runtime's locale-dependent functions. I do not know whether they are based on POSIX per se; if you comb the CRT documentation you occasionally see vague, oblique references to POSIX requirements, but even the CRT is mainly based on the C Standard.

So I guess my response to the above would be:

  1. None of the APIs I own use a model that is anything like what you describe. :-)
  2. Of course, the reason for this is that NLS has little to do with the C Standard or the CRT (though occasionally the Microsoft CRT calls NLS APIs to do its work; other times they keep their own tables).
  3. NLS has very little to do with POSIX. The major exception is GetStringTypeEx, whose CT_CTYPE1 information supports the ANSI C and POSIX LC_CTYPE character typing functions. We attempt to map from Unicode, which is a little messy since the C/POSIX character classes are much more limited than what Unicode describes.

With that said, it does seem a limited and unwieldy model to me. But that is speaking as someone who is not a regular user of it; perhaps a fan of that model would see benefits in its stateful nature. I can say I have often been involved in efforts (inside and outside of Microsoft) to move away from the model, because of the problems caused by shared locale information in the DLL that anyone in a process can change.
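To make the problem concrete, here is a minimal C++ sketch of the "change it, then change it back" pattern Greg describes, using only the standard `std::setlocale` and `std::strcoll`. The `ScopedLocale` class and `compare_in_locale` function are hypothetical names invented for this illustration; they are not part of the CRT or of NLS.

```cpp
#include <clocale>
#include <cstring>
#include <string>

// Hypothetical RAII helper (not a CRT or NLS API): switches the process-wide
// C locale for one category and restores the previous value on destruction.
class ScopedLocale {
public:
    ScopedLocale(int category, const char* name) : category_(category) {
        // setlocale(category, nullptr) only queries the current setting.
        // Copy the result, since later setlocale calls may overwrite it.
        if (const char* current = std::setlocale(category, nullptr)) {
            saved_ = current;
        }
        // A null return means the requested locale is not available.
        switched_ = std::setlocale(category, name) != nullptr;
    }
    ~ScopedLocale() {
        if (!saved_.empty()) {
            std::setlocale(category_, saved_.c_str());
        }
    }
    ScopedLocale(const ScopedLocale&) = delete;
    ScopedLocale& operator=(const ScopedLocale&) = delete;

    bool switched() const { return switched_; }

private:
    int category_;
    std::string saved_;
    bool switched_ = false;
};

// Compares two strings under a named locale. Every call pays for two
// setlocale calls, and any other thread that uses a locale-dependent
// function while the guard is alive sees the temporary locale too.
int compare_in_locale(const char* locale_name, const char* a, const char* b) {
    ScopedLocale guard(LC_COLLATE, locale_name);
    return std::strcoll(a, b);
}
```

The guard keeps the save and restore steps from being forgotten, but it does nothing about the shared state itself, which is the part that causes trouble in a multithreaded process.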

Anyway, Greg went on to propose a different model:

It seems like a better interface would be to look up a locale and get back a handle that you could pass to strxfrm or strcoll or whatever to specify the locale to use for that invocation.

Ie, effectively an object oriented interface where you get back a locale object and call methods like strxfrm or strcoll on it.

I think Greg should probably look at the .NET Framework, which actually provides just such an object oriented interface....
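For readers outside .NET, the same handle-style idea can be sketched in standard C++, where a `std::locale` object plays the role of the handle and its `std::collate` facet supplies the `strcoll`-like and `strxfrm`-like operations. This is only an illustration of the shape of such an interface: the helper names `lookup_locale`, `collate_compare`, and `collate_key` are made up for this example, and which locale names are accepted depends on the platform.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: look up a named locale and return it as a value.
// Falls back to the classic "C" locale when the name is not installed.
std::locale lookup_locale(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// strcoll-style comparison that takes the locale as an argument instead of
// reading global state. Returns a negative, zero, or positive value.
int collate_compare(const std::locale& loc,
                    const std::string& a, const std::string& b) {
    const auto& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}

// strxfrm-style sort key: keys produced by the same locale can be compared
// with plain string comparison and give the same order as collate_compare.
std::string collate_key(const std::locale& loc, const std::string& s) {
    const auto& coll = std::use_facet<std::collate<char>>(loc);
    return coll.transform(s.data(), s.data() + s.size());
}
```

A caller would obtain the handle once, for example `std::locale fr = lookup_locale("fr_FR.UTF-8");` (the name is an assumption and varies by platform), and then pass `fr` to every comparison without touching any process-wide setting.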

Greg went on to say:

This comes up with software like databases that want to do things like

SELECT * FROM tab ORDER BY french_str, english_str

where french_str might have a default collation order of fr_FR and english_str might have a default collation order of en_US. The overhead of repeatedly calling setlocale can be as high as 20x the comparison time on some platforms, but even on platforms where there's a good implementation it's still unnecessary overhead.

Well, Microsoft SQL Server provides such an interface... Greg, you will want to look into the COLLATE keyword, which allows you to specify collations as far down as the expression or query level. I'll be talking further about those things at Tech·Ed in Orlando, in June.
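As a rough analogue of Greg's two-column ORDER BY outside a database, the sketch below sorts rows with a different locale object per column and never calls setlocale. It is not how SQL Server implements COLLATE; it only assumes standard C++ `std::locale`, whose `operator()` compares two strings with that locale's collate facet. The `Row` struct and `order_by_two_collations` function are hypothetical names for this example.

```cpp
#include <algorithm>
#include <locale>
#include <string>
#include <vector>

// Hypothetical row type mirroring the two columns in Greg's query.
struct Row {
    std::string french_str;
    std::string english_str;
};

// Sorts by french_str under the `french` locale, breaking ties by
// english_str under the `english` locale. Each column gets its own
// collation, and no global locale state is read or modified.
void order_by_two_collations(std::vector<Row>& rows,
                             const std::locale& french,
                             const std::locale& english) {
    std::stable_sort(rows.begin(), rows.end(),
        [&](const Row& x, const Row& y) {
            // std::locale::operator() returns true when the first string
            // collates before the second in that locale.
            if (french(x.french_str, y.french_str)) return true;
            if (french(y.french_str, x.french_str)) return false;
            return english(x.english_str, y.english_str);
        });
}
```

In real use the two arguments would be locales such as `std::locale("fr_FR.UTF-8")` and `std::locale("en_US.UTF-8")`; those names are assumptions, and constructing them throws `std::runtime_error` if the platform does not provide them.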

Greg then summarizes by asking:

Does Windows provide an interface like that?

Well, as I indicated inline above, Microsoft provides several different ways to get at the core NLS functionality that my team provides, not only to Windows but also to the .NET Framework, SQL Server, Office, and others. Some of these interfaces share facets with the ones Greg describes, and though there are obviously some minor syntactic differences, the models he describes would not look unfamiliar to users of the MS-provided functionality (or vice versa).

This post brought to you by "c" (U+0063, LATIN SMALL LETTER C)


# Andrew Goodale on 6 Mar 2005 7:35 AM:

Take a look at the C++ locale classes. std::locale and the locale facets provide an API that Greg is looking for. And the code is (largely) portable.
