More things that don't work for locales in the CRT

by Michael S. Kaplan, published on 2008/05/02 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/05/02/8449317.aspx


They say that you could put just the word Playboy or Hugh Hefner or a picture of the Playboy Bunny on an envelope and if the postage is right then it will make its way to the mansion.

It is funny to imagine that kind of fuzzy matching at work, and not just because of how many people there might be wearing fuzzy slippers. :-)

But there is a case where a function created by Microsoft tries to do the same sort of fuzzy thing!

I should explain.

On the surface, the Visual C++ Run-Time Library was way ahead of its time.

After all, it was taking a string parameter in _wsetlocale back when LCIDs were the big thing on Win32 (of course we all know now that LCIDs suck!).

Seems like they are all set, right?

Well, in practice it is not quite so great as the above might try to imply.

First of all there is the Language and Country/Region Strings documentation which explains the syntax it expects:

locale :: "lang[_country_region[.code_page]]"
            | ".code_page"
            | ""
            | NULL

The language strings, the country/region strings, and the code page piece are each described in their own documentation topics. If you look at this format and these topics, you'll see very little relationship between the locale names now used by Windows and the .NET Framework and the names that this function will accept.
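To make the shape of that syntax a bit more concrete, here is a minimal sketch in C that splits a string into the three pieces. The names CrtLocaleParts and split_crt_locale are made up for illustration and are not part of the CRT. The sketch checks shape only, knows nothing about the actual language or country/region tables, and naively splits at the first underscore and the first dot after it, so it is an assumption-laden toy rather than what the run-time really does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the three pieces of the documented syntax. */
typedef struct {
    char lang[64];
    char country_region[64];
    char code_page[16];
} CrtLocaleParts;

/* Copies [begin, end) into dst as a C string; returns 0 if it does not fit. */
static int copy_span(char *dst, size_t cap, const char *begin, const char *end)
{
    size_t n = (size_t)(end - begin);
    if (n >= cap) return 0;
    memcpy(dst, begin, n);
    dst[n] = '\0';
    return 1;
}

/* Splits s following  lang[_country_region[.code_page]] | ".code_page" | "".
   Returns 1 when the shape matches, 0 otherwise (out is then unspecified).
   Assumption: no piece contains '_' or '.' itself. */
int split_crt_locale(const char *s, CrtLocaleParts *out)
{
    assert(out != NULL);
    memset(out, 0, sizeof *out);

    if (s == NULL) return 0;    /* NULL is a query, there is nothing to split */
    if (*s == '\0') return 1;   /* "" : all three pieces stay empty */

    if (*s == '.') {            /* ".code_page" */
        const char *cp = s + 1;
        if (*cp == '\0') return 0;
        return copy_span(out->code_page, sizeof out->code_page, cp, cp + strlen(cp));
    }

    const char *end = s + strlen(s);
    const char *us = strchr(s, '_');

    if (us == NULL) {
        /* lang only; the syntax has no "lang.code_page" form */
        if (strchr(s, '.') != NULL) return 0;
        return copy_span(out->lang, sizeof out->lang, s, end);
    }

    if (us == s) return 0;      /* empty lang */
    if (!copy_span(out->lang, sizeof out->lang, s, us)) return 0;

    const char *rest = us + 1;
    const char *dot = strchr(rest, '.');
    const char *cr_end = dot ? dot : end;
    if (cr_end == rest) return 0;   /* empty country_region */
    if (!copy_span(out->country_region, sizeof out->country_region, rest, cr_end))
        return 0;

    if (dot != NULL) {
        if (dot[1] == '\0') return 0;   /* trailing '.' with no code page */
        return copy_span(out->code_page, sizeof out->code_page, dot + 1, end);
    }
    return 1;
}
```

Fed something like "French_Canada.1252", this yields the pieces French, Canada and 1252. Fed a Windows/.NET-style name like "en-US", it finds no underscore and so drops the whole thing into the lang slot, where nothing in the syntax above suggests it would be recognized.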

The many examples shown in the _wsetlocale topic's remarks section only help to prove how complex and different all of this is -- which makes it kind of funny that it won't support what is now a great example of Microsoft trying to follow international standards!

Then there is the fact that _wsetlocale doesn't support Unicode-only locales, of course.

Perhaps if we sent folks down for a relaxing weekend in the grotto they would come back refreshed and ready to implement RFC 4646 and RFC 4647? :-)

As a bonus, we'd then see support of custom locales anytime the OS supported them (in >= Vista), something that won't happen until then even though the name could be used as is....

 

This post brought to you by છ (U+0a9b, a.k.a. GUJARATI LETTER CHA)


John Cowan on 2 May 2008 2:01 PM:

Just remember the word for what comes out of a grotto:  grotesque.  (Really.)

Wyatt on 2 May 2008 2:53 PM:

Uh, U+09ab is ফ BENGALI LETTER PHA.  GUJARATI LETTER CHA is U+0a9b.

Michael S. Kaplan on 2 May 2008 3:04 PM:

Yikes, you are right. Good catch! :-)

Thanks!

