Invariant and Ordinal Redux

by Michael S. Kaplan, published on 2005/04/13 09:19 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/04/13/407823.aspx


I have talked about LOCALE_INVARIANT / CultureInfo.InvariantCulture before, in "Comparison confusion: INVARIANT vs. ORDINAL" and in "Where is the locale? 'Its Invariant.' In where?", and talked a little about the if-not-noble-then-at-least-deterministic intent of this odd locale with no real country and no real language. But when I look at how people use it, I am most often struck by two things:

  1. Developers tend to misuse it at least ten times more often than they use it correctly. And that is a charitable estimate.
  2. The name of this beast is so staggeringly bad that it is probably the reason for the heavy pattern of misuse described in #1.

The topic I link to for LOCALE_INVARIANT has probably the best description I have ever seen of the purpose of Invariant:

The LOCALE_INVARIANT is a special locale identifier that is locale independent. It is designed for system level functions that require consistent results (for example, sorting in the file system) regardless of the locale that the user has chosen. Typically, an application does not use LOCALE_INVARIANT because it expects the results of an action to depend on the rules governing each individual locale.

LOCALE_INVARIANT is defined as LANG_INVARIANT for the primary language, SUBLANG_NEUTRAL for the sub-language, and SORT_DEFAULT for the sort id.

In fact the only real problem with this summary is that it is located in the remarks for the MAKESORTLCID macro, whose only connection to LOCALE_INVARIANT is that, like all proper LCIDs, you can construct the value with the macro if you do not want to use the predefined constant. At least they mention how the construction is done (I have seen developers wonder why MAKELANGID(LANG_INVARIANT, SUBLANG_DEFAULT) causes an error when you try to use it -- I tell them to just use the predefined constant and not try to build them all when you do not have to!).
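To make the construction quoted above concrete, here is a minimal sketch of the arithmetic involved. It assumes the usual Windows SDK values (LANG_INVARIANT = 0x7f, SUBLANG_NEUTRAL = 0x00, SUBLANG_DEFAULT = 0x01, SORT_DEFAULT = 0x0) and the usual bit layout of MAKELANGID and MAKELCID. The sketch_ helpers and SKETCH_ constants are hypothetical stand-ins, written out so the example compiles without the SDK headers; real code would include the headers and simply use LOCALE_INVARIANT.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the SDK constants (values assumed, see lead-in). */
enum {
    SKETCH_LANG_INVARIANT  = 0x7f,
    SKETCH_SUBLANG_NEUTRAL = 0x00,
    SKETCH_SUBLANG_DEFAULT = 0x01,
    SKETCH_SORT_DEFAULT    = 0x0
};

/* Mirrors MAKELANGID: sub-language in the upper 6 bits, primary language in the lower 10. */
uint16_t sketch_make_langid(uint16_t primary, uint16_t sub)
{
    return (uint16_t)(((unsigned)sub << 10) | primary);
}

/* Mirrors MAKELCID: sort id in the upper 16 bits, language id in the lower 16. */
uint32_t sketch_make_lcid(uint16_t langid, uint16_t sort_id)
{
    return ((uint32_t)sort_id << 16) | langid;
}

/* The construction described in the quoted remarks. */
uint32_t sketch_locale_invariant(void)
{
    return sketch_make_lcid(
        sketch_make_langid(SKETCH_LANG_INVARIANT, SKETCH_SUBLANG_NEUTRAL),
        SKETCH_SORT_DEFAULT);
}

/* Shows that swapping in SUBLANG_DEFAULT yields a different identifier. */
void sketch_check_lcids(void)
{
    uint32_t with_default = sketch_make_lcid(
        sketch_make_langid(SKETCH_LANG_INVARIANT, SKETCH_SUBLANG_DEFAULT),
        SKETCH_SORT_DEFAULT);

    assert(sketch_locale_invariant() == 0x0000007fu);
    assert(with_default == 0x0000047fu);
    assert(with_default != sketch_locale_invariant());
}
```

Under those assumptions, building the value with SUBLANG_DEFAULT gives 0x047f rather than 0x007f, so it is simply not the same identifier as the predefined constant.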

The most common misuse is to do a ToLower() operation followed by an invariant comparison to validate filenames. If you are a regular reader here, you will know that it is hard to get less accurate results out of code than with this approach. Well, you could write code that reformats the user's hard drive when you run it and that would be worse. But it is in the top five "bad algorithms you can create while soberly writing code." With the bonus of being an extra string allocation.

Yikes!
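For contrast with the invariant comparison mentioned above, here is a rough sketch of what "ordinal" means: the strings are compared code unit by code unit, purely by numeric value, with no casing or linguistic rules applied. The function name sketch_ordinal_compare and the use of raw UTF-16 arrays are assumptions made for illustration; this is not how any Windows or .NET API is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares two UTF-16 code unit arrays by numeric value only.
   Returns -1, 0, or 1. No case folding and no locale data are involved. */
int sketch_ordinal_compare(const uint16_t *a, size_t a_len,
                           const uint16_t *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    /* All shared code units match: the shorter string sorts first. */
    if (a_len == b_len)
        return 0;
    return a_len < b_len ? -1 : 1;
}

/* 'Z' (0x005A) sorts before 'a' (0x0061) because only the numbers matter. */
void sketch_ordinal_demo(void)
{
    const uint16_t upper_z[] = { 0x005A };
    const uint16_t lower_a[] = { 0x0061 };

    assert(sketch_ordinal_compare(upper_z, 1, lower_a, 1) < 0);
}
```

The result is deterministic and allocates nothing, which is the property that matters for things like identifiers and filenames; whether case should be ignored is a separate decision layered on top of it.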

I have decided that I will from time to time remind people of the difference between ORDINAL and INVARIANT, because neither name is all that intuitive but each of them can be incredibly useful (when used properly/appropriately). As I said in the first article above:

That problem remains to this day, though every single time I speak at a conference or answer a question in a newsgroup or get someone to look at posts like this one, then there is at least one less developer who has this problem. Maybe this time it is you? :-)

I kind of feel a summarized list of DOs and DON'Ts coming on in a future post, and the issues surrounding Invariant and Ordinal comparisons will probably have a prominent place in that list (as will some of those casing issues).

 

This post brought to you by "ლ" (U+10da, a.k.a. GEORGIAN LETTER LAS)


# Ben on 13 Apr 2005 10:39 PM:

Michael,

I appreciate your enthusiasm for picking out common programming errors like this, but as a professional programmer, I find a lot of these internationalization parameters confusing.
How do I know if I need to pass the NORM_IGNOREKANATYPE flag to CompareString? How do I know if I want LOCALE_USER_DEFAULT or LOCALE_SYSTEM_DEFAULT, or some other locale?

I simply don't know. Unless I learn Japanese, or know someone who knows Japanese, I'll never know the answer. The trouble is that the APIs feel like they were written by linguists.

Me? I just want to compare filenames, or compare entries in a hash table, or compare usernames, etc. I don't want to even have the choice of ignoring kana types. I just want the CompareStrings API to do the *right thing* out of the box. If that is too hard for a single function, then let's write some API sets that are easy to use for common cases. I think this would be a more useful endeavor than to write articles about the nuances between CT_CTYPE3 and CT_CTYPE2.

Sometimes less choice is better. Please please finish that list of do's and don'ts. Please please make a list of "If you want to sort like a dictionary, do this... If you want to put filenames into a hash table, do this..."

# Michael S. Kaplan on 14 Apr 2005 8:13 AM:

A very good thought -- worthy of its own posting, if you ask me!


