Why does it use strings to represent numbers, exactly?

by Michael S. Kaplan, published on 2007/03/29 14:21 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/03/29/1989831.aspx


Aaron asks:

Hello!  I've got a question that's been nagging at me for a while now.  I work for a dev tools company and design a fair number of public APIs.  So I tend to be critical of other companies' public APIs and usually wonder "what made them decide to do X?"  And the EnumLocalesProc (and by association, the EnumSystemLocales API) is one of those question-generators.

Why does the EnumLocalesProc receive a string, when all of the locale APIs take an LCID, which is an integer?  If the string passed into the callback is nothing more than a string representation of an LCID (in hex form), then doesn't that basically mean everyone using that API has to convert it to an integer form?  If so, then what's the purpose to it being a string in the first place instead of an integer?  As a corollary, what's the purpose to it having an A and W version if it's simply passing numerical information?

I am rather new to NLS and i18n support, so I feel like I must be missing some piece of the rather complex puzzle.  Thanks for shedding any light onto this for me!  It's really not a dire need, since I can still make use of the API.  But I hate not understanding why something is the way it is.  ;-)

Sincerely,
Aaron

Ah, the eternal question about why EnumSystemLocales has to support A and W versions, and why the EnumLocalesProc callback that receives a number has to get it as a string....
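For anyone who has not had to do it, the conversion Aaron is describing is small. What follows is only a minimal sketch, assuming the string handed to the callback is nothing but the LCID written as hex digits (for example "00000409"). The helper name parse_lcid_string is hypothetical, and uint32_t stands in for the Windows LCID type so that the sketch compiles without windows.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not part of any Windows API): parse the hex text
   that a locale-enumeration callback receives, e.g. "00000409", into a
   32-bit locale identifier.
   Returns 1 on success, 0 if the text is empty, has trailing junk,
   or does not fit in 32 bits. */
int parse_lcid_string(const char *text, uint32_t *out)
{
    char *end = NULL;
    unsigned long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtoul(text, &end, 16);   /* base 16: the string is hex digits */

    if (end == text || *end != '\0')   /* nothing parsed, or trailing junk */
        return 0;
    if (errno == ERANGE || value > 0xFFFFFFFFUL)
        return 0;                      /* does not fit in a 32-bit identifier */

    *out = (uint32_t)value;
    return 1;
}
```

Inside a real EnumLocalesProc the same idea applies to whichever string type the callback receives; for the W version, something like wcstoul would presumably take the place of strtoul.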

This is actually a question I asked Julie years ago, and her answer (which covered the original, pre-LOCALE_RETURN_NUMBER behavior in GetLocaleInfo!), was that it was based on how the data was stored.

The data was originally stored in the registry as strings for all user override values and lists of locales used for enumeration, and the full data in locale.nls was stored as strings, too. Even for the values that were numbers.

So in the end, functions that dealt with all of this string data just returned it as string data as well. It was just easier.

(Of course it might have been easier in the end to make these numbers; there are interesting architectural consequences to the fact that these started as strings that I'll talk about another day!)

Of course, within a version or two, the obvious need for something like LOCALE_RETURN_NUMBER in apps that have to make thousands of calls to GetLocaleInfo was met. But no one ever tried to change EnumSystemLocales/EnumLocalesProc, since one just couldn't make the same argument. Plus the semantics behind changing a callback were also not entirely clear, and no one wanted to add a function.

Now note that internally (until Vista) the data was still stored as strings.

So even though customers were relieved of the burden of conversions, we were not (which meant the performance hit was still there).

In Vista this changed and we started storing numbers as numbers, with all the attendant benefits of type appropriateness. And now the string-based EnumLocalesProc is just one of those historical oddities that people could ask about from time to time....

 

This post brought to you by ฐ (U+0e10, a.k.a. THAI CHARACTER THO THAN)


woizero hannah on 29 Mar 2007 8:01 PM:

Meekael..

Thank you for your work in making sure my language and alphabet does not become a historical oddity that people ask about from time to time. I have just discovered Nyala in vista!!! Wey Hey!!!!!

thank you and the "Many people were instrumental to getting the Ethiopian script in general and the Amharic language in particular supported in Vista" especially the one crucial piece (you) were heavily involved with! Well done you and Ning Jin-Grisaffi and especially Daniel Yacob for getting an Amharic input method into the product after suddenly finding out y'all did not have one!

I hope you  blog more about this whole issue some day. :-)

zvik on 16 Apr 2007 5:45 AM:

Hello,

I tried to use the MultiByteToWideChar() function to convert a UTF-16 encoded string, and it failed. Is it designed to accept only ANSI / UTF-8 input, or can it take UTF-16 with special parameters?

Thanks a lot,

 Zvi.

Michael S. Kaplan on 16 Apr 2007 5:54 AM:

It cannot ever be used to "get UTF-16" and convert it to UTF-16, no.

Kind of off-topic though, isn't it? Maybe you could explain what you are trying to do in the Suggestion Box?

