Double Secret ANSI, part 1 (Somewhere between ANSI and Unicode)

by Michael S. Kaplan, published on 2007/03/08 13:15 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/03/08/1838231.aspx


Anyone who reads here regularly knows that I'd always prefer to see applications support Unicode.

But obviously there are a few applications out there that don't support Unicode. At least not yet.

Some of those applications have been around for a long time, and they manage to do ANSI quite well.

A few of them even handle ANSI brilliantly -- to the level of Double Secret ANSI.

(By the way, these applications shouldn't sprain their wrists patting themselves on the back -- because for more than 40% of the locales that ship in Vista, they will brilliantly force users to see some or all of their text converted to question marks, a dialect that no one really knows all that well!)

But alas, I digress.

I was going to explain what I mean when I talk about an application supporting Double Secret ANSI.

Let's think about Windows 95/98/Me for a moment.

A lot of Unicode there, from Shell interfaces to fonts to most of the GDI internals and so on. But cursed with a bunch of ANSI interfaces. And to add insult to serious injury, you can't ever change the default system locale, the "Language for non-Unicode programs" as it was later known.

So why is it that Windows Me has a locale list of over 100 languages, many of which use entirely different code pages?

Perhaps why is too existential of a question for a technical blog. So let's try an easier one.

How is it Windows Me can support a locale list of over 100 languages, many of which use entirely different code pages?

Let's break that down and start with something easy, like keyboard layouts. How does the en-US Windows Me support a Russian keyboard layout?

(Keep in mind that the same answer will extend to the non-Unicode applications that run on the NT-based versions of Windows, too!)

It turns out that keyboard layouts in a non-Unicode application are given the opportunity to express themselves -- using the default ANSI code page of the LANGID that comes out of the keyboard layout's KLID value (not its HKL value!).
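A minimal sketch of that lookup, written in Python for portability. It assumes the usual KLID shape, an eight-hex-digit string such as "00000419" whose low word is the LANGID, and it replaces the real GetLocaleInfo call with LOCALE_IDEFAULTANSICODEPAGE by a small hypothetical table that covers only a few languages.

```python
# Hypothetical subset of LANGID -> default ANSI code page. On Windows the real
# value would come from GetLocaleInfo(..., LOCALE_IDEFAULTANSICODEPAGE, ...).
DEFAULT_ANSI_CODE_PAGE = {
    0x0409: 1252,  # English (United States)
    0x0408: 1253,  # Greek
    0x040D: 1255,  # Hebrew
    0x0419: 1251,  # Russian
}


def langid_from_klid(klid: str) -> int:
    """Return the LANGID held in the low word of a KLID string like "00000419"."""
    return int(klid, 16) & 0xFFFF


def code_page_for_klid(klid: str) -> int:
    """Code page a non-Unicode app would receive this layout's text in."""
    langid = langid_from_klid(klid)
    try:
        return DEFAULT_ANSI_CODE_PAGE[langid]
    except KeyError:
        raise ValueError(f"LANGID 0x{langid:04X} is not in this sample table") from None


if __name__ == "__main__":
    for klid in ("00000409", "00000419", "00010419"):
        print(klid, hex(langid_from_klid(klid)), code_page_for_klid(klid))
```

Because only the low word matters here, a variant KLID with a nonzero high word (such as "00010419" above) resolves to the same code page as the base layout for that language.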

Now most applications don't really give a fig about any of this, and pop either garbled text or question marks into their application.
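Both failure modes are easy to reproduce with Python's built-in Windows code page codecs, which here stand in for the Win32 conversion functions. The sketch assumes an en-US machine (system code page 1252) receiving Russian text from a layout whose code page is 1251.

```python
russian = "Привет"

# What a Russian layout hands a non-Unicode app: bytes in code page 1251.
raw = russian.encode("cp1251")

# Failure 1: the app decodes with its own system code page (1252) -> garbled text.
garbled = raw.decode("cp1252")

# Failure 2: Unicode text squeezed through code page 1252 -> question marks.
question_marks = russian.encode("cp1252", errors="replace").decode("cp1252")

# The careful approach: decode with the code page that goes with the layout.
recovered = raw.decode("cp1251")

print(garbled)         # Ïðèâåò
print(question_marks)  # ??????
print(recovered)       # Привет
```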

But let's think for a minute about the ones that would choose to do something with this data. They can get a lot further than you might think, given the opportunity to do so....

You can display text using TextOutW/ExtTextOutW, you can convert it to Unicode or store it with its code page. You can put it on the clipboard and stick a CF_LOCALE tag on it so others can know what code page it is. You can call NLS API functions with the right LCID to get data to use and manipulate text via that code page. You can put it in an email and send it to someone else. And so on.
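One way to "store it with its code page" is to never let the bytes travel without it, much as a CF_LOCALE tag travels with clipboard text. The class below is a hypothetical illustration, not a Windows API; Python's codecs again stand in for MultiByteToWideChar and WideCharToMultiByte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodePageText:
    """Hypothetical container: ANSI bytes plus the code page they are encoded in."""
    data: bytes
    code_page: int

    @classmethod
    def from_unicode(cls, text: str, code_page: int) -> "CodePageText":
        # errors="strict" raises UnicodeEncodeError instead of silently
        # substituting '?' for characters the code page cannot hold.
        return cls(text.encode(f"cp{code_page}"), code_page)

    def to_unicode(self) -> str:
        return self.data.decode(f"cp{self.code_page}")


# Greek and Russian text coexist in one app, whatever the system code page is.
snippets = [
    CodePageText.from_unicode("Γεια", 1253),
    CodePageText.from_unicode("Привет", 1251),
]
for snippet in snippets:
    print(snippet.code_page, snippet.to_unicode())
```

Failing loudly on unrepresentable characters is a deliberate choice here: it surfaces the problem at the point of conversion rather than leaving question marks for someone else to discover later.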

It's not Unicode, but it does give you the ability to support more languages than just the ones covered by the default system code page of your machine.

Like I said, most people don't care about this kind of crap. And their applications are the ones that aren't Double Secret ANSI applications.

I'll be talking more about how these work in the NT-based world in the next post in this series....

 

This post brought to you by A (U+0041, LATIN CAPITAL LETTER A)


# Mikkin on 8 Mar 2007 6:45 PM:

> Perhaps why is too existential of a question for a technical blog.

I appreciate the sentiment, but "why?" is not really an existential question.  Existentialism does not seek reasons for the absurd so much as it seeks to create meaning in response.

# Michael S. Kaplan on 8 Mar 2007 9:04 PM:

Hmmmm.... but when one asks "why does software do this or that?" one starts slipping into "Why are we all here?" and so forth. Things get existential quite quickly! :-)

# Mihai on 9 Mar 2007 1:05 PM:

> for more than 40% of the locales that ship in Vista

I have tried EnumSystemLocales with all the possible flags (LCID_INSTALLED, LCID_SUPPORTED, LCID_ALTERNATE_SORTS), then called GetLocaleInfo with LOCALE_IDEFAULTANSICODEPAGE for each locale.

And I only get 32 Unicode-only locales (code page 0) out of 215.

That is a little more than 14%, not more than 40%.

How is that counted?

Did you count locales that are not *properly* supported (including, for instance, Romanian, which is listed with 1250, but whose s and t with comma below (ș, ț) cannot be represented without Unicode)?

Or is it just a misunderstanding between fourteen and forty? :-)

Thanks

# Michael S. Kaplan on 9 Mar 2007 1:24 PM:

I am indeed including ones that technically have a default system code page but which have no way to represent particular letters, even with workarounds (for example Urdu)....

It was meant in the post to be over 40%. :-)
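The two counts in this exchange measure different things, and the difference can be sketched with a handful of sample locales. The table below is hypothetical and tiny (real code would enumerate locales with EnumSystemLocales and query LOCALE_IDEFAULTANSICODEPAGE, as Mihai describes); a code page of 0 marks a Unicode-only locale, and each sample string holds letters that locale needs.

```python
# Hypothetical sample: locale -> (default ANSI code page, letters the locale needs).
SAMPLE_LOCALES = {
    "en-US": (1252, "Hello"),
    "ru-RU": (1251, "Привет"),
    "ro-RO": (1250, "\u0219\u021b"),  # s and t with comma below
    "hi-IN": (0, "\u0928\u092e\u0938\u094d\u0924\u0947"),  # Hindi, Unicode-only
}


def can_represent(text: str, code_page: int) -> bool:
    """True if every character of text exists in the given code page."""
    if code_page == 0:  # Unicode-only locale: no ANSI code page at all
        return False
    try:
        text.encode(f"cp{code_page}")
        return True
    except UnicodeEncodeError:
        return False


unicode_only = [name for name, (cp, _) in SAMPLE_LOCALES.items() if cp == 0]
not_properly_supported = [
    name for name, (cp, letters) in SAMPLE_LOCALES.items()
    if not can_represent(letters, cp)
]

print(unicode_only)            # ['hi-IN']
print(not_properly_supported)  # ['ro-RO', 'hi-IN']
```

Only the first list corresponds to the code-page-0 count; the second is closer to the broader sense of "not supported" used in the reply.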

# El Señor. on 15 Oct 2008 9:51 PM:

"A lot of Unicode there, from Shell interfaces to fonts to most of the GDI internals and so on. But cursed with a bunch of ANSI interfaces. And to add insult to serious injury, you can't ever change the default system locale, the "Language for non-Unicode programs" as it was later known."

Sure you could; I used to do it all the time (and I was like 14 at the time). There was a registry entry under HKEY_LOCAL_MACHINE/system/????? (I can't remember where exactly), but you could change the system code page. If you changed it to a DBCS code page such as Japanese 932, it would become very unstable, but it still worked.

It failed miserably when I tried to use 65001, but it produced some interesting results, one being that a MsgBox would pop up at start-up with its text rotated by 45°.

# Yuhong Bao on 13 Feb 2011 2:19 PM:

In fact, I just read that English Win98 and later let you choose the ACP during install.

# Michael S. Kaplan on 13 Feb 2011 4:04 PM:

Yes, I chose my language with care:

"And to add insult to serious injury, you can't ever change the default system locale, the "Language for non-Unicode programs" as it was later known."

The setting in question was halfway added to NT4 and fully added to Windows 2000.

# Yuhong Bao on 5 Jun 2011 9:50 PM:

"You can display text using TextOutW/ExtTextOutW"

Or change the font charset to match the keyboard.


referenced by

2011/04/04 UXTheme in the non-Unicode world isn't "Double Secret ANSI"

2010/08/25 Yet another cost to not supporting Unicode?

2008/10/15 What is the freaking point of LOCALE_USE_CP_ACP?

2008/10/01 What do you get when you put a Hebrew on top of a Russian? (aka What lies beneath can bite you on the ass)

2007/05/07 The Unicode train? It left the station....

2007/03/25 Will the real Unicode character message please stand up?

2007/03/20 Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!)
