Yet another cost to not supporting Unicode?

by Michael S. Kaplan, published on 2010/08/25 07:01 -04:00, original URI:

It is very hard to type when one has a sprained shoulder. Thank goodness for Dragon Dictate! I'm just saying....

Over in the Suggestion Box, Alex asked:

Hi, Michael.

I've googled through your blog, but haven't found an answer to the following question. This question is very popular (at least in Russia) and I was surprised that you haven't covered it yet. So, maybe you can tell us the story behind it. This issue is about the clipboard, text, and non-Unicode applications.

Take an old non-Unicode application (like Notepad from Win9x) and run it on a newer Windows (like XP) which has 2 input languages installed (English and Russian, for example). Suppose that the "Language for non-Unicode applications" setting is set to Russian.

In Win9x you can copy text via clipboard from any application to any other application without problem. Sure, old apps don't bother to set CF_LOCALE along with CF_TEXT, but things worked very well then, since the same code page was used by all apps (okay: almost all).

Now, take a modern "Unicode" OS, like XP. You take your old app, which has served you for many years, copy text to the clipboard, paste it into another application (like the modern Notepad) and... whoa: you get question marks or gibberish. What's wrong? Heck, you forgot to _switch keyboard input to Russian_. Once you do that - everything starts working smoothly again.


Top row: left - Notepad from Win98, right - Notepad from XP. Bottom row: left - Notepad from XP, right - Notepad from Win98. Current input language is set to "English (United States)" (as if we forgot to switch it to Russian). Red lines indicate copy/paste operations via the clipboard. (I used Microsoft's own applications to show that this is not a bug in a particular 3rd-party application).

The problem (as I see it) is getting Unicode text from ANSI text. Why does Windows use the keyboard input language for that, rather than the "Language for non-Unicode applications" setting?

This is a terrible break in the user experience. Most people think that this is a bug. I hear this complaint very often (heck, I hear people cursing Microsoft almost every time it is mentioned). It's especially painful to explain that your application, written many years ago, has nothing to do with this change.

Can you shed a bit of light on the background of this issue, please? Why did Microsoft decide to (okay, not to break, but to) complicate the lives of zillions of existing applications? If Microsoft cares a lot about backward compatibility - why was such a decision made?

This sounds very familiar.

I can't quite put my finger on it.

Oh yeah, I was thinking of Double Secret ANSI, part 1 (Somewhere between ANSI and Unicode) and Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!).

The Win9x version of this very same feature, which actually allowed a tiny bit of cross-codepage stuff to work if people tried hard enough (like Adobe did), was kind of incomplete.

The NT-based version of it fills in the holes, which are apparently what some people were relying on a little bit?

Technically the NT-based version of this feature has always been broken though. That makes fixing it kind of a tough sell, as opposed to just supporting Unicode in the apps.

Historically, we seem to be in the habit of breaking people who aren't using Unicode. Not because of an attempt to sabotage, but just because Unicode support gets a lot better coverage....
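For anyone who wants to see the symptom Alex describes without digging out a Win98 copy of Notepad, here is a minimal Python sketch. It is not how Windows implements the clipboard; it only assumes that the CF_TEXT bytes get converted to Unicode using the code page tied to the current input language (cp1252 for English, cp1251 for Russian). The function name and the sample word are made up for illustration.

```python
# A sketch of the symptom only, not of the real clipboard code.
# Assumption: the ANSI -> Unicode conversion uses the code page that goes
# with the current input language rather than the system default.

def ansi_to_unicode(data: bytes, codepage: str) -> str:
    """Hypothetical stand-in for the CF_TEXT -> CF_UNICODETEXT conversion."""
    return data.decode(codepage, errors="replace")

original = "Привет"

# What an old non-Unicode app running under the Russian code page
# would put on the clipboard as CF_TEXT.
ansi_bytes = original.encode("cp1251")

# Input language set to Russian: the bytes are read back as cp1251.
right = ansi_to_unicode(ansi_bytes, "cp1251")

# Input language left on English: the same bytes are read as cp1252.
wrong = ansi_to_unicode(ansi_bytes, "cp1252")

# The opposite direction: Unicode text squeezed into the wrong ANSI
# code page turns into question marks.
question_marks = original.encode("cp1252", errors="replace")

print(right)           # Привет
print(wrong)           # Ïðèâåò
print(question_marks)  # b'??????'
```

The bytes never change in the first two cases; only the code page used to interpret them does. That is why switching the keyboard to Russian appears to "fix" the paste, and why an app that puts Unicode text on the clipboard never hits the problem.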

yuhong2 on 26 Aug 2010 1:43 PM:

BTW, anyone here remember Apple's WorldScript?
