The Unicode train? It left the station....

by Michael S. Kaplan, published on 2007/05/07 11:45 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/05/07/2464778.aspx


Heiko Braeske asks in the microsoft.public.win32.programmer.international newsgroup:

I have a Non-Unicode-Application and I want to show text in the common controls (e.g. CEdit) with a specific character set. I thought I could manage this by setting the font, which was created from a LOGFONT with the lfCharSet member set to the desired value.

But in fact this works only if I use the old common controls version 5. If I use a manifest to use the common controls version 6, obviously the system locale is used to show the text and the lfCharSet of the font has no effect.

Is there a way to influence the new common controls which character set they use? But it should be a per user setting!

Thanks for your help.

Heiko
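For context, here is a rough sketch of why the character set matters to a non-Unicode application: the same bytes stand for different characters depending on which code page interprets them. It is written in Python rather than Win32 C. The sample bytes and the three code pages (1252 Western European, 1251 Cyrillic, 1253 Greek) are illustrative assumptions, not taken from Heiko's application.

```python
# The same three bytes, as a non-Unicode ("ANSI") application would store them.
RAW = bytes([0xC4, 0xE5, 0xED])


def decode_under(codepage: str) -> str:
    """Interpret RAW using one Windows code page."""
    return RAW.decode(codepage)


def code_points(text: str) -> list:
    """Render each character as a U+XXXX label so the output stays ASCII."""
    return [f"U+{ord(ch):04X}" for ch in text]


# One byte sequence, three different strings.
for cp in ("cp1252", "cp1251", "cp1253"):
    print(cp, code_points(decode_under(cp)))
```

Under code page 1252 the bytes are Latin letters with diacritics, under 1251 they are Cyrillic letters, and under 1253 they are Greek letters. A control that ignores the font's lfCharSet and falls back to the system locale is in effect choosing a different row of this output.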

It is a little sad when shades of "Double Secret ANSI" (ref: part 1 and part 2) seem to come and go, and I sympathize: for all intents and purposes this appears to be a regression.

But the original functionality never seems to have been documented in the older control, which would tend to make this a bit less interesting (especially considering how long it and the v.6 controls have been there).

Maybe it is just more proof that if you're not using Unicode, you're missing out, and that the Unicode train has been leaving the station for long enough that it has in fact left?

If the folks on the Shell team asked me which was more important: improving Unicode support, or fixing this particular "ANSI" regression, which do you think I would choose?

In the end, I guess I'd just rather be a defender of making what works behave better than a defender of making what is broken be slightly less broken. :-)

Would you choose the same? Or in your opinion is the ANSI legacy more important here?

 

This post brought to you by U (U+0055, a.k.a. LATIN CAPITAL LETTER U)


# Aaron Ballman on 7 May 2007 1:32 PM:

I'm in the unique position of needing to write one source base which supports Windows 98 thru Vista, and be "unicode savvy" even on systems without support.  My source code is littered with A versions and W versions of calls.  So ANSI support is important to me -- but even *I* agree that time is better spent improving for the future instead of patching up the past.

# Ben Bryant on 7 May 2007 3:36 PM:

Tough call, but I say move on. If I understand right, using the lfCharSet of the font of a common edit control to switch from the CP_ACP is almost a hack, at least an unintended, undocumented and half-hearted feature used at your own risk. Using the font to change the charset of a control seems like such a bad architectural choice that I would have never thought to look for it.

Better to use a Unicode edit control that can work in your non-Unicode build (3rd party if necessary -- I sell one at firstobject.com) and switch to your charset of choice where necessary.

Backwards compatibility is of utmost importance, but every case must be weighed. If it is a bug or unintended behavior that many apps have grown to depend on, then you have to maintain it. But if it wasn't a promoted usage, you might be justified in moving on.

# Mike Dimmick on 7 May 2007 7:49 PM:

Give up on ANSI. Web usage share statistics at http://marketshare.hitslink.com/report.aspx?qprid=2 suggest that Windows 98 is now used by under 1.5% of users. Even if you add Windows ME, the non-Unicode operating systems are still under 2%. Windows Vista has more users, less than four months after release.

It would be more rational to pursue Mac users than to persist with compatibility with Windows 9X.

# Dean Harding on 7 May 2007 8:11 PM:

I say move on as well. If the feature is really that important, you can still use version 5.

# Heiko Braeske on 8 May 2007 4:55 AM:

If being "state of the art" were the only determining factor, the decision to move to Unicode would be clear. We don't care about Windows 98 or anything below XP or 2003 server. It is not relevant for our business. But a limiting factor of our server application is memory. And as long as you can't convince me that wchar doesn't need more memory than a char it will remain an argument against Unicode.

From Microsoft's point of view it is easy to simply suggest Unicode: if our application needs 5GB RAM instead of 3GB because of Unicode, our customers will be forced to buy Enterprise server licenses instead of Standard. Hardware sellers will also be happy. An increase in costs without an increase in value for most European/American customers! This is Unicode!

Now coming to the "undocumented" feature: The documentation of LOGFONT states for lfCharSet: "This parameter is important in the font mapping process. To ensure consistent results, specify a specific character set." And this sentence is true for v.5 controls but not for v.6 controls.

Yes, I agree, it is a hack and a bad architectural choice. I would prefer to choose the relevant codepage application wide via some API call like e.g. setlocale. But Windows only allows this setting to be changed manually and system wide, and forces you to reboot. And this is from my point of view "bad architecture".

But finally I want to make clear that I am not totally closed to Unicode. Globalization is not a utopia and we have to respect our future customers in Asia, where Unicode of course adds huge value. But the world is not that simple to just answer "Move on..."

# mpz on 8 May 2007 8:08 AM:

Fvck "ANSI". Fvck all the developers who keep artificially lengthening the lifespan of their dying breed of codebases that cause endless problems to people who happen to use a single non-"ANSI" character in their filenames.

In the year 2007, Thunderbird (the mail client) errors out when trying to attach a file that has a non-"ANSI" character even if it's only in the path.

# Michael S. Kaplan on 8 May 2007 8:18 AM:

Hello Heiko --

As I pointed out in the newsgroups, your APP is using *more* memory if you do not use Unicode, in the long run -- because behind the scenes the OS has to allocate memory to convert your stuff to Unicode anyway. The illusion that you have "saved memory" is just that -- an illusion. So your app is just slowly fragmenting your process heap with this constant allocation/destruction of memory, hurting performance, and stunting functionality.

But if one wants to move forward (and use the v.6 controls) then one must move forward (and use Unicode) for your scenario....

# Dean Harding on 8 May 2007 8:03 PM:

I also find it hard to believe that an app (even a "server app") has 2GB of character strings in memory at any one time... If you really have that much data in memory, why would you NOT be using a database?

# Heiko Braeske on 9 May 2007 4:57 AM:

Michael, is there any documented benchmark that proves your statement? I was just taught not to rely on undocumented features. ;-)

# Michael S. Kaplan on 9 May 2007 9:03 AM:

:-)

Given that the entire underlying architecture of the OS has been documented as being Unicode, the conversion requirement is obvious (as are the results of strings off the default system or indeed any code page).

The fact that allocation of the memory used to hold the results of the conversion is on the default process heap is an implementation detail which is not documented and therefore technically *could* change, if someone felt like updating the code in thousands of functions, but given the effort involved this seems like a fairly safe thing to not worry about....

The fact that calling function A (whose job is to allocate, convert the parameters to Unicode, call function W, if needed convert back out params, and then free the memory) is more expensive than calling function W directly is basic computer science. :-)

# John Daintree on 16 Jan 2008 10:24 AM:

OK fine, I understand, but where is this documented? I've just struggled to figure out why an application is broken. No mention of this change in the docs for LOGFONT, or in the discussion of manifest files. *That* is the irritation, not in the change itself.

# Michael S. Kaplan on 16 Jan 2008 10:54 AM:

A lot of my blogging is specifically because of wrinkles and issues and bugs and such that are not yet in documentation, and me being impatient about the lack.

So it ends up about me being irritated and doing something about it! :-)

# Yuhong Bao on 9 Oct 2008 2:09 AM:

Heiko: UTF-8 would be a good compromise in this case, though you will need to manually convert UTF-8 to UTF-16 when calling Windows APIs.
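A rough illustration of that trade-off, in Python rather than Win32 C: the sample string is a hypothetical piece of German text, and the helper names are made up for this sketch. On Windows the conversion step would typically be a call to MultiByteToWideChar with CP_UTF8. The codec calls here only model it.

```python
def storage_sizes(text: str) -> dict:
    """Bytes needed to hold the same text in three encodings."""
    return {
        "cp1252": len(text.encode("cp1252")),     # single-byte "ANSI" code page
        "utf-8": len(text.encode("utf-8")),       # proposed storage format
        "utf-16": len(text.encode("utf-16-le")),  # what the 'W' APIs expect
    }


def utf8_to_utf16(stored: bytes) -> bytes:
    """Convert UTF-8 storage to UTF-16 just before an API call."""
    return stored.decode("utf-8").encode("utf-16-le")


sample = "Größe: 5 Stück"  # hypothetical sample text, 14 characters
print(storage_sizes(sample))
```

For mostly Western European text, UTF-8 costs only a few bytes more than the single-byte code page and far less than UTF-16, while still being able to represent any character. The price is the explicit conversion at every API boundary.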


