My kingdom for some Unicode controls

by Michael S. Kaplan, published on 2005/08/25 07:40 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/08/25/456237.aspx


I have certainly done my share of pushing for Unicode controls in various programming languages on Windows. From the UniToolbox controls link on this very blog to the book I wrote for Visual Basic (see Chapter 6 online!) -- this is the one that Joel Spolsky said all of the very nice things about in this post and the audio interview it links to (in the interview I was just an example; he was mainly talking about how Amazon ratings/comments can be particularly biased/skewed by folks with a "smear tactic" agenda, a point on which I agree with him -- but I usually just filter the anonymous comments to get a more accurate answer!).

(Joel, I'll cover my thoughts on a book in another post!)

Anyway, I am a huge fan of Unicode controls.

In prior versions of VB (<= 6.0) they were only half-Unicode, by which I mean they all had Unicode interfaces but for the most part were wrappers around non-Unicode intrinsic or common controls. Which means a lot of conversions back and forth (and back again in many cases on NT-based platforms since the underlying controls themselves are Unicode!). So you get all of the space and performance penalties of Unicode with none of the benefits (like the Shell Unicode interfaces in Windows 95!).

It was very exciting that in .NET all of the WinForms controls are 100% Unicode whenever the OS can support it. Even on Win9x all of the owner-draw controls still support Unicode, as do some of the common controls. You can see some of this in the documentation, like in this topic:

However, certain controls do not support Unicode in Windows 98 and Windows Millennium Edition. These controls, all of which inherit from the common control, will process data with the Windows code pages, as ANSI. These controls are: TabControl, ListView, TreeView, DateTimePicker, MonthCalendar, TrackBar, ProgressBar, ImageList, ToolBar, and StatusBar. The result of this is that you cannot display Unicode data in these controls on the listed platforms. For example, you cannot display Japanese characters on an English Windows 98 system.

We'll ignore the technical mistakes here and the fact that it does not mention some of the intrinsic-based controls like the TextBox also have this problem (and especially the fact that some of the common controls actually do support Unicode on Win9x, and will work properly in WinForms!) and concentrate on the issue that there are a few controls which will not support Unicode on Win9x, even in WinForms.

It is easy to get worked up about this, but these days I do not. After all, the only time I ever run Windows 98 or Millennium these days is when I am looking at an MSLU bug, and it has been a long time since one of those has needed a look. And even if the controls fully supported Unicode, usually the fonts would not be there so all you would see is a bunch of square boxes a.k.a. NULL glyphs (□□□□□□□□) which is really not much better in terms of information than a bunch of question marks (????????).

For me it is enough that everything is Unicode whenever it can be. That's cool.

Now the final frontier is C++ projects -- since so many people still don't create the projects as Unicode ones, and a lot of developers still write that TCHAR code even if they are only writing for NT-based platforms like Win2000 or XP or Server 2003 or Vista -- LPTSTRs and TCHARs, yuck!

In NLS we put our foot down in Windows Server 2003 -- no new NLS API functions will be written with ANSI counterparts. And we're continuing that in Vista. Not everyone has gotten the word on this yet, so we'll need to step up on the "internal evangelism" with other teams and groups. But it should be easier to suggest that people write less code, I think -- much easier than to suggest that people need to write twice as many functions and messages!

The old functions will still work, sure. But there is plenty of new functionality like FindNLSString and NormalizeString and lots more that I will be covering in future posts -- and it is Unicode only, like many of the new locales in Vista are.

So if you are writing C/C++ applications, you have to ask yourself: do you really want half the world to have to speak fluent question mark to use the products you write?

 

This post brought to you by "ཀ" (U+0f40, a.k.a. TIBETAN LETTER KA)
(A letter that you are probably not looking at a NULL GLYPH for if you are running Vista Beta 1!)


# Wayne Steele on Thursday, August 25, 2005 11:33 AM:

I'm running XP sp2, and I see the glyph just fine (Kind of like a three-legged PI, with each leg longer than the one to its left).

# TUniverse on Thursday, August 25, 2005 11:41 AM:

Hey there --
Just found your blog the other day and have been enjoying the new knowledge.

Questions for you: Is your bias against TCHAR and LPTSTRs just a bias against having to support 2 APIs? Or is there more to it than that? In my code, should I not be using the TCHAR and _T() and the likes, and what should I be using instead?

# Peter Ibbotson on Thursday, August 25, 2005 11:46 AM:

That letter looks ok to me even in WinXP or have I missed something?

# Michael S. Kaplan on Thursday, August 25, 2005 11:50 AM:

Wayne and Peter -- you must have a font with Tibetan in it on your machine? :-)

# Michael S. Kaplan on Thursday, August 25, 2005 11:54 AM:

Hey TUniverse --

Welcome to the blog!

My bias is (in part) that it means you have to support two functions, but also because you deal with a larger number of languages that the ANSI function cannot support, and by doing that extra work you enable developers to write code that will not support those languages, either....

If you can move away from all that and write purely Unicode applications with L"", L'', and LPWSTR/WCHAR then I would encourage people to make the jump. Win9x is the past, and Unicode is definitely the way going forward.

# CornedBee on Thursday, August 25, 2005 2:07 PM:

My dear Linux displays the glyph without problems.

I agree, the time of ANSI is past. But I still favour wchar_t and std::wstring :)

# Michael S. Kaplan on Thursday, August 25, 2005 2:43 PM:

I am a huge fan of wchar_t and std::wstring, too. Any type, as long as it's Unicode....

# Jonathan on Thursday, August 25, 2005 3:16 PM:

Perhaps the next version of Visual Studio should create Unicode projects by default?

# Mihai on Thursday, August 25, 2005 3:23 PM:

From the fonts I have on my system, U+0f40 seems to be present in:
- Arial Unicode MS Regular
- NSimSun-18030 regular
- SimSun-18030 regular
The 18030 are part of the GB-18030 support pack, and not many systems have this installed.
But Arial Unicode MS (from Office, I think) is quite a popular one :-)

# CN on Thursday, August 25, 2005 4:06 PM:

Doesn't render as null, it looks like a character. If it's correct, well, I don't know that :-) (WinXP)

# Jerry Pisk on Thursday, August 25, 2005 4:18 PM:

I have to disagree that Win9x is the past. A lot of customers still use it (25% of our users still use Win9x) and unless you're going to ignore those you have to stick to writing ANSI versions of your code as well. And the easy way is to use TCHAR unless you want to maintain two versions of your code.

# Michael S. Kaplan on Thursday, August 25, 2005 4:31 PM:

For those still using Win9x, there is MSLU -- and since Win9x does not support new API functions, there is no purpose to new ANSI API functions in the future.

# Jerry Pisk on Thursday, August 25, 2005 6:23 PM:

MSLU is a very nice thing but trying to convince a large corporation to install it on all their desktops, the ones they are still running on Win95, is simply impossible. Especially since there are a lot of competitors that will be more than happy to stitch together a VB program that will work in their environment. And this is not even anything extreme, we also support 16-bit versions of some of our products because we have customers that still run Win 3.1 :) Part of the problem is that those people are still supported, by companies like ours, so they have absolutely no need to upgrade.

# Michael S. Kaplan on Thursday, August 25, 2005 6:30 PM:

Well Jerry, the old functions that work on both Win95 and Vista are still there. But new functions would not help Win95 anyway, so there is no reason to create them just to encourage people to write apps that won't support languages in Vista....

# Robert on Thursday, August 25, 2005 9:00 PM:

Unicode controls should be used wherever possible. The time of ANSI controls has passed. Personally, I'd like to see Outlook Express use Unicode controls -- currently, no matter which default system locale I have selected, the message list pane in Outlook Express shows many of the mails I receive with garbled subject lines and names.

# Serge Wautier on Friday, August 26, 2005 4:35 AM:

Just a couple of months ago, I started a small contracting project. The client wanted an app that was world-ready. "I'm glad you ask", I said, "because it's the only way I can work :-)". More seriously, when I told him it was going to be a Unicode app, I added: "I guess you don't care if it doesn't work on 9x ?".
He said "Hhhmmmm... That might be a problem to our clients in the emerging markets." I don't quite agree: The people in emerging markets who still use 9x for professional use are most likely not his clients.

But it's OK to me, 'coz MSLU is my friend ! And even if listview controls were used quite extensively, I'm not worried : I believe these users don't go 'cross-codepage'.

