by Michael S. Kaplan, published on 2005/08/25 07:40 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/08/25/456237.aspx
I have certainly done my share of pushing for Unicode controls in various programming languages on Windows, from the UniToolbox controls link on this very blog to the book I wrote for Visual Basic (see Chapter 6 online!). That is the book Joel Spolsky said all of the very nice things about in this post and the audio interview it links to. (In the interview I was just an example; he was mainly talking about how Amazon ratings and comments can be skewed by folks with a "smear tactic" agenda, a point on which I agree with him, though I usually just filter out the anonymous comments to get a more accurate answer!)
(Joel, I'll cover my thoughts on a book in another post!)
Anyway, I am a huge fan of Unicode controls.
In prior versions of VB (<= 6.0) they were only half-Unicode, by which I mean they all had Unicode interfaces but for the most part were wrappers around non-Unicode intrinsic or common controls. That means a lot of conversions back and forth (and back again in many cases on NT-based platforms, since the underlying controls themselves are Unicode!). So you get all of the space and performance penalties of Unicode with none of the benefits (like the Shell Unicode interfaces in Windows 95!).
It was very exciting that in .NET all of the WinForms controls are 100% Unicode whenever the OS can support it. Even on Win9x all of the owner draw controls still support Unicode, as do some of the common controls. You can see some of this in the documentation, like in this topic:
However, certain controls do not support Unicode in Windows 98 and Windows Millennium Edition. These controls, all of which inherit from the common control, will process data with the Windows code pages, as ANSI. These controls are: TabControl, ListView, TreeView, DateTimePicker, MonthCalendar, TrackBar, ProgressBar, ImageList, ToolBar, and StatusBar. The result of this is that you cannot display Unicode data in these controls on the listed platforms. For example, you cannot display Japanese characters on an English Windows 98 system.
We'll ignore the technical mistakes here and the fact that it does not mention some of the intrinsic-based controls like the TextBox also have this problem (and especially the fact that some of the common controls actually do support Unicode on Win9x, and will work properly in WinForms!) and concentrate on the issue that there are a few controls which will not support Unicode on Win9x, even in WinForms.
It is easy to get worked up about this, but these days I do not. After all, the only time I ever run Windows 98 or Millennium these days is when I am looking at an MSLU bug, and it has been a long time since one of those has needed a look. And even if the controls fully supported Unicode, usually the fonts would not be there, so all you would see is a bunch of square boxes a.k.a. NULL glyphs (□□□□□□□□), which is really not much better in terms of information than a bunch of question marks (????????).
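Where the question marks come from can be shown with a small, portable sketch. It is not the Windows conversion code: `to_single_byte_lossy` is a made-up helper, and Latin-1 (code points up to U+00FF) stands in for a real Windows code page, which would use its own mapping table and possibly best-fit substitutions. The only point is that anything the target code page cannot hold is replaced by a default character, and the original text cannot be recovered afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Windows API): narrow UTF-32 text to a
// single-byte string, using Latin-1 as a stand-in for "the ANSI code page".
// Anything that does not fit becomes '?', the usual default character.
std::string to_single_byte_lossy(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            out.push_back('?');  // the original code point is lost here
        }
    }
    assert(out.size() == text.size());  // one byte out per code point in
    return out;
}

// Count how many characters a conversion like the one above would replace.
std::size_t count_lost(const std::u32string& text) {
    std::size_t lost = 0;
    for (char32_t cp : text) {
        if (cp > 0xFF) ++lost;
    }
    return lost;
}
```

With this stand-in, `to_single_byte_lossy(U"\u0F40")` (TIBETAN LETTER KA) comes back as `"?"`, and so does any Japanese character, while plain ASCII text passes through unchanged.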
For me it is enough that everything is Unicode whenever it can be. That's cool.
Now the final frontier is C++ projects -- since so many people still don't create the projects as Unicode ones, and a lot of developers still write that TCHAR code even if they are only writing for NT-based platforms like Win2000 or XP or Server 2003 or Vista -- LPTSTRs and TCHARs, yuck!
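A rough sketch of the difference between the two styles follows. It is written to compile anywhere, so it does not include `<windows.h>` or `<tchar.h>`: `demo_tchar`, `DEMO_T` and `DEMO_UNICODE` are hypothetical stand-ins for `TCHAR`, `_T()` and the `UNICODE`/`_UNICODE` project setting, and the two functions are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the Windows TCHAR / _T() machinery, so the
// sketch compiles without the Windows headers. DEMO_UNICODE plays the role
// of the UNICODE/_UNICODE project setting.
#ifdef DEMO_UNICODE
using demo_tchar = wchar_t;
#define DEMO_T(s) L##s
#else
using demo_tchar = char;
#define DEMO_T(s) s
#endif

// "Generic text" style: the character type, and so what the function
// can represent, changes with a compiler switch.
std::size_t generic_length(const demo_tchar* s) {
    assert(s != nullptr);
    return std::basic_string<demo_tchar>(s).size();
}

// Unicode-only style: one character type, one code path to write and test.
std::size_t unicode_length(const wchar_t* s) {
    assert(s != nullptr);
    return std::wstring(s).size();
}
```

Built without `DEMO_UNICODE`, every string passed to `generic_length` is a `char` string, so text outside the active code page has nowhere to go. `unicode_length(L"\u0F40")` takes wide text regardless of how the project is configured.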
In NLS we put our foot down in Windows Server 2003 -- no new NLS API functions will be written with ANSI counterparts. And we're continuing that in Vista. Not everyone has gotten the word on this yet, so we'll need to step up on the "internal evangelism" with other teams and groups. But it should be easier to suggest that people write less code, I think -- much easier than to suggest that people need to write twice as many functions and messages!
The old functions will still work, sure. But there is plenty of new functionality like FindNLSString and NormalizeString and lots more that I will be covering in future posts -- and it is Unicode only, like many of the new locales in Vista are.
So if you are writing C/C++ applications, you have to ask yourself: do you really want half the world to have to speak fluent question mark to use the products you write?
This post brought to you by "ཀ" (U+0F40, a.k.a. TIBETAN LETTER KA)
(A letter that you are probably not looking at a NULL GLYPH for if you are running Vista Beta 1!)
referenced by
2008/06/09 But What of Michael?
2006/01/17 They don't make Null Glyphs like they used to!
2005/08/28 About [not] writing books