Converting a Project to Unicode: Part 7 (What does it mean to fit things to a 'T', anyway?)

by Michael S. Kaplan, published on 2007/01/03 03:01 -08:00, original URI: /web/20080229133725/http://blogs.msdn.com/michkap/archive/2007/01/03/1395788.aspx


(If you are just tuning in and want to start now you can grab the current source from here -- no changes since it was posted the day before yesterday)

Like I said yesterday, if you have read Parts 2-5 then you know how we went from a purely ANSI application to a purely Unicode one.

The binary itself has been tested with the MSKLC update and it resolves the bug I talked about back in Part 0. And the Unicode Bootstrap EXE works for the scenarios in which it will be used.

Now for a moment I wanted to talk about the myth of applications compiled as both Unicode and ANSI. We say TCHAR but the truth is that most of the time the dev has just one in mind. For me it is Unicode (which leads to problems like the one Mihai pointed out here) and to be honest most developers think of it as ANSI, even when they talk about Unicode, which is why you get problems like those in the DrawThemeText function. Ignore the weird text for a moment:

DrawThemeText uses parameters similar to the Microsoft Win32 DrawText function, but with a few differences. One of the most notable is support for wide-character strings. Therefore, non-wide strings must be converted to wide strings, as in the following example.

You know, text handled by people who were not aware that DrawText has a Unicode version. And just look at the code sample:

INT cchText = GetWindowTextLength(_hwnd);
if (cchText > 0)
{
  TCHAR *pszText = new TCHAR[cchText+1];
  if (pszText)
  {
    if (GetWindowText(_hwnd, pszText, cchText+1))
    {
      int widelen = MultiByteToWideChar(CP_ACP, 0, pszText, cchText+1, NULL, 0);
      WCHAR *pszWideText = new WCHAR[widelen+1];
      MultiByteToWideChar(CP_ACP, 0, pszText, cchText, pszWideText, widelen);

      SetBkMode(hdcPaint, TRANSPARENT);
      DrawThemeText(_hTheme,
                    hdcPaint,
                    BP_PUSHBUTTON,
                    _iStateId,
                    pszWideText,
                    cchText,
                    DT_CENTER | DT_VCENTER | DT_SINGLELINE,
                    NULL,
                    &rcContent);

       delete [] pszWideText;
    }

    delete [] pszText;
  }
}

This is code that won't even compile if you try to compile it as UNICODE!

Clearly, there are times where even the people who are moving forward and only providing Unicode versions to their functions are not necessarily thinking of a TCHAR as a type that could be either a CHAR or a WCHAR.
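
The reason the sample breaks: with UNICODE defined, TCHAR is WCHAR, so pszText is already a wide string, while MultiByteToWideChar wants a narrow (char) source string. Below is a minimal sketch of the shape the code would need in order to build both ways. It says nothing about how the documentation sample was or will be fixed. To keep it buildable without the Windows headers it uses stand-ins: SketchTChar, SKETCH_TEXT, SketchTString and to_wide are hypothetical names (not SDK names), and std::mbstowcs stands in for MultiByteToWideChar in the narrow branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Stand-ins for TCHAR and TEXT() so the sketch builds without <windows.h>.
// Compile with -DUNICODE to get the wide flavor, without it for the narrow one.
#ifdef UNICODE
typedef wchar_t SketchTChar;
#define SKETCH_TEXT(s) L##s
#else
typedef char SketchTChar;
#define SKETCH_TEXT(s) s
#endif

typedef std::basic_string<SketchTChar> SketchTString;

// Return a wide copy of a "T" string, converting only when the build is narrow.
std::wstring to_wide(const SketchTString& text)
{
#ifdef UNICODE
    // Already wide: there is nothing to convert, so no conversion call at all.
    return text;
#else
    // Narrow build: convert using the current C locale (assumption: this plays
    // the role that MultiByteToWideChar with CP_ACP plays on Windows).
    std::wstring wide(text.size(), L'\0');
    std::size_t written = std::mbstowcs(&wide[0], text.c_str(), text.size());
    if (written == static_cast<std::size_t>(-1))
        return std::wstring();          // invalid multibyte sequence
    assert(written <= wide.size());     // never more wide chars than input bytes
    wide.resize(written);
    return wide;
#endif
}
```

In a real Windows build the narrow branch would call MultiByteToWideChar, and the wide branch would hand the GetWindowText buffer straight to DrawThemeText with no conversion in between.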

And I am not casting stones here or anything (after all, I made the same kind of mistake in the other direction -- one I might never have noticed, since I was probably only ever going to compile and run the code with UNICODE/_UNICODE defined, just as I suppose people are anticipating those samples will be compiled by people who don't).

It makes the whole "T" thing really a myth most of the time, you know? :-)

So I think we should go ahead and make sure it will compile both ways, and do the work in the makefile to make sure it happens. Let's break the myth, at least for this particular sample at this particular moment....

One way that some Platform SDK samples do this (like the StrOut sample, for example) is in addition to the makefile, having a makefile.uni that looks something like this (this is the StrOut one):

#*************************************************************#
#**                                                         **#
#**                 Microsoft RPC Samples                   **#
#**                   strout Application                    **#
#**         Copyright(c) Microsoft Corp. 1992-1996          **#
#**                                                         **#
#** This is the makefile used when compiling for UNICODE.   **#
#** It sets the flags it needs, and then call the regular   **#
#** makefile.                                               **#
#** To compile for ANSI type nmake at the command line      **#
#*************************************************************#
# FILE : MAKEFILE.UNI

!include <ntwin32.mak>

#include support for unicode
cflags = $(cflags) -D_UNICODE -DUNICODE
midlflags = -D _UNICODE

#include library for CommandLineToArgvW function
conlibsdll = $(conlibsdll) shell32.lib

!include <makefile>

Well, no COM and we don't use CommandLineToArgvW, so we don't need exactly this. But it gives one example of how samples are doing this. We'll just go with it. :-)
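
As a reminder of what those two -D flags actually control: UNICODE switches the Windows headers (TCHAR becomes WCHAR, and a name like GetWindowText resolves to GetWindowTextW instead of GetWindowTextA), while _UNICODE switches the C runtime's tchar.h mappings. Defining only one of them is a classic way to get a half-converted build. Here is a small sketch of a guard against that, assuming nothing about what the final makefile.uni for this project will contain. kApiSuffix and resolved_api_name are hypothetical names used only to make the effect of the flag visible.

```cpp
#include <cassert>
#include <string>

// Refuse to build when only one of the two macros is defined.
#if defined(UNICODE) != defined(_UNICODE)
#error "Define both UNICODE and _UNICODE, or neither."
#endif

// The suffix the Windows headers would pick for a string-taking API
// in this build flavor (hypothetical constant, for illustration only).
#ifdef UNICODE
const char kApiSuffix = 'W';
#else
const char kApiSuffix = 'A';
#endif

// Mimic the header macros: map a generic API name to its A or W entry point.
std::string resolved_api_name(const std::string& base)
{
    assert(!base.empty());   // precondition: a real API name was passed in
    return base + kApiSuffix;
}
```

Built through a makefile.uni like the one above (both flags set), resolved_api_name("GetWindowText") gives the W name. Built through the plain makefile, it gives the A name.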

The cynical side of me believes that if this does end up in the Platform SDK that this will work up until the next time it is updated for some other particular feature, since the whole "dual compiling system" doesn't exactly fit us to a 'T'.....

(Other techniques here might include different config settings in the same makefile or environment variable dependencies, but I am aiming for the Platform SDK, so doing it the way they seem to may work in my favor!)

But in any case, the next source code drop will include an updated makefile and a new makefile.uni and instructions about using them.

 

This post brought to you by ཅ (U+0f45, a.k.a. TIBETAN LETTER CA)


# Bart on Wednesday, January 03, 2007 7:48 AM:

Is supporting both unicode and ansi worth the trouble ?

# Michael S. Kaplan on Wednesday, January 03, 2007 10:54 AM:

Depends on who you mean to be asking for and in what context!

# Mike Dimmick on Thursday, January 04, 2007 7:52 AM:

Bart: that depends on whether you still need the resulting binary to work on Windows 9x. If you do, you have two options: you can either build for ANSI and use that build on both Win9x and NT-based systems, or you can build for Unicode and link to MSLU (unicows.lib). If you support both, you can change this decision relatively quickly.

You should be aware that running ANSI programs on NT-based systems causes all strings passed to ANSI Win32 APIs to be converted to Unicode at runtime, then those strings are passed to the equivalent Unicode API. On return, any modified strings have to be converted back. These conversions are pretty fast but still cost a little CPU time and memory. There are some scripts which do not have an ANSI encoding, and therefore ANSI programs will not work very well on Windows XP if one of these scripts is selected.

You could take the approach of building both an ANSI version and a Unicode version. This has been used by Windows Installer, which provides InstMsiA.exe and InstMsiW.exe installers (at least in version 2.0), while ATL.DLL (for ATL 3.0) comes in both versions, the installer installing the appropriate version depending on which OS it is installed on.

For this setup.exe, it's still possible that users of the sample will want to compile it to run on Windows 9x (even if only to provide a nice error message, rather than the unfriendly error that results from a missing export). However, in general I would expect to see Windows 9x support die out quite soon - aggregated web server statistics (e.g. at http://marketshare.hitslink.com/report.aspx?qprid=5) show Windows 98 with less than 2% usage share and declining. Obviously you should consider your own market to decide whether to support Windows 9x.

