by Michael S. Kaplan, published on 2007/01/01 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/01/01/1391798.aspx
Previous posts in this series (including today's!):
(If you are just tuning in and want to start now you can grab the current source from here.)
I am delaying the "road not traveled" post until tomorrow. Hang in there, it is coming!
Now we have a project that compiles and links and produces a setup.exe. Does that mean we're done?
Well, no. Because the project hasn't been tested yet. Like the First Tester's Axiom states, if you have not tested it, assume it is broken.
Of course testing a bootstrap EXE is a bit tougher than the average project; running the current project gives you a nice error:
You can look at the readme.htm file for information on how to plug in the various special properties that this (and most) SETUP.EXE files look for, settable via msistuff.exe. You could probably even puzzle out how msituff.exe works if you looked closely at the source of setup.exe with an eye to understanding the functionality (as opposed to trying to convert a project to Unicode!).
So the fact that we see this dialog means at the very least that some of this code works!
Ah, but when you click that OK button, what happens next suggests a bug is there:
Looks like something crashed. Since this does not happen with the non-Unicode version, common sense forces us to assume it is our bug. Let's take a look....
The call stack of the crash:
setup.exe!operator delete(void * pUserData=0xfdfdfdfd) Line 52 + 0x3 bytes C++
setup.exe!wWinMain(HINSTANCE__ * hInst=0x00400000, HINSTANCE__ * hPrevInst=0x00000000, wchar_t * lpszCmdLine=0x0002069c, int nCmdShow=0x00000001) Line 927 + 0x18 bytes C++
setup.exe!__tmainCRTStartup() Line 324 + 0x35 bytes C
setup.exe!wWinMainCRTStartup() Line 196 C
kernel32.dll!7c816fd7()
Obviously the problem would be in our wWinMain, not in operator delete. Let's take a look at the source code right around the crash line:
if (szInstallPath)
delete [] szInstallPath;
Hmmm.... the cleanup code is crashing try to delete a string that is allocated on line 539 of setup.cpp:
// canocialize the URL path
cchInstallPath = cchTempPath*2;
szInstallPath = new TCHAR[cchInstallPath];
Yet the actual error happened when "DATABASE" (actually ISETUPPROPNAME_DATABASE) was not found in setup.exe, hundreds of lines earlier (lines that would be run in RED):
// Determine if this is a patch or a normal install.
if (ERROR_OUTOFMEMORY == (uiRet = SetupLoadResourceString(hInst, ISETUPPROPNAME_DATABASE, &szMsiFile, dwMsiFileSize)))
{
ReportErrorOutOfMemory(hInst, DownloadUI.GetCurrentWindow(), szAppTitle);
goto CleanUp;
}
else if (ERROR_SUCCESS != uiRet)
{
// look for patch
if (ERROR_OUTOFMEMORY == (uiRet = SetupLoadResourceString(hInst, ISETUPPROPNAME_PATCH, &szMsiFile, dwMsiFileSize)))
{
ReportErrorOutOfMemory(hInst, DownloadUI.GetCurrentWindow(), szAppTitle);
goto CleanUp;
}
else if (ERROR_SUCCESS != uiRet)
{
PostResourceNotFoundError(hInst, DownloadUI.GetCurrentWindow(), szAppTitle, ISETUPPROPNAME_DATABASE);
goto CleanUp;
}
fPatch = true;
}
In other words, the code to allocate that buffer was never run!
We had better take a closer look at the definition of that variable, shouldn't we? :-)
The intent is straightforward enough:
TCHAR *szInstallPath = 0;
(though I probably would set pointers to NULL rather than 0, just as a matter of personal preference). In any case, clearly something is not working -- something is overwriting stack here, as all of these variables are on the stack.
Looking at the actual value at the time of the crash, it is 0xfdfdfdfd, which seems a bit too suspicious to be an actual pointer (especially since all of the surrounding string variables have the same value!). Looking at Funny Memory Values, this is:
Microsoft Visual C++ compiled code with memory leak detection turned on. Usually,
DEBUG_NEW
was defined. Memory with this tag signifies memory that is in "no-mans-land." These are bytes just before and just after an allocated block. They are used to detect array-out-of-bounds errors. This is great for detecting off-by-one errors.
So it does look like perhaps memory leak detection is writing a bit past one place or another. Let's again assume it is us and just debug.
Stepping through the code and looking at all the surrounding variables that are all on the stack (but whose memory they point to if allocated will be on the heap), szOperation has a bunch of memory set to 0xcdcdcdcd since there is no operation, and this is then deleted and set to 0xdddddddd and so on.before it is finally set to the stringt "DEFAULT." We then go through the same dance for szProductName which is eventually set to "the product" and szTitle which is set to "Please wait while '%s' is downloaded...".
And here is where we find the problem. It is in this line of code:
StringCchPrintf(szBanner, sizeof(szBanner), szText, szProductName);
In most of the project they are properly using expressions like sizeof(szBanner)/sizeof(szBanner[0]) or at least sizeof(szBanner)/sizeof(TCHAR) for these situations, but in this case the code wasn't. And yes, we have a cb vs. cch bug. Looking in the project there are 11 other occurrences with the StringCchPrintf function, so let's fix all of the ones that need it (about seven of them are using sizeof() incorrectly this way).
Once we do this, the crash goes away.
Root Cause Analysis of the bug: The introduction of the StringCchPrintf and related functions was after that of the bootstrap exe, so it is likely fair to say that this is a bug in the bootstrap sample caused by dev error in the port to the safe string functions and exposed by our little conversion project here. :-)
Looking at the whole project, there are 38 occurrences of safe string handling functions starting with StringCch* and none starting with StringCb*, though all of the others are StringCchCopy and StringCchCat calls, and none have similar errors. It is worth looking at these cases, under the "where there is one problem there may well be others" theory that works well in so many situations.
As this problem (which I knew about) and also the problem that regular reader Mihai pointed out the other day (which I didn't) indicate, it is obviously important to test prior to being sure that everything is correct....
Plus we'll talk about a few other things in upcoming posts.
Stay tuned for more on the alternate way parts 2, 3, and 4 could have been done, tomorrow!
This post brought to you by ដ (U+178a, a.k.a. KHMER LETTER DA)
# Mihai on 1 Jan 2007 3:43 PM:
To make your life easier, it is very convenient to define something like this:
#define COUNT(a) (sizeof(a)/sizeof(a[0]))
This is handy, but does not work well on pointers to characters, only on arrays.
You can define something that works on both:
#define CHRCOUNT(a) (sizeof(a)/sizeof(TCHAR))
A very nice addition to VS2005 is _countof, which will "explode" if you use it on pointers.
# Michael S. Kaplan on 1 Jan 2007 4:04 PM:
Yep, I usually have an ARRAYSIZE() that does exactly this (kbdmsi.dll and kbdutool.exe both define it and use it, for example).
Just trying to minimize stylistic changes to the sample project. :-)
# Mihai on 1 Jan 2007 8:22 PM:
The note was for other readers, not for you :-)
For my version of this project I did not switch to that either, since I have also tried to minimize stylistic changes.
Except for the way I treated MultiByteToWideChar (and, as a result, no crash :-) I am 100% in sync with you.
# Mihai on 1 Jan 2007 8:26 PM:
Additional note:
_countof "exploding" on pointers is a *good thing*
If you get a pointer to a bunch of chars, then you should also receive the size of the buffer (usually in characters).
Then you should use that, not ARRAYSIZE, DIM, _countof or anything else, because there is no way to determine the size of the buffer.
# Erzengel on 3 Jan 2007 12:28 PM:
Using C++ I prefer:
//Use this for parameters:
template<typename T, int N>
int ArraySize(T arr[N])
{
return N;
}
It will give a compile time error if you try to use it on a pointer, and doesn't make your code compiler dependant (I didn't even know of the VS2k5's _countof, though. Thanks. I've been using this since VS.net, don't know if 6's borked templates could use it).
The only issue is that you can't use the result in a case statement while you can with ARRAYSIZE. So if you have compile-time asserts or whatnot, you have to use ARRAYSIZE.
referenced by
2007/01/05 Converting a project to Unicode: Part 9 (The project's postpartum postmortem)
2007/01/04 Converting a project to Unicode: Part 8 (Fitting MSLU into the mix)
2007/01/03 Converting a Project to Unicode: Part 7 (What does it mean to fit things to a 'T', anyway?)
2007/01/02 Converting a Project to Unicode: Part 6 (Upon the road not traveled)