When my decisions come back to haunt me (and/or others!)

by Michael S. Kaplan, published on 2012/08/01 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/08/01/10335466.aspx


A little over two half decades ago, I made a particular technical decision for a project I was working on at Microsoft.

I mentioned the reason over a half decade ago in a blog in this Blog o' mine.

And a little under two days ago, a blog by Eric Lawrence brought it all home to roost.

His blog on EricLaw's IEInternals titled Brain Dump: Shims, Detours, and other “magic” is a good read, and describes a fascinating bug involving IE10, a third-party extension IE10 ships, and MSLU, the Microsoft Layer for Unicode.

You can read the full blog (it's a good read!) but I'll quote the relevant portion here:

I spent several hours pondering this question and aimlessly touring around in the debugger. I was whining about this scenario to a colleague, complaining about code so ancient that it was shipping with unicows.dll, when I realized that I’d never used this library myself, and in fact I’d never seen a toolbar use it before. When trying to explain what it did to the colleague, I decided that I’d probably stop hand-waving and pulled unicows up on Wikipedia. And bam, there it was, plain as day:

By adding the UNICOWS.LIB to the link command-line [ ... ] the linker will resolve referenced symbols with the one provided by UNICOWS.LIB instead. When a wide-character function is called for the first time at runtime, the function stub in UNICOWS.LIB first receives control and [ ... ] if the OS natively supports the W version (i.e. Windows NT/2000/XP/2003), then the function stub updates the in-memory import table so that future calls will directly invoke the native W version without any more overhead.

…and there’s the problem!

When IE first loads a toolbar, the shims run against the module and wrap all calls to CreateWindow with a call to the compatibility wrapper function. But when IE loaded this toolbar, it didn’t find any calls to CreateWindow, because those calls had been pointed at a function inside unicows.dll instead of at the original function in user32.dll. As a result, the compatibility shim wasn’t applied, and the function call failed.

Now, this wouldn’t have happened if unicows did its import-table fixup the “normal” way, using the GetProcAddress function. That's because the compatibility shims are applied to GetProcAddress as well, and the fixup would have been applied properly at the time that unicows did the update of the import table. However, for reasons lost to the mists of time, the implementers of unicows instead copied the source code of GetProcAddress from user32 into their own DLL, so the shims had no way to recognize it. While we could add a new shim to handle unicows.dll, the obscurity and low priority of this scenario mean that we instead decided to outreach to the vendor and request that they update their build process to remove the long-defunct support for Windows ‘9x.
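To make the failure mode in the quote concrete, here is a minimal sketch in portable C. It is a simulation only: there are no Windows headers, and every name in it (plain_slot, layer_slot, apply_shim, layer_stub) is hypothetical rather than taken from IE or unicows. Each "slot" stands in for one import-table entry. The shim pass only rewrites slots that point at the native function, so a slot that points into the layer is skipped, and the later self-patch leaves it aimed at the unshimmed native function.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*create_window_fn)(void);

/* Return values identify which routine actually ran. */
enum { RAN_NATIVE = 1, RAN_SHIMMED = 2 };

/* Stand-ins for the OS function and the host's compatibility wrapper. */
static int native_create_window(void) { return RAN_NATIVE; }
static int shim_create_window(void)   { return RAN_SHIMMED; }

static int layer_stub(void);

/* One simulated import-table slot per module (hypothetical names). */
static create_window_fn plain_slot = native_create_window; /* ordinary import */
static create_window_fn layer_slot = layer_stub;            /* points into the layer */

/* Host-side shim pass: redirect only slots that point at the native
 * function. Returns 1 if the slot was patched, 0 if it was skipped. */
static int apply_shim(create_window_fn *slot)
{
    assert(slot != NULL);
    if (*slot != native_create_window)
        return 0;
    *slot = shim_create_window;
    return 1;
}

/* First-call stub: resolve the native function privately, patch the
 * slot so later calls go straight there, then forward this call. */
static int layer_stub(void)
{
    layer_slot = native_create_window; /* never passes through the shim */
    return layer_slot();
}
```

Running apply_shim on both slots and then calling through them shows the ordinary import reaching the wrapper while the layer's import ends up at the native function with no wrapper in between.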

Well, I'll object a little about the characterization that things that feel so recent to me are "lost to the mists of time". :-)

Though I won't complain too much, since the issue in question caused him to be randomized so obnoxiously!

The blog of mine that covers the issue is from point one of May of 2005's Why does MSLU wrap ________ ?:

1) There is, for example, the GetProcAddress function. It takes a string, but never a Unicode string, on NT or otherwise. So why would it need to be wrapped?
 
Well, it turns out that the GetMonitorInfo function, defined in multimon.h, is not just a simple prototype. There is a bunch of complex code in it that conditionally calls various APIs, including GetProcAddress, to get a function pointer to replace any call to GetMonitorInfo. Because of this, MSLU could not wrap the GetMonitorInfo function, because the wrapper would never be used. The only way to allow the wrapper to work was to wrap GetProcAddress and look for where someone was trying to retrieve the address of GetMonitorInfoA or GetMonitorInfoW!
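The shape of that problem can be sketched in portable C. This is a simulation under stated assumptions, not the multimon.h or MSLU source: the function names, the simplified int (*)(void) signature, and the choice to forward unrecognized names to the OS lookup are all hypothetical. The point is only that a header-style stub which probes for GetMonitorInfoW at run time can be reached by wrapping the lookup function, and by nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified signature standing in for the real GetMonitorInfo prototype. */
typedef int (*monitor_fn)(void);
typedef monitor_fn (*lookup_fn)(const char *name);

/* Simulated Win9x-style OS: only the A entry point is exported. */
static int native_GetMonitorInfoA(void) { return 1; }

/* Hypothetical layer-provided W implementation. */
static int layer_GetMonitorInfoW(void) { return 2; }

/* Stand-in for the operating system's GetProcAddress. */
static monitor_fn os_get_proc_address(const char *name)
{
    if (strcmp(name, "GetMonitorInfoA") == 0)
        return native_GetMonitorInfoA;
    return NULL; /* no W export on this simulated OS */
}

/* Wrapped lookup: intercept the one name the layer cares about and
 * forward everything else (forwarding is an assumption of this sketch). */
static monitor_fn layer_get_proc_address(const char *name)
{
    if (strcmp(name, "GetMonitorInfoW") == 0)
        return layer_GetMonitorInfoW;
    return os_get_proc_address(name);
}

/* Header-style stub: probe for the export at run time and fall back
 * when it is missing. Returns 0 to mean "fallback path taken". */
static int x_get_monitor_info_w(lookup_fn lookup)
{
    assert(lookup != NULL);
    monitor_fn fn = lookup("GetMonitorInfoW");
    if (fn == NULL)
        return 0;
    return fn();
}
```

With the plain OS lookup the stub takes its fallback path; with the wrapped lookup it reaches the layer's W implementation, while a request for GetMonitorInfoA still resolves to the native export.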

This was back in the heady days when I had the DaveC-like power to have influence on pretty much any function in multiple versions of Windows.

Even if the versions were Windows 95, Windows 98, and Windows Me.

I suppose there is a small procedural problem with trusting a troubled perfectionist such as myself to act as sole architect/program manager, principal developer, and only tester on a project.

But my manager at the time had quite a knack for making me feel slightly foolish while asking questions that in retrospect seem quite reasonable like

Don't we need someone with PM experience here?

or

Should I really be the only one in charge of testing code I wrote myself?

while simultaneously making me feel like I could get the job done.

So perhaps I can be forgiven this particular sin.

Though really I think I owe Eric lunch one of these days to apologize.

Eric -- sorry about that! Call me after I get back from Brisbane in a couple of weeks! :-)


Yuhong Bao on 1 Aug 2012 10:10 AM:

So why couldn't MSLU call through to the OS GetProcAddress function?

Michael S. Kaplan on 1 Aug 2012 10:41 AM:

Because there was no way to hook the multimon functions without hooking GetProcAddress.

Yuhong Bao on 1 Aug 2012 12:21 PM:

Why did that hook make it impossible?

Michael S. Kaplan on 1 Aug 2012 1:36 PM:

If you look at the SDK header file for multimon, you'll understand. It contains complex delayload logic that unicows.lib could not change except by overriding GetProcAddress.

GregM on 1 Aug 2012 1:39 PM:

When I first read that first sentence, my mind silently inserted "and a" between "two" and "half", which made me wonder at how you were working at MS when you were 16.  :)

GregM on 1 Aug 2012 1:41 PM:

Yuhong, it says right in the quoted article (or rather it says that no one knows why):

However, ***for reasons lost to the mists of time***, the implementers of unicows instead copied the source code of GetProcAddress from user32 into their own DLL.

Yuhong Bao on 1 Aug 2012 2:14 PM:

I know. I am asking why the override makes it impossible to call the *original OS* GetProcAddress function.

Michael S. Kaplan on 1 Aug 2012 2:20 PM:

Check out the CODE in the header file, you will quickly understand. I promise. :-)

GregM on 1 Aug 2012 7:27 PM:

Yuhong, where did you read that it was impossible for MSLU to call GetProcAddress?  All it says is that they didn't do it, and no one remembers why.

Michael S. Kaplan on 1 Aug 2012 9:33 PM:

Actually, Eric saw that MSLU was wrapping it.

Mike Dimmick on 2 Aug 2012 6:51 AM:

The question is why the multiple monitor support did the strange thing. The answer is because multiple-monitor support was added in Windows 98 and the MultiMon.h header was written to detect whether real support was available, and fall back to a basic implementation if not.

The implementation #defines GetMonitorInfo (for example) to xGetMonitorInfo, which is implemented in-line in the header. However, this is an ANSI/Unicode function due to the szDevice member of MONITORINFOEX. An ANSI build dynamically loads the underlying GetMonitorInfoA if it's available, while a Unicode build loads GetMonitorInfoW, which doesn't exist on Windows 98. In this case a Unicode program linked with unicows would not support multiple monitors (because the test is for the presence of that API) whereas the ANSI build would. The only way for MSLU to do the right thing is to intercept GetProcAddress.

I'd have to argue the case of whether this API really needed A/W variants since the szDevice field is the display driver name, not a user-generated name. Still, that's what was done and that's the compatibility issue.

Joshua on 2 Aug 2012 3:58 PM:

? When you hook a function you need to leave a way to call its base. Unicows.dll should have done it that way. It can still be fixed now since the only reason you care is for shims.

GregM on 2 Aug 2012 5:28 PM:

This is the section I was referring to:

"Now, this wouldn’t have happened if unicows did its import-table fixup the “normal” way, using the GetProcAddress function. That's because the compatibility shims are applied to GetProcAddress as well, and the fixup would have been applied properly at the time that unicows did the update of the import table. However, for reasons lost to the mists of time, the implementers of unicows instead copied the source code of GetProcAddress from user32 into their own DLL, so the shims had no way to recognize it. "

MSLU took over the GetProcAddress function, and then instead of calling the OS version of GetProcAddress when it determined that it wasn't a function that it cared about, it re-implemented GetProcAddress itself.  Yuhong is asking why it couldn't call the OS version instead of implementing it itself.  We don't know that it *couldn't*, we just know that it *didn't*.  It could be that the author simply *thought* that it couldn't do so.

Michael S. Kaplan on 2 Aug 2012 8:03 PM:

It was easier (well, better performance) to copy it than do a string compare on every call to check for a particular function...

EricLaw [ex-MSFT] on 3 Aug 2012 11:33 AM:

Hi, Michael-- Thanks for sharing your insights on the history of this code! (FWIW, IE10 doesn't ship the extension in question, we just found it during our compat-test pass.)

Alex Cohn on 3 Aug 2012 11:51 PM:

I wonder, how many corporate intranets will get stuck with an older version of IE (and all the security holes involved) because they use a component linked with unicows.

Michael S. Kaplan on 5 Aug 2012 8:52 PM:

Hey @Alex Cohn - as long as they ship the DLL, they'll be fine... :-)

Alex Cohn on 6 Aug 2012 1:30 PM:

No Michael, according to @EricLaw they're screwed because the new shims are necessary to make old components work in IE10, but they did not prepare shims for unicows

Michael S. Kaplan on 6 Aug 2012 2:09 PM:

They can simply remove unicows and be done with it.

Alex Cohn on 6 Aug 2012 6:32 PM:

Wait a sec... If a component was built to use unicows, can the corporate admin simply remove the dll after install? I guess, no. They can replace unicows with a dummy one that does nothing. Or IE10 could seamlessly disable the dll.

The current solution requires a rebuild of the component. Though trivial, it may not be possible in some corporate scenarios. They will try the new IE, find that it does not work, and fall back to the old version.

Michael S. Kaplan on 7 Aug 2012 12:33 AM:

Relink is your friend!

Michael S. Kaplan on 7 Aug 2012 12:37 AM:

A corporate admin relying on a component still supporting Win9x is in a world of hurt here.

Alex Cohn on 7 Aug 2012 12:00 PM:

" A corporate admin relying on a component still supporting Win9x is in a world of hurt" - unless it was written in COBOL.

John Kim on 25 Nov 2012 10:30 PM:

Hello,

I'm a web-based solution provider and have recently faced a very difficult situation because our product uses unicows.dll.

Many of our customers claim problems with IE10.

They want to use IE10 on Windows 7.

Of course we can rebuild our old product.

However, it is difficult to replace old products on their production systems with rebuilt ones.

Should we give up IE10?

I'm really tied up with this dilemma.

So I'd like to get information about the official plan of Microsoft for this problem.

Any advice from any of you will be highly appreciated.

John M Kim on 25 Nov 2012 11:09 PM:

P.S.

IE10 on Windows 8 also has the same problem.

Yuhong Bao on 16 Apr 2013 4:27 PM:

BTW, this code is actually in UNICOWS.LIB (I disassembled it).

Trivia: A similar trick was used in Win9x to call undocumented functions in KERNEL32 that were exported only by ordinal, after a check was inserted into the KERNEL32 GetProcAddress against this during the Win95 betas.

