Putting the *backward* in backward compatibility

by Michael S. Kaplan, published on 2006/11/20 22:31 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/11/20/1112734.aspx


There have been many times that I have championed backward compatibility. Or maybe it's backwards compatibility? Anyone from Language Log want to chime in on which one is right? :-)

Anyway, there have been many times I have championed backward[s] compatibility in the past.

But there are times when it is probably a bad idea. When it is, to put it simply, quite backward.

A good example of this?

Well, think about the recent post Sometimes in the future 'ANSI' is really going to be unsupported!, and what it refers to as the second problem:

Ok, this leads us to the next problem, one that in my opinion is caused by the .NET Framework. For the sake of backward compatibility with VB4, VB5, and VB6, all P-Invoke calls that do not have charset information attached default to use the "A" version of functions. This means that even though the code is running in .NET, where all the strings are Unicode, on Vista, where the registry and everything else is Unicode, everything is being dealt with as if it were a non-Unicode string, and the non-Unicode version of functions is being called.

Remember when I wrote about how The Unicode train is leaving the station and was quite clear that there would no longer be non-Unicode function calls added to the NLS API? And how we'd be recommending to other teams that they do the same, either the way we did with FindNLSString (not even decorating the name with a "W") or the way the Shell team did with StrCmpLogicalW (a "W" decoration), with no "A" version being provided in either case?

So, the folks who created Visual Studio and .NET were, even in its very first version, eager to get away from Win9x. After repeated problems came up with trying to get the Shell to work on Win9x, they dropped support for running VS there. They definitely did not actively support the work of the Microsoft Layer for Unicode (MSLU): neither MFC nor ATL built special versions that used it, or allowed such special versions to be posted. Even though unicows.dll shipped with the very first version of VS.Net, there was no option to build MSLU projects. Clearly everyone was distancing themselves from Win9x, and I could easily come up with a dozen additional little issues that would indicate how everyone was moving on.

So why use the CharSet.Ansi default in pinvoke? Especially when going forward there will be fewer non-Unicode functions?

Well, sort of backward compatibility, I guess.

Though to be honest, changing the default to Unicode, or even more sensibly changing the default to CharSet.Auto, would keep the non-Unicode default from being injected into the world of Unicode operating systems.
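To make the cost of that non-Unicode default a bit more concrete, here is a minimal sketch in Python. It is only an analogy: Python's codecs stand in for the conversion that an "A" entry point forces on a string, the choice of code page 1252 is an assumption, and the substitution Windows itself performs may differ in detail. The sample text is five Khmer code points, none of which exist in that code page.

```python
# Illustration only: Python's codecs stand in for the string conversion that
# an "A" entry point forces. Windows' own conversion may substitute differently.
text = "\u1781\u17d2\u1798\u17c2\u179a"  # five Khmer code points

# "A" path: squeeze the string into a single-byte code page (cp1252 assumed).
ansi_bytes = text.encode("cp1252", errors="replace")
ansi_round_trip = ansi_bytes.decode("cp1252")

# "W" path: UTF-16LE, the encoding the Unicode entry points work with.
wide_bytes = text.encode("utf-16-le")
wide_round_trip = wide_bytes.decode("utf-16-le")

print(ansi_round_trip == text)  # False: every character became "?"
print(wide_round_trip == text)  # True: nothing was lost
```

The string starts as Unicode and the operating system wants Unicode, so the only lossy step in the whole trip is the one the default inserts.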

How many versions of .NET will it take before this seems lame to everyone? (Especially if using pinvoke in managed C++ projects, where the project default is now UNICODE/_UNICODE, gives similar results. Not sure on this one; the docs are anything but clear....)

Now that is backwards! Or maybe backward. Whichever.

 

This post brought to you by ោ (U+17c4, a.k.a. KHMER VOWEL SIGN OO)


# orcmid on 22 Nov 2006 11:22 AM:

Well, the default Unicode is actually a small mess for beginners using VC++ 2005 Express Edition because string literals (the undecorated "...") and the char literals are still A-style.

But I think the bigger deal is code-page dependency.  Until the platform supports Unicode top to bottom, including console sessions and keyboard mappings, it is not so easy for the development tools to impose Unicode by fiat.

And there is a lot of single-byte code running around.  I just wrote some more in native Win32 ;).

- Dennis

# Michael S. Kaplan on 22 Nov 2006 11:50 AM:

Well, that has more to do with rules in the C/C++ standards, right? :-)

And of course the world of VB.NET and C# has no [good] reason for an ANSI default....

# Andrew Cook on 24 Mar 2008 4:40 PM:

Hmm...

"Character Sets. You can specify in charsetmodifier how Visual Basic should marshal strings when it calls the external procedure. The Ansi modifier directs Visual Basic to marshal all strings to ANSI values, and the Unicode modifier directs it to marshal all strings to Unicode values. The Auto modifier directs Visual Basic to marshal strings according to .NET Framework rules based on the external reference name, or aliasname if specified. The default value is Ansi.

"charsetmodifier also specifies how Visual Basic should look up the external procedure within its external file. Ansi and Unicode both direct Visual Basic to look it up without modifying its name during the search. Auto directs Visual Basic to determine the base character set of the run-time platform and possibly modify the external procedure name, as follows:

   *On an ANSI platform, such as Windows 95, Windows 98, or Windows Millennium Edition, first look up the external procedure with no name modification. If that fails, append "A" to the end of the external procedure name and look it up again.

   *On a Unicode platform, such as Windows NT, Windows 2000, or Windows XP, first look up the external procedure with no name modification. If that fails, append "W" to the end of the external procedure name and look it up again."

While the default value is ANSI, programmers need to explicitly state that the procedure name ends with "A" or else the name lookup will fail and thus an exception is thrown. The easiest way to fix that is to simply include "Auto", which on NT platforms makes things use Unicode goodness.
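The lookup rules quoted above can be sketched as a small Python model. This is a rough illustration of the quoted documentation text only, not of what the runtime actually does internally; the function name `resolve_entry_point`, its parameters, and the export list are all hypothetical, and the four exported names are simply borrowed from functions mentioned in this discussion.

```python
from typing import Optional

# Hypothetical export table for an imaginary library (assumption, not a real DLL).
HYPOTHETICAL_EXPORTS = frozenset(
    {"MessageBoxA", "MessageBoxW", "StrCmpLogicalW", "FindNLSString"}
)


def resolve_entry_point(
    name: str,
    charset: str,
    unicode_platform: bool,
    exports: frozenset = HYPOTHETICAL_EXPORTS,
) -> Optional[str]:
    """Return the export that would be found, or None if the lookup fails."""
    if charset not in ("Ansi", "Unicode", "Auto"):
        raise ValueError("unknown charset modifier: " + charset)

    # Every modifier first tries the name exactly as written.
    if name in exports:
        return name

    # Only Auto retries with a suffix chosen by the platform's base character set.
    if charset == "Auto":
        candidate = name + ("W" if unicode_platform else "A")
        if candidate in exports:
            return candidate

    return None


print(resolve_entry_point("MessageBox", "Ansi", True))     # None
print(resolve_entry_point("MessageBox", "Auto", True))     # MessageBoxW
print(resolve_entry_point("MessageBox", "Auto", False))    # MessageBoxA
print(resolve_entry_point("FindNLSString", "Auto", True))  # FindNLSString
```

Under these rules an undecorated name with the Ansi default finds nothing, while Auto on an NT-based platform lands on the "W" export. A function that has no "A" export at all, like StrCmpLogicalW in this model, cannot be reached from the undecorated name on an ANSI platform whatever the modifier.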

# Michael S. Kaplan on 24 Mar 2008 4:55 PM:

Or just stick to NT-based platforms and assume the "W" semantics throughout, instead. :-)



referenced by

2008/03/24 Unicode not being the default is slower and leads to bugs; maybe it ought to change?
