Dealing with backcompat across many versions

by Michael S. Kaplan, published on 2005/05/11 22:40 -07:00, original URI: http://blogs.msdn.com/michkap/archive/2005/05/11/416717.aspx


Larry Osterman posted earlier today about "Turning the blog around - End of Life issues" and it got me thinking. I have talked about backcompat enough in the past that it is pretty clear where I stand on the question of when we can say that some particular API set or feature or function or behavior is done.

I don't think we really can. At all.

I feel that way because in my opinion the one customer we cannot ever let down is the one who figured out how to leverage our API sets, use our features, call our functions, and rely on our behaviors. The one who wrote an application that used all of these things. Because that application will have customers and they will be relying on everything working in the same way. When there are changes there must be clear and unambiguous reasons why.

So should we fix the sort order if for some reason it thought that Z came before B in English?

Yes, because English speaking users of today would have noticed this and considered it a bug.

But should we change the underlying nature of the CompareString API to start returning an Enum instead of an int because it makes debugging more convenient?

Hell no, because debugging would suddenly become less important as we just caused every application in the universe to start failing to compile -- meaning they would never get to the place where easier debugging would be a useful feature.

Now obviously those two examples are extremes, and the issues about which people fight are not always this obvious. But as a guideline it is where I would draw that line, and the standard I would hold us to. And it is the standard I do hold us to, whenever it is up to me or I have a voice.

Let's look at the summary:

A good example of that third category (which contains many items!) is if we change the behavior of a function that takes a structure. Like if we changed GetNLSVersion in some way, we'd do it in the typical Win32 way -- we would change the NLSVERSIONINFO's dwNLSVersionInfoSize member to a new size to fit whatever the new features are. And people who call it the old way will get the old behavior.
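To make the size-versioning idea concrete, here is a minimal sketch in portable C. It is not the real NLSVERSIONINFO layout or the real GetNLSVersion: the structure names, the extra `effective_id` field, the function `get_version_info`, and the values it fills in are all hypothetical, chosen only to show how a callee can use the caller-supplied size to decide which behavior to give.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "old" layout. The caller sets size to sizeof the
   structure it was compiled against before making the call. */
typedef struct {
    uint32_t size;
    uint32_t nls_version;
    uint32_t defined_version;
} VersionInfoV1;

/* Hypothetical "new" layout: same leading fields plus one new member. */
typedef struct {
    uint32_t size;
    uint32_t nls_version;
    uint32_t defined_version;
    uint32_t effective_id;   /* assumed new field, for illustration only */
} VersionInfoV2;

/* Returns 1 on success, 0 if the size matches no known layout. */
int get_version_info(void *buffer)
{
    uint32_t size;

    assert(buffer != NULL);
    /* The size member is first in every layout, so it is safe to read
       before we know which layout the caller is using. */
    memcpy(&size, buffer, sizeof size);

    if (size == sizeof(VersionInfoV1)) {
        /* Old caller: fill in only the fields it knows about. */
        VersionInfoV1 *v1 = (VersionInfoV1 *)buffer;
        v1->nls_version = 5;
        v1->defined_version = 5;
        return 1;
    }
    if (size == sizeof(VersionInfoV2)) {
        /* New caller: opted in by passing the larger size. */
        VersionInfoV2 *v2 = (VersionInfoV2 *)buffer;
        v2->nls_version = 5;
        v2->defined_version = 5;
        v2->effective_id = 42;
        return 1;
    }
    return 0;   /* unknown size: refuse rather than guess */
}
```

An application compiled against the old header keeps passing the old size and never sees the new member, while a recompiled application gets the new behavior simply by using the new structure.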

One major agenda item we have had that does break compatibility, and where we live with the break, is security. For obvious reasons.

But (as a PM colleague of mine likes to say) why don't we take a step back for a moment?

Now there are other products which obviously have to deal with the same problem.

Do any of them deal with the juggling of "compatibility versus correctness" (to use a simplifying statement of the clash) in a different way?

And if so, can we learn from them?

What about the C/C++ compiler? Although not a part of the operating system, it is used to build the operating system, so clearly its methods of dealing with the problem affect the OS. How do they solve this problem?

Well, they try to not change behavior. And as they struggle to become more and more conformant to the official C++ standard (breaking more and more developers), they add flags to make sure that people who need the old behavior have a way to get to it. And there are some features (like the format of the .PDB files and the syntax of C++ function name decoration) that change every version and that no one should ever rely on for that reason.

Which in a way indirectly answers how they handle the situation, both things that are the same and things that are different:

I'd say that in Windows as well as in NLS we clearly have the first two, but they handle the third a bit differently a lot of the time. There are many cases where a change in the way flags work might happen, but a flag was added to get the old behavior for people who really need it. But in a lot of other cases, the new flags are for the new behavior, like adding new warnings for new platforms.

It is probably my "external" view, but I think their important "agenda" items for change have caused more issues to come up, or maybe just ones that are harder to justify. Obviously everyone has to close their mouths and take it on security, but the folks who care about backward compatibility and the folks who care about C++ standards conformance are two different groups. Therefore they are more likely to have customers who are not even willing to accept the premise, whereas no one likes to be on the side of arguing for less secure products.

But in any case, there is really not much that they do differently that we could learn from in terms of techniques for effecting change. So we'll set them to the side for the moment.

Let's look at other options.

Products like Office tend to handle things just like Windows does, but more so -- they might even still support the Excel 4.0 XLM macro stuff or .XLA dlls, for people who have Office applications that use such features. So I don't know if there is much we can learn there.

But I will cover this more another day.

Then there are products like SQL Server, which have a very different model than any of these people use.

They do have those same three categories, but they handle that third one very differently.

The goal in SQL Server is to make an upgrade seem like nothing has happened at all, with the same behavior everywhere. And then people can jump into features as they need them.

It is why there are so many compatibility collations that refer to SQL Server behavior as far back as 4.21 or further. And why there are compatibility constants that can take DMO and Server behavior back as far as SQL Server 6.0 in the Yukon release. And why there are so many flags with which you can start the server that give it specific old behaviors that large adoptions have been found to rely on. Some of these features are available only to the server's administrator, but other features are available to everyone.

And I wonder at that -- we don't really have such a model that is consistently used throughout Windows.

For example, the assumptions in GDI and in typography are that even a one pixel difference with a font can cause issues for an application, and there is no global setting to say "it is okay to change anything you need to" other than just using a whole new font. And even in NLS where we have added the notion of collation versions, there is no capability present there to use a prior version's collation.

Windows does have group policy settings to (for example) revert the way the Shell treats digits so that it no longer treats them as numbers (which is a lot like those various SQL Server compatibility flags), but the database administrator is not the same kind of person as the sysadmin of a large organization handling a bunch of Windows installations, so the audiences are very different.

Is there something that can be done at the install level, the process level, or maybe at the function call level to assist developers here? A way to either opt in to new behavior that we provide as a non-default option or opt out of new default behavior that they do not like? Something that we can glean from the behavior of one of these other products?
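As one purely hypothetical shape for the function-call-level option, here is a minimal sketch in C. Nothing in it is a real Windows or NLS API: the `COLLATION_V1`/`COLLATION_V2` constants, `compare_with_version`, and `compare_legacy` are invented for illustration, and the two "versions" (byte-order versus case-insensitive comparison) are stand-ins for whatever an old and a new collation might actually differ on.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical collation versions a caller can ask for by name. */
enum { COLLATION_V1 = 1, COLLATION_V2 = 2 };

static int sign(int x) { return (x > 0) - (x < 0); }

/* Assumed "old" behavior: plain byte-order comparison. */
static int compare_v1(const char *a, const char *b)
{
    return sign(strcmp(a, b));
}

/* Assumed "new" behavior: case-insensitive comparison. */
static int compare_v2(const char *a, const char *b)
{
    while (*a && *b) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return sign(ca - cb);
        a++;
        b++;
    }
    /* At least one string has ended; the shorter one sorts first. */
    return sign((unsigned char)*a - (unsigned char)*b);
}

/* Opt-in entry point: the caller names the behavior it wants.
   Returns 1 and stores -1, 0 or 1 in *result on success;
   returns 0 for a version this build does not know about. */
int compare_with_version(const char *a, const char *b,
                         int version, int *result)
{
    switch (version) {
    case COLLATION_V1: *result = compare_v1(a, b); return 1;
    case COLLATION_V2: *result = compare_v2(a, b); return 1;
    default:           return 0;
    }
}

/* Existing entry point: pinned to the old behavior forever, so
   applications that never opt in never see a change. */
int compare_legacy(const char *a, const char *b)
{
    return compare_v1(a, b);
}
```

The design choice being illustrated is only that the default never moves: old callers stay on `compare_legacy`, and new behavior is reachable solely by asking for it explicitly. An opt-out design would flip that, moving the default forward and keeping the old version available by name.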

I don't have answers here (at this point), but I think I have a handle on a lot of the issues and (due to my past life as one of those developers who relied on the stability!) an overdeveloped sense of how evil we would be if we violated any of the reasonable assumptions that developers would make about our API set's behavior. So I guess my answer to Larry's question is mostly the same:

We cannot completely drop support for anything at this point, because in my opinion the one customer we cannot ever let down is the one who figured out how to leverage our API sets, use our features, call our functions, and rely on our behaviors. The one who wrote an application that used all of these things. Because that application will have customers and they will be relying on everything working in the same way. When there are changes there must be clear and unambiguous reasons why.

But I think we need to start building in more support for both the opt-in and opt-out technologies to make it easier to help migrate people away from what needs to go away and toward what we are doing now. Because the platform is only going to get more complex, and to fulfill the contract of the preceding paragraph we must give those developers the tools they need.

Ok, that is a lot to chew on. I'd wager that the only way someone made it to this paragraph would be if they had skipped to the end, but just in case you have a thought on this, feel free to leave a comment either to Larry's post or these blatherings....


# LarryOsterman on Wednesday, May 11, 2005 11:52 PM:

Actually, IMHO, there is ONE (and only one) time when it's possible to end-of-life an API set.

A number of people have hinted at it in the comments on my blog (which have been extraordinary, btw - some really good ideas have been tossed out), but nobody has yet hit on what I consider the one key element.

And I'll talk about the time (recently) when we tried to end-of-life an API and failed miserably.

# Michael S. Kaplan on Wednesday, May 11, 2005 11:56 PM:

I tend to agree -- it's why I am aiming at how best to morph functionality using opt-in or opt-out methods (depending on which is more appropriate)....
