by Michael S. Kaplan, published on 2005/03/06 12:38 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/03/06/386194.aspx
A little while back, I was having a conversation with our fearless group's Director, Julie Bennett. Actually, I think maybe she was a Group Manager then, but it is neither here nor there. :-)
I was talking to her because there was a customer who did not like LCMapString's behavior with sort keys.
The API is designed such that if you pass to the API a string and the LCMAP_SORTKEY flag, a binary representation of the string (suitable for use as an index) will be returned.
One of the design principles the API works by when creating sort keys is that it does not touch the target buffer unless the call succeeds.
Unlike even some other NLS APIs (e.g. WideCharToMultiByte and MultiByteToWideChar), it does not use the target buffer as a scratch pad for interim results. If the API fails during processing, whether because of an error or because the buffer size is insufficient, the target buffer is not left in an unknown state. It is only touched if the end result is "customer ready" and usable. This behavior is true of many other NLS APIs, like GetLocaleInfo, GetNumberFormat, and GetDateFormat. Only some of them have the issue with allocation, but most have the behavior of not prematurely touching the target buffer.
The customer was not really happy about that behavior since they wanted the target buffer to be used as a scratch pad. They had looked at the disassembly and seen the HeapAlloc calls that were made for larger strings, and they were worried about those allocations and wanted them to never happen, for performance reasons. So why couldn't the API just use the target buffer and be done with it?
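The trade-off the customer was objecting to can be sketched in portable C. This is only an illustration under stated assumptions, not the real LCMapString implementation: `make_key`, its toy uppercase "key" format, the 64-byte stack buffer, and the rule that non-ASCII input fails are all hypothetical, and `malloc` stands in for HeapAlloc. What it does show is the principle itself: the key is built in private scratch space, and the caller's buffer is written exactly once, only after the complete key exists and fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy key builder, NOT the real LCMapString.
 * Returns the key size in bytes on success, or 0 on failure.
 * Passing dst == NULL only queries the required size.
 * dst is written only once the complete key exists and fits. */
size_t make_key(const char *src, unsigned char *dst, size_t dst_size)
{
    assert(src != NULL);

    size_t needed = strlen(src) + 1;   /* toy key: one byte per char, plus a terminator */
    unsigned char stack_buf[64];
    unsigned char *scratch = stack_buf;
    size_t result = needed;

    /* Large inputs need a heap scratch buffer: the allocation the customer objected to. */
    if (needed > sizeof stack_buf) {
        scratch = malloc(needed);
        if (scratch == NULL)
            return 0;
    }

    /* Build the whole key in scratch space; this step can fail partway through. */
    for (size_t i = 0; i + 1 < needed; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c >= 0x80) {               /* toy failure: reject non-ASCII input */
            result = 0;
            break;
        }
        scratch[i] = (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
    }

    if (result != 0 && dst != NULL) {
        if (dst_size >= needed) {
            scratch[needed - 1] = 0;
            memcpy(dst, scratch, needed);  /* the only write to the caller's buffer */
        } else {
            result = 0;                    /* too small: dst stays untouched */
        }
    }

    if (scratch != stack_buf)
        free(scratch);
    return result;
}
```

Writing interim results straight into `dst` would remove the heap allocation for long strings, but a failure halfway through the loop would then leave the caller holding a partially written buffer.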
So I was looking into whether that was feasible. The request made sense and I did not have a specific reason to not consider changing it at the time. So why not do a little investigation, right?
I was very surprised at Julie's reaction.
Keep in mind that she had not been doing the development work in this area for years. At this point she was managing the people who were managing the people who managed the people who did this development work. To have this conversation I had to meet with her in the evening after a day full of meetings that included an architect from the Shell team and a VP of something or other. I had seen the movie Head Office where the chairman of the company liked to review individual customers' delinquent phone bills to feel like he was staying connected to the customer, but she is (thankfully!) not that kind of a micro-manager.
I had been getting a bunch of reactions from various people that were mostly the same as my initial gut reaction (a vague sense of unease that it might be a bad idea, but no convincing argument against making what seemed like a sensible change to consider). And I thought that as the original author and former development owner of these APIs, she might have an opinion.
Boy did she!
She pointed out that the vague sense of unease was probably due to the implicit behavior change that would occur. When you deal with a Windows API there may be hundreds of thousands or more callers with millions of individual calls. One could never predict how someone might use an API, and behavior (whether documented or undocumented) was something that would cause users to be broken any time you change it. Now that did not make change impossible, but it does minimize change, by necessity. Even when people make calls incorrectly, you have to carefully weigh the benefit of any change against both foreseeable/reasonable and unforeseeable/unreasonable usage.
For example, change to this particular API that must happen includes updates for new languages and occasional bug fixes for old ones. Thankfully the latter is rare and hopefully the former is often. But in any case in order to not break clients who have to depend on sort keys, the collation version APIs (IsNLSDefinedString and GetNLSVersion) were added in Windows Server 2003. In that way any client can use a mechanism (like the one I suggested in What makes a string meaningful?) to allow updates while still working properly. So even when change is deemed worthy a lot of trouble is taken to architect when and how that change is made.
She said in the end there is no way to know what people are doing with our APIs; we only ever get to contact a small minority of the people who work with them. As a matter of practical necessity, the people we do talk to are not a representative sample, since the type of personality that makes someone seek out public resources for APIs like conferences and newsgroups is not the same as the type that does not. If we change behavior then there is no way to know the full extent of what (or who) we are breaking, but it is a scientific certainty that someone will be broken.
I left that meeting with a few thoughts on my mind:
So how could changes like this one be productively made? Well, there are several different options. Two example options:
The same rules exist for the .NET Framework, which takes the problem of breaking changes quite seriously and works with customers to make sure that changes that would break programs are not made.
Armed with this knowledge, a plan can be formulated on what to do. And how best to do it if change must be made.
You may be wondering what will happen for this specific issue, and when? Too early to say, but you know me -- when I can say, I will.
But in the meantime, the next generation of owners of NLS APIs have properly learned the lessons that the previous generation already knew. And by this, the future of these APIs has been assured. :-)
This post brought to you by "8" and "∞" (U+0038 and U+221e, a.k.a. DIGIT EIGHT and INFINITY)
Two characters who are great friends, merely a single best fit mapping away from each other!