What's wrong with one of GetNumberFormat's callers? And what's wrong with GetNumberFormat?

by Michael S. Kaplan, published on 2008/03/10 09:16 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/03/10/8120567.aspx

The other day he sent me mail about a bug (actually a small bundle of bugs, but the bug he found was a crash bug) in a third party application (name withheld to protect the guilty, and also the embarrassed!).

Igor looked at crash via the disassembly, which I'll put here just for the sake of completeness. If you are the same kind of person you can work along here and try to find the problems within the disassembly:

From assembly it is pretty hard to know whose application it is, but what is happening in this one function is not too hard to figure out.

Now for those who really aren't as comfortable working this way where you have watch parameters get placed on the stack an such (though it should be fairly straightforward to work with in this case for those so inclined!), here is some essentially equivalent C code:

What it is trying to do, with many of the bugs embedded in the descriptive language for easy retrieval:

Originally Igor was thinking this might have even made a great entry for The Daily WTF, and I'm not gonna disagree with him on that.

The crash bug Igor had run into initially was based on the fact that he had Serbian locale settings with a customized decimal separator -- thus the user override mismatch in #1 and #2 above quickly led to a problem searching for a "." in a string such as "44,90" -- then after properly detecting that wcsrchr returning NULL for failure, a somewhat catastrophic situation arises when it tries to dereference that NULL in order to assign to it.

Basically what they (apparently) wanted to do was format a number with the user's preferences but overriding the user's choice of the number of decimal places. The resulting string is put in their user interface, returning the result to the user in a property sheet.

Worse ways of achieving that goal have been reported in code reviews, but not by reliable witnesses.

Now in fairness to the people who wrote this code (whoever they are and whenever they wrote it), GetNumberFormat has some specific limitations that make it less useful and that make it more difficult to write the idealized version of this function.

I am going to enumerate the three big problems here as I see them, and Igor might have some additional thoughts on this matter either for comments here or on his own site. :-)

But the NLS team could think of this next bit as feedback to them on ways to make functions like GetNumberFormat better, faster, easier to use, and more generally useful.

First of all, since GetNumberFormat can only take a string rather than a number due to the lack of a LOCALE_SPECIFY_NUMBER flag (as I mentioned in Pass the string please, three years ago), the caller must format their number as a string via a function like _swprintf so that it can then format the number within the string as yet another a string -- and perhaps this is proof that I have changed my mind a bit since tha tblog; the NLS code really ought to help out more with the more common obvious cases when it can, such as this one.

Second of all (and perhaps most importantly), the longstanding behavior of GetNumberFormat that requires either a fully filled in NUMBERFMT in lpFormat or a NULL lpFormat means that the only way to specify an lpFormat->NumDigits value is to also fill in the lpFormat->LeadingZero, lpFormat->Grouping, lpFormat->lpDecimalSep, lpFormat->ThousandSep, and lpFormat->NegativeOrder values as well -- thus requiring up to five calls to GetLocaleInfo for information entirely derivable from the passed-in locale (assuming one value being overridden).

There are any number of alternate possible ways it could have been done -- like establish the NULL case for the struct as being the default which means "take the locale's data", or specify values that mean the same thing, or even define flags to specify which values to pay attention to in the structure (there is room in the dwFlags for this both this function and GetCurrencyFormat!).

Third of all, where GetNumberFormat always requires either a fully filled in NUMBERFMT in lpFormat or a NULL lpFormat, if you don't pass in that filled-in NUMBERFMT then the code behind the function always grabs all of the information even if it does not need it -- meaning for example that it will grab the user overridable lpFormat->ThousandSep even if the number is not big enough to need it according to the number itself. The upshot of this is that the performance of the function is much better if you pass in the lpFormat yourself if you have the data handy -- because the function is not smart about how it does its work.

Now one may argue against that type of optimization due to potential thread safety issues caused by SetLocaleInfo calls happening while number formatting is happening, but given that this is already really a bit of a problem in the code terms of consistent number formatting in such situations (and also how uncommon SetLocaleInfo calls or user-specific Regional Options changes that amount to SetLocaleInfo calls actually are), there are much better solutions for this problem that are possible. Much moreso than the current solution (trying to front-load all of the data loading calls to minimize the number of possible SetLocaleInfo calls that could happen). And many of those better solutions would be more performant, too!

On top of all that, this front-loading in the case of the NULL lpFormat happens when LOCAL:E_NOUSEROVERRIDE is specified, too -- meaning there is no thread safety issue all but the code is happy to fill a structure with six separate data items without ever returning the data it loaded to the caller (in case they were going to call the function repeatedly). If one has to pay the price, one would like a bit more for one's money, in my opinion....

If those issues did not exist, then all of the work done by the third party GetDoubleFormat could happen in a single very fast function call, rather than a whole bunch of code, such as the buggy code Igor pointed out in the version of the third party application he was looking at.

Now none of this excuses the bugs in the third party application -- that code screws up the usage of NLS functions with multiple bugs including some that crash? They have no good reason to have those.

But if the NLS function did a bit more of the work here, perhaps the kind of developers who were going to make those mistakes when left to their own devices would have one less opportunity to do so?

"GetNumberFormatW assumes that space for the NULL is included in the buffer -- why allocate a wchar_t array of 256 if you are only going to pass 255 for the length?"

Because MSDN is almost always vague about what the length arguments mean. MSDN typically talks about the length of the string, but doesn't say if it means the length as in what strlen (etc.) would return, or if it means the number of characters in the string including the null.

I expect most APIs do include the null in their lengths but it only takes one bad API call to cause a crash or corrupt memory when the string happens to be the same length as the buffer and the null either doesn't get written or gets written off the end of the buffer...

So I almost always, unless the MSDN documentation is very explicit, allocate that one extra character. It's just two bytes of memory, so why not? Not everyone has the advantage of being able to look in the API source when the documentation falls short and it's quicker to waste 2 bytes (for the millisecond that stack frame is in scope) than to run tests on every API to see how it behaves when the result fills the buffer exactly.

This is how I would write GetDoubleFormat():

#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0501
#endif

#include <tchar.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <windows.h>

#define MSEP_MAX 5    // 4 characters + NULL, actual Windows XP limit
#define TSEP_MAX 4    // 3 characters + NULL, actual Windows XP limit
#define DSEP_MAX 4    // 3 characters + NULL, actual Windows XP limit
#define NDEC_MAX 9    // 9 decimal places, actual Windows XP limit
#define PDBL_MAX 320 // 1.7976931348623158e+308
#define PDBL_MAX2 640
// Copyright (c) 2008 by Igor Levicki. All Bugs Reserved. :-)

LPCTSTR GetDoubleFormat(const double Value, const UINT DecimalPlaces) {
    static TCHAR Formatted[MAX_PATH];
    TCHAR StrValue[MAX_PATH];
    TCHAR StrFraction[MAX_PATH];
    TCHAR MSep[MSEP_MAX] = _T("-");
    TCHAR TSep[TSEP_MAX] = _T(",");
    TCHAR DSep[DSEP_MAX] = _T(".");

    const TCHAR Dot = _T('.');
    const TCHAR Zero = _T('0');

    LPTSTR FracPtr;
    size_t Cch, Mod, Idx;

    if (DecimalPlaces > NDEC_MAX) {
        SetLastError(ERROR_INVALID_PARAMETER);
        return NULL;
    }

    if (Value > DBL_MAX) {
        SetLastError(ERROR_INVALID_PARAMETER);
        return NULL;
    }

    GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SNEGATIVESIGN, MSep, MSEP_MAX);
    GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_STHOUSAND, TSep, TSEP_MAX);
    GetLocaleInfo(LOCALE_USER_DEFAULT, LOCALE_SDECIMAL, DSep, DSEP_MAX);

    Formatted[0] = 0;

    switch (_fpclass(Value)) {
        case _FPCLASS_SNAN:
        case _FPCLASS_QNAN:
        case _FPCLASS_NINF:
        case _FPCLASS_ND;
        case _FPCLASS_PD:
        case _FPCLASS_PINF:
            SetLastError(ERROR_INVALID_PARAMETER); // or ERROR_INVALID_DATA
            return NULL; // or ERROR_NOT_SUPPORTED

        case _FPCLASS_NZ:
            _tcscat_s(Formatted, MAX_PATH, MSep);

        case _FPCLASS_PZ:
            _tcsncat_s(Formatted, MAX_PATH, &Zero, 1);
            if (DecimalPlaces > 0) {
                _tcscat_s(Formatted, MAX_PATH, DSep);
                size_t Len = _tcslen(Formatted);
                for (UINT i = 0; i < DecimalPlaces; i++) {
                    Formatted[Len + i] = Zero;
                }
                Formatted[Len + DecimalPlaces] = 0;
            }
            return Formatted;
        }

        _stprintf_s(StrValue, PDBL_MAX, _T("%0.*f"), DecimalPlaces, abs(Value));
        FracPtr = _tcsrchr(StrValue, Dot);

        if (FracPtr != NULL) {
            *FracPtr++ = 0;
            _tcscpy_s(StrFraction, PDBL_MAX, FracPtr);
        } else {
            StrFraction[0] = 0;
        }

        Cch = _tcslen(StrValue);
        Mod = (Cch % 3);
        Idx = (Mod == 0) ? 3 : Mod;
        Formatted[0] = 0;

        if (Value < 0.0) {
            _tcscat_s(Formatted, PDBL_MAX2, MSep);
        }

        for (size_t i = 0; i < Cch; i++) {
            if (i == Idx) {
                _tcscat_s(Formatted, PDBL_MAX2, TSep);
                Idx += 3;
            }

            _tcsncat_s(Formatted, MAX_PATH, &StrValue[i], 1);
        }

    if (DecimalPlaces > 0) {
        _tcscat_s(Formatted, PDBL_MAX2, DSep);
        _tcscat_s(Formatted, PDBL_MAX2, StrFraction);
    }

    return Formatted;
}

int _tmain(int argc, _TCHAR* argv[]) {
    if (--argc != 2) {
        printf("Usage : %s <decimal_number> <places>\n", argv[0]);
        return 1;
    }

    double Value = _tstof(argv[1]);
    int DecimalPlaces = _tstoi(argv[2]);

    _tprintf(_T("DecimalPlaces = %ld, Formatted : %s"), DecimalPlaces, GetDoubleFormat(Value, DecimalPlaces));

    return 0;
}

It properly handles user overrides for minus sign (up to 4 characters), thousand separator and decimal separator (up to 3 characters each), and it allows showing more decimal places than the original which could only chop the fractional part, not extend it.

Function could be easily expanded to handle +/- Infinity (and use locale supplied strings for that), to handle denormals and to display SNaN and QNaN text if needed. A bit more challenging task would be to handle digit replacement correctly and to use digit grouping other than 3.