Adding the requisite number of mistakes to the code

by Michael S. Kaplan, published on 2010/02/05 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/02/05/9958665.aspx


The other day Eric asked:

I tried to change this API to CharToOemBuff, however that API is also banned.  The suggested replacement on MSDN (http://msdn.microsoft.com/en-us/library/bb288454.aspx) is to use WideCharToMultibyte.  CharToOem is ansi to ansi, however WideCharToMultibyte/MultiByteToWideChar is Unicode/ansi.  Should I be upconverting to wide char and then converting back to ansi to accomplish this?  Is there another API that might make more sense?

Some of the confusion here is admittedly in that table in Michael Howard's Security Development Lifecycle (SDL) Banned Function Calls:

Table 17. Banned OEM conversion functions and replacements

Banned APIs Windows Replacement
CharToOem, CharToOemA, CharToOemW, OemToChar, OemToCharA, OemToCharW, CharToOemBuffA, CharToOemBuffW WideCharToMultiByte

Now of course at the top there is some help here:

Also note that some of the function names might be a little different, depending on whether the function takes ASCII, Unicode, _T (ASCII or Unicode), or multibyte chars. Some function names might include A or W at the end of the name. For example, the StrSafe StringCbCatEx function is also available as StringCbCatExW (Unicode) and StringCbCatExA (ASCII).

Now from this one can (with a bit of effort) derive:

Now Eric already noticed the issue with the functions to call and the fact that the suggested replacements weren't entirely complete as a list, but the functions are just bad news for a whole host of reasons, from a security perspective because of all of the buffer size assumptions that can cause problems if any mistakes are made.

There aren't better functions out there, and actually the above with is pretty much all of these functions do. With the following rules that you would have to follow as well:

  1. For the non-*Buff versions, assume the source buffer is NULL-terminated and the target buffer is big enough to hold the target plus the NULL;
  2. For the "A" non-*Buff versions, assume the buffer size refers to CHARs (bytes) on both sides, of equal size;
  3. For the "W" non-*Buff versions, assume the buffer size refers to CHARs (bytes) on the OEM side and WCHARs on the Char side, with equal counts of the two different sized parameters*;
  4. For the "W" non-*Buff versions on a CJK system locale, the byte count will need to be up to twice the size of the character count to allow for the maximum potential bytes vs. character difference in these code pages.

Now here is where Michael Howard and I tend to have a small difference of opinion.

Disclaimer: I have known Michael Howard for years and have a ton of respect for him and the work he does. I know he won't take any of this as a personal attack, but I don't want anyone reading this to doubt for a second that he can solve more computer security problems in his sleep then I can do with all of his books and hours to find/fix issues in a code base. Because to be perfectly honest, in all likelihood he probably can.

Look at those rules for a second.

I mean really look at them.

Now imagine you give all of the information in this blog you are reading to 10,000 competent developers with instructions to write up these functions.

Ok, have the scenario in mind?

Now answer the following questions honestly:

  1. What percentage of those developers will write the code correctly with no security errors?
  2. If you didn't give them the information in this blog you are reading (the functions to call and the rules), what percentage of those developers will be able to write the functions correctly without security errors?
  3. Do you believe that percentage in the above two questions will be higher or lower than the percentage who might make a mistake in the simplified wrappers provided in the Win32 API on this "banned" list if the calling rules were fully enumerated/explained?

The Win32 API functions are correct on the four rules; is the answer for this grouping of functions truly to say "do it yourself without enough information" if you want the behavior to be safer?

I don't think all of the functions on the full list would be this way, but this small grouping of functions is likely adding some bugs to code for anyone who is just following the rules about the "banned" list.

This I believe to be true even given allowing for the percentage of those developers who are able to spot the problems given all of the various buffer sizes and such right in front of them, not hidden by the wrappers (which is presumably one of the main reasons fir why these functions are not considered safe).

The real, best answer is to use Unicode instead of all of these code page functions, with no conversions needed since they are all extraneous.

The one time that some of these functions are theoretically useful is to use the "W" versions in your Unicode console applications, since the console itself is using the OEMCP. But since that value can be changed, the functions are of limited use and likely will lead to bugs anyway (not that the replacement code would be likely to do better unless someone really smart reading this blog were to grok the scope of the issue and come up with the improved versions....


# jmdesp on 6 Feb 2010 3:11 AM:

Well, even using Unicode only you can make mistakes, as I learned the hard way recently (fortunately the problem was obvious) because you can quite easily get a mix a C language operators that works on byte length and other that will work with WCHAR_T length. As your pointer will be on WCHAR_T and adding 1 will add WCHAR_T length, the better option is to carefully make sure to only use functions that work on WCHAR_T units. At the end the best conclusion is probably to drop C/C++ as much as you can. With genericity, lambda, the ngen native compilation option, c# is becoming a more and more porwerful language every day.

# Michael S. Kaplan on 6 Feb 2010 10:19 AM:

Does claiming to use Unicode but mixing it with non-Unicode functions count as just using Unicode? :-)

But it is a fair point; there are no answers that are *too* easy.

# Michael S. Kaplan on 6 Feb 2010 5:48 PM:

Now of course there is an actual solution here: WRITE the new, safe versions of the functionality, and mark them as the replacements. It seems no one wanted to do that, though. :-(

# yuhong2 on 10 Feb 2010 10:15 PM:

"For the "W" non-*Buff versions, assume the buffer size refers to CHARs (bytes) on the OEM side and WCHARs on the Char side, with equal counts of the two different sized parameters*;

For the "W" non-*Buff versions on a CJK system locale, the byte count will need to be up to twice the size of the character count to allow for the maximum potential bytes vs. character difference in these code pages."

Now, you mean the *Buff versions? Anyway, this could be a source of buffer-overflow bugs that show up on only CJK system locales! Another reason to use WideCharToMultibyte/MultiByteToWideChar, IMO, beside the fact that it works in Win9x unlike CharToOem*W and OemToChar*W.


Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

go to newer or older post, or back to index or month or day