Encoding APIs and Security Concerns, APIs and Security Decisions

by Michael S. Kaplan, published on 2005/04/20 11:22 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/04/20/410043.aspx

The other day I talked about issues with the two big encoding APIs in Win32, in posts A few of the gotchas of WideCharToMultiByte and A few of the gotchas of MultiByteToWideChar.

Both API topics (MultiByteToWideChar and WideCharToMultiByte) include "Security Alert" warnings to which developers should pay heed:

Security Alert Using the WideCharToMultiByte function incorrectly can compromise the security of your application. Calling the WideCharToMultiByte function can easily cause a buffer overrun because the size of the In buffer equals the number of WCHARs in the string, while the size of the Out buffer equals the number of bytes. To avoid a buffer overrun, be sure to specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.

Security Alert Using the MultiByteToWideChar function incorrectly can compromise the security of your application. Calling the MultiByteToWideChar function can easily cause a buffer overrun because the size of the In buffer equals the number of bytes in the string, while the size of the Out buffer equals the number of WCHARS. To avoid a buffer overrun, be sure to specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.

It seems obvious from the parameter names (cbMultiByte vs. cchWideChar) that one parameter is a count of bytes and the other is a count of characters. But if you mess it up then it is easy to cause a buffer overrun as the alerts indicate, which is obviously bad. Another problem that can happen is that the call could fail due to insufficient buffer size.

Now historically people seem to like calling these APIs without checking the return value. Or alternately they call with a NULL target buffer to get the size, allocate a target buffer based on that size, and then call the API again without checking the return value. Obviously both of the possible problems can be bad, especially if there is a logic error in the code you use to figure out the buffer allocation (which is usually byte based) and the parameter you then need to pass. You could even allocate buffers twice as large as you need to, which is not a security issue but is at the very least bad practice.

Since unlike most NLS APIs the target buffer is used even if the API ultimately fails, you may not even notice a problem when you are testing your application -- everything seems like it just works.

There is another issue which can also lead to security problems. That has to do with whether you either explicitly pass the size of the source buffer or pass -1 to mean that it is null-terminated string.

As the docs indicate:

If this parameter is -1, the function processes the entire input string including the null terminator. The resulting wide character string therefore has a null terminator, and the returned length includes the null terminator.
If this parameter is a positive integer, the function processes exactly the specified number of bytes. If the given length does not include a null terminator then the resulting wide character string will not be null terminated, and the returned length does not include a null terminator.

Passing -1 if the source buffer is NULL-terminated is the easiest way to avoid problems. But if may not be NULL-terminated, or you may be trying to convert a substring where the substring may not be NULL terminated even if the full string is. It is crucial to make sure you add the terminating NULL in such cases if you are going to use the resulting string. Or you might be relying on it running through random memory until a NULL is found.

This is a point that may deserve its own security warning, and one that applies to most of the other NLS APIs, too. The semantics of the "-1 for null terminated or the explicit length" source buffer size are very convenient, but a real death trap if used incorrectly, so much so that I would always recommend the following approach:

Use -1 when you can, so the NULL will exist in the result;
Allocate with enough space for the desired target plus room for the NULL, and then make sure to put that NULL in there.

If you follow those rules consistently for the NLS APIs, you will minimize the chances of subtle problems in your code.

There is a general inconsistency with Win32 APIs about whether returned size parameters include the NULL or not. For the NLS APIs they always do, for USER APIs like GetWindowText they do when you pass NULL to get the size, but they do not when they return the actual desired text. Though on the plus side most of the USER APIs guarantee NULL termination, which the NLS APIs do not. So their subtle bugs (or not so suble bugs if you are looking at the text of a window) are more likely to be one-letter string truncations than actual security concerns like ours.

In general you need to look at the documentation for the API to know the exact behavior, if you do not know what it is. This is a time when the point I made in API Consistency and Developer Comfort can really come in handy -- if you know in general how a family of APIs work, you can use that knowledge to make good decisions about calling APIs in the family.

This post brought to you by "ฒ" (U+0e12, a.k.a. THAI CHARACTER THO PHUTHAO)

# PatriotB on 20 Apr 2005 10:48 AM:

Good posts. Several years ago I wrote wrapper classes (CTSTR and CBSTR) to encapsulate all the string manipulation that I do, and deep in there are calls to these two functions. Now would be a great time for me to go back and review them and hopefully see that I'm using them correctly... :)

# Jeff Parker on 20 Apr 2005 12:08 PM:

You know I do not understand why more Microsoft Documentation doesn't just come out and tell us this when you look up the API. Same in .net the information there shows us how to use it, but it never discuses the proper way or the security risks with using something a certain way.

For example the memory leak recently discovered in the Enterprise Library that they patched. I didn't even know a memory leak was possible with .net. Once I did a whole lot of digging around I discovered that indeed I had the same memory leak in one of my apps. But you go look at the documentation in msdn on it and no where does it mention using that part of the framework in a specific way could result in a leak.

Memory Leak info.
http://www.gotdotnet.com/workspaces/news/newsitem.aspx?id=83c68646-befb-4586-ba9f-fdf1301902f5&newsId=87500cd4-1b0a-4fbf-878f-b5f968f50e46

But thats why we read the blogs for good article like this to make us aware of these problems before they happen.

# Michael S. Kaplan on 20 Apr 2005 2:42 PM:

In fairness to the docs, all of the issues I mention are documented in the individual functions, even the different USER and NLS functions. What is missing in some cases is the over-arching text that compares and contrasts them, or adds the security context to what is being documented.

Please consider a donation to keep this archive running, maintained and free of advertising.
Donate €20 or more to receive an offline copy of the whole archive including all images.

referenced by

2007/09/23 If it isn't Unicode, it isn't ANY code!

2006/04/11 I daresay it is often <= -1

go to newer or older post, or back to index or month or day