by Michael S. Kaplan, published on 2005/04/20 11:22 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/04/20/410043.aspx
The other day I talked about issues with the two big encoding APIs in Win32, in posts A few of the gotchas of WideCharToMultiByte and A few of the gotchas of MultiByteToWideChar.
Both API topics (MultiByteToWideChar and WideCharToMultiByte) include "Security Alert" warnings to which developers should pay heed:
Security Alert Using the WideCharToMultiByte function incorrectly can compromise the security of your application. Calling the WideCharToMultiByte function can easily cause a buffer overrun because the size of the In buffer equals the number of WCHARs in the string, while the size of the Out buffer equals the number of bytes. To avoid a buffer overrun, be sure to specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.
Security Alert Using the MultiByteToWideChar function incorrectly can compromise the security of your application. Calling the MultiByteToWideChar function can easily cause a buffer overrun because the size of the In buffer equals the number of bytes in the string, while the size of the Out buffer equals the number of WCHARS. To avoid a buffer overrun, be sure to specify a buffer size appropriate for the data type the buffer receives. For more information, see Security Considerations: International Features.
It seems obvious from the parameter names (cbMultiByte vs. cchWideChar) that one parameter is a count of bytes and the other is a count of characters. But if you mess it up then it is easy to cause a buffer overrun as the alerts indicate, which is obviously bad. Another problem that can happen is that the call could fail due to insufficient buffer size.
Now historically people seem to like calling these APIs without checking the return value. Or alternately they call with a NULL target buffer to get the size, allocate a target buffer based on that size, and then call the API again without checking the return value. Obviously both of the possible problems can be bad, especially if there is a logic error in the code you use to figure out the buffer allocation (which is usually byte based) and the parameter you then need to pass. You could even allocate buffers twice as large as you need to, which is not a security issue but is at the very least bad practice.
Since unlike most NLS APIs the target buffer is used even if the API ultimately fails, you may not even notice a problem when you are testing your application -- everything seems like it just works.
There is another issue which can also lead to security problems. That has to do with whether you either explicitly pass the size of the source buffer or pass -1 to mean that it is null-terminated string.
As the docs indicate:
If this parameter is -1, the function processes the entire input string including the null terminator. The resulting wide character string therefore has a null terminator, and the returned length includes the null terminator.
If this parameter is a positive integer, the function processes exactly the specified number of bytes. If the given length does not include a null terminator then the resulting wide character string will not be null terminated, and the returned length does not include a null terminator.
Passing -1 if the source buffer is NULL-terminated is the easiest way to avoid problems. But if may not be NULL-terminated, or you may be trying to convert a substring where the substring may not be NULL terminated even if the full string is. It is crucial to make sure you add the terminating NULL in such cases if you are going to use the resulting string. Or you might be relying on it running through random memory until a NULL is found.
This is a point that may deserve its own security warning, and one that applies to most of the other NLS APIs, too. The semantics of the "-1 for null terminated or the explicit length" source buffer size are very convenient, but a real death trap if used incorrectly, so much so that I would always recommend the following approach:
If you follow those rules consistently for the NLS APIs, you will minimize the chances of subtle problems in your code.
There is a general inconsistency with Win32 APIs about whether returned size parameters include the NULL or not. For the NLS APIs they always do, for USER APIs like GetWindowText they do when you pass NULL to get the size, but they do not when they return the actual desired text. Though on the plus side most of the USER APIs guarantee NULL termination, which the NLS APIs do not. So their subtle bugs (or not so suble bugs if you are looking at the text of a window) are more likely to be one-letter string truncations than actual security concerns like ours.
In general you need to look at the documentation for the API to know the exact behavior, if you do not know what it is. This is a time when the point I made in API Consistency and Developer Comfort can really come in handy -- if you know in general how a family of APIs work, you can use that knowledge to make good decisions about calling APIs in the family.
This post brought to you by "ฒ" (U+0e12, a.k.a. THAI CHARACTER THO PHUTHAO)
# PatriotB on 20 Apr 2005 10:48 AM:
# Jeff Parker on 20 Apr 2005 12:08 PM:
# Michael S. Kaplan on 20 Apr 2005 2:42 PM:
referenced by
2007/09/23 If it isn't Unicode, it isn't ANY code!
2006/04/11 I daresay it is often <= -1