It is okay to not be in favor of termination

by Michael S. Kaplan, published on 2008/05/09 14:36 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/05/09/8481353.aspx


No, this is not a post about abortion. I am talking about NULL termination, a way to end strings that is legal in all fifty states and that is widely used by software that is in regular use by people all over the world, regardless of their views about the other kind of termination. So it is a universally safe topic, even if this disclaimeresque introduction isn't, necessarily.

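There are many patterns to calling functions within the NLS API, most (but not all) of which have the following rules:

If you pass 0 for the size of the output buffer, the function returns the size the buffer needs to be and writes nothing.

If you pass -1 for the length of the input string, the function treats the input as NULL-terminated and works out the length itself; the output is NULL-terminated, and the return value counts that terminating NULL.

If you pass an explicit length for the input string, the function processes exactly that many characters and does not add a terminating NULL to the output.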

This behavior can easily get confusing.

It is even a security topic, since a filled buffer without a terminating NULL can lead to real bugs if it is not handled properly.

But there are times when it is exactly what you want.

Say, for example, that you are using LCMapString to do case conversion and you are doing it in place. Inserting a random NULL in the middle of a string with existing content that one wishes to preserve is seldom a good idea.
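To make the in-place case concrete, here is a minimal sketch in portable C. The function upper_in_place is a hypothetical stand-in of my own, not part of the NLS API; it uses towupper from the C library rather than LCMapString, and it only imitates the "explicit length, no terminator written" convention.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for an in-place uppercase mapping.
   Converts exactly cch characters starting at buf and, like a call
   with an explicit input length, never writes a terminator. */
static int upper_in_place(wchar_t *buf, int cch)
{
    assert(buf != NULL && cch >= 0);
    for (int i = 0; i < cch; i++)
        buf[i] = (wchar_t)towupper((wint_t)buf[i]);
    return cch; /* number of characters written */
}
```

Calling upper_in_place on the first five characters of a buffer holding L"hello world" leaves the rest of the buffer alone. Had the function appended a terminator after those five characters, it would have overwritten the space and cut the string short.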

Other calls can also require this behavior.

But it does explain the following calls and results:

LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", -1, NULL, 0)
return value: 3

LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", -1, wz, 3)
return value: 3
wz value: L"\u00c4\u0170\u0000"

LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", -1, wz, 10)
return value: 3
wz value: L"\u00c4\u0170\u0000"

LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", 2, wz, 3)
return value: 2
wz value: L"\u00c4\u0170"

LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", 2, wz, 2)
return value: 2
wz value: L"\u00c4\u0170"

(It helps to know that L"\u00c4\u0170" will have an implicit NULL at the end when one is looking at it from inside the function!)
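The length bookkeeping behind those results can be modeled in a few lines of portable C. This is only a sketch under stated assumptions: map_passthrough is a hypothetical name, it copies characters unchanged (which is what happens to these two characters under this flag), and it is not the real LCMapString implementation.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical model of the length convention only; characters pass
   through unchanged. This is NOT the real LCMapString. */
static int map_passthrough(const wchar_t *src, int cchSrc,
                           wchar_t *dest, int cchDest)
{
    assert(src != NULL && cchDest >= 0);
    if (cchSrc == 0)
        return 0;                       /* a zero input length is an error */
    if (cchSrc < 0)
        cchSrc = (int)wcslen(src) + 1;  /* -1: count includes the terminator */
    if (cchDest == 0)
        return cchSrc;                  /* size query: nothing is written */
    assert(dest != NULL);
    if (cchDest < cchSrc)
        return 0;                       /* buffer too small; the real API also sets an error code */
    /* Copy exactly cchSrc characters; a terminator lands in dest only
       if it was counted as part of the input. */
    memcpy(dest, src, (size_t)cchSrc * sizeof(wchar_t));
    return cchSrc;
}
```

With the same arguments as the calls above, the model returns 3 whenever the input length is -1 and 2 whenever it is given as 2, and in the latter case the third element of wz is never touched.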

Now in this case the two characters:

Ä (U+00c4, aka LATIN CAPITAL LETTER A WITH DIAERESIS)

Ű (U+0170, aka LATIN CAPITAL LETTER U WITH DOUBLE ACUTE)

have no full-width equivalents and thus will pass through the function with this flag, completely unchanged. Which provides an ideal opportunity to see all of this behavior, under the microscope so to speak....

 

This blog sponsored by those two characters along for the ride, above


# mirabilos on 8 Jun 2008 10:47 AM:

It's “NUL” termination, not “NULL” termination.

NUL = ASCII byte 0x00

NULL = pointer value (void *)0L

By the way, am I the only German around in favour of the uppercase eszett?

