Previous posts in this series (including today's!):
- Part 0 (The introduction)
- Part 1 (Business before pleasure)
- Part 2 ('Sorry, you're not my type.' 'Um, maybe I could change that?')
- Part 3 (Can I quote you on that?)
- Part 4 (/Delightful, /Delicious, /DUnicode!)
- Part 5 (Are we there yet? Well, not just yet)
- Part 6 (Upon the road not traveled)
- Part 7 (What does it mean to fit things to a 'T', anyway?)
- Part 8 (Fitting MSLU into the mix)
- Part 9 (The project's postpartum postmortem)
(If you are just tuning in and want to start now that we are done, you can grab the latest source from here)
If you look at the source, you'll see I chickened out of always adding MSLU to Unicode builds, so there is a makefile.mslu and a makefile.uni. :-)
Now that we have taken an application that is actually useful and converted it to Unicode, I figured it would be good to look back and talk about the effort a bit.
(I honestly did not look at the code until after deciding to do the series, so this assessment of the effort is a true postmortem!)
As projects go, this one was fairly tame: we discussed a few issues, but there really were only a few. To compare briefly, the kbdtool.exe --> kbdutool.exe conversion I mentioned back in Part 0 relied heavily on the C Runtime for its file handling, parsing, and creation operations. So the single example of strtoul being converted to _tcstoul that I talked about in Part 4 would have to be multiplied by the 131 such changes that were required there. In real-world application conversions, then, the effort can take considerably more time even if you never run into a problem more complex than the ones we dealt with here.
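For readers who have not seen the generic-text pattern, here is a minimal sketch of what each of those 131 changes amounts to. On Windows, <tchar.h> supplies TCHAR, _T(), and _tcstoul, which map to strtoul or wcstoul depending on whether _UNICODE is defined. Since that header is Microsoft-specific, the sketch uses hypothetical stand-ins (my_tchar, MY_T, my_tcstoul, a MY_UNICODE switch, and a parse_count helper) so it compiles anywhere; it illustrates the mechanism, not the actual kbdutool code.

```c
#include <stdlib.h>  /* strtoul */
#include <wchar.h>   /* wchar_t, wcstoul */

/* Hypothetical stand-ins for <tchar.h>. Compile with -DMY_UNICODE
   to get the wide-character build. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x) L##x          /* "131" becomes L"131" */
#define my_tcstoul wcstoul
#else
typedef char my_tchar;
#define MY_T(x) x
#define my_tcstoul strtoul
#endif

/* Hypothetical helper: the call site is written once, and the macro
   selects strtoul or wcstoul at compile time. */
static unsigned long parse_count(const my_tchar *text)
{
    return my_tcstoul(text, NULL, 10);
}
```

Called as parse_count(MY_T("131")), the same line compiles against strtoul in a narrow build and wcstoul in a wide one; the tedium in a CRT-heavy tool comes from touching every such call and literal once.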
Another interesting comment, made by Mike Dimmick on Part 3, raised an issue with printf-esque format specifiers, which follow some outrageous rules when it comes to Unicode conversion:
Character | Type | Output format |
---|---|---|
c | int or wint_t | When used with printf functions, specifies a single-byte character; when used with wprintf functions, specifies a wide character. |
C | int or wint_t | When used with printf functions, specifies a wide character; when used with wprintf functions, specifies a single-byte character. |
hc, hC | int or wint_t | Specifies a single-byte character; it is always interpreted as type CHAR, even when the calling application uses the #define UNICODE compile flag. |
hs, hS | String | Specifies a string; it is always interpreted as type LPSTR, even when the calling application uses the #define UNICODE compile flag. |
lc, lC | int or wint_t | Specifies a wide character; it is always interpreted as type WCHAR, even when the calling application does not use the #define UNICODE compile flag. |
ls, lS | String | Specifies a string; it is always interpreted as type LPWSTR, even when the calling application does not use the #define UNICODE compile flag. |
s | String | When used with printf functions, specifies a single-byte–character string; when used with wprintf functions, specifies a wide-character string. Characters are printed up to the first null character or until the precision value is reached. |
S | String | When used with printf functions, specifies a wide-character string; when used with wprintf functions, specifies a single-byte–character string. |
Now I can completely understand why every single one of these format specifiers exists, but you can see the potential for strange results as a project moves to Unicode: one is not only converting the application itself but, in some cases, also parsing and manipulating data from other sources that may or may not be converted at the same time.
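To make the hazard a bit more concrete: the meanings in the table above are the Microsoft CRT's. ISO C treats %s inside wprintf as a narrow string and does not define %hs or the capital-letter %C and %S forms, so code that leans on them can change behavior when it changes compilers, not just builds. The l-prefixed forms are the ones whose meaning does not shift. Here is a minimal sketch (format_pair is a hypothetical name) of a narrow snprintf that pins each argument's type explicitly.

```c
#include <stdio.h>   /* snprintf */
#include <wchar.h>   /* wchar_t */

/* Hypothetical helper: build a narrow message from one narrow and one
   wide string. %s takes a char string and %ls takes a wchar_t string
   in snprintf under both the Microsoft CRT and ISO C. Returns what
   snprintf returns (the length that would be written, or negative on
   error, e.g. if the wide text cannot be converted in the current locale). */
static int format_pair(char *out, size_t outsize,
                       const char *narrow, const wchar_t *wide)
{
    return snprintf(out, outsize, "%s / %ls", narrow, wide);
}
```

With a 64-byte buffer, format_pair(buf, sizeof buf, "kbd", L"utool") produces "kbd / utool". Non-ASCII wide text is converted according to the current locale, which is one more place where data from "other sources" can surprise you.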
In our case, although the DebugMsg function makes extensive use of formatting strings, its callers always used the %s type, so everything worked out. But if you are converting an application that uses anything other than %c and %s from the above table, deciding how to convert the project can be a much harder job.
Clearly the project was in many ways written "the right way" to handle the conversion we did. Note especially the mostly consistent use of sizeof() in character buffer lengths, something often missing; the exceptions only came to bite us in a few specific cases that were clearly written later on by other developers.
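Why buffer lengths matter deserves a concrete illustration: sizeof() yields bytes, and bytes equal characters only in an ANSI build. The sketch below assumes the safe element-count form sizeof(buf) / sizeof(buf[0]), which Windows exposes as ARRAYSIZE or _countof; the post does not show the project's exact idiom, and my_tchar, MY_COUNTOF, and copy_chars are hypothetical names.

```c
#include <stddef.h>  /* size_t */
#include <wchar.h>   /* wchar_t */

/* Hypothetical stand-in for TCHAR in a Unicode build. */
typedef wchar_t my_tchar;

/* Number of elements (characters) in an array, not bytes.
   Only valid for true arrays, never for pointers. */
#define MY_COUNTOF(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: copy src into dst, where dst_chars is a capacity
   in CHARACTERS. Always null-terminates when dst_chars > 0 and returns
   the number of characters copied. */
static size_t copy_chars(my_tchar *dst, size_t dst_chars, const my_tchar *src)
{
    size_t i = 0;
    if (dst_chars == 0)
        return 0;
    while (i + 1 < dst_chars && src[i] != 0) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = 0;
    return i;
}
```

With my_tchar buf[4], copy_chars(buf, MY_COUNTOF(buf), L"abcdef") copies "abc" and stops. Passing sizeof(buf) instead would claim 8 characters of room on Windows (16 where wchar_t is 4 bytes) and let the loop run past the end of the 4-element buffer, which is exactly the kind of bug that only surfaces once the build goes Unicode.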
Because of such efforts, it is perhaps better to think of the setup bootstrap EXE project as a fair representative of the type of problems one will hit, if not necessarily the magnitude of those problems.
And what has been "delivered" is an EXE that you may well see in the upcoming release of MSKLC. :-)
I'll keep my eyes open, and if I run across another conversion project like this one that can be shared this way, I'd love to do it again some time. I think it would be especially interesting to do one that turns out to take much more effort, just to give a good sense of how hard people might find the process in general.
This post brought to you by ᠹ (U+1839, a.k.a. MONGOLIAN LETTER FA)