Converting a Project to Unicode: Part 6 (Upon the road not traveled)

by Michael S. Kaplan, published on 2007/01/02 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/01/02/1395703.aspx

(If you are just tuning in and want to start now you can grab the current source from here -- no changes since it was posted yesterday)

Now if you have read Parts 2, 3, 4, and 5 then you know how we went from a purely ANSI application to a purely Unicode one.

The binary itself has been tested with the MSKLC update and it resolves the bug I talked about back in Part 0. And the Unicode Bootstrap EXE works for the scenarios in which it will be used.

(Which, as the lessons of Part 5 hopefully taught everyone, means that there could still be bugs in the other scenarios, like internet downloads -- these will have to be tested by somebody at some point!)

But perhaps people who have done this type of thing before felt uncomfortable with the route I took -- all of those global changes in Parts 2 and 3 might seem quite different from the usual approach of just compiling with UNICODE and _UNICODE defined and fixing errors as they come up. Certainly the experience I set people up for earlier in Part 1 was a lot uglier than what actually happened. So why did I do it that way, and what is the experience like if it is not done that way?

Well, you start with a lot more errors, obviously. And due to the many dependencies between the files (like the header files, and all the functions in util.cpp used throughout the code), you can easily find yourself revisiting the same files over and over again as you rebuild everything and keep breaking files that you just fixed.

As to why I prepared for much more dire experiences, the Bootstrap EXE sample project was a pretty tame one, with a reasonably small number of changes to make beyond datatypes. Some cases are not quite as clean as that and can have many more -- some project you may apply the same plan to could be a lot more brutal in terms of number of errors...

I really prefer not to take the harder route though, since you can easily miss cases -- for example, think of all the times that you have to catch sizeof(char) or sizeof(CHAR) and change it to sizeof(TCHAR). All you have to do is miss one and you'll hit bugs like the one in Part 5, only caused by your Unicode migration rather than by pre-existing bugs. Bugs like that are not found at compile time, so you pay the price later in terms of bugs or problems you catch in unit testing. And in the rush to make changes, compile, make more changes, compile again, and so on, it is easier to miss things.
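To make the sizeof problem concrete, here is a minimal sketch of the kind of slip described above. Everything in it is hypothetical and not taken from the Bootstrap EXE source: tchar_t is a portable stand-in for TCHAR in a Unicode build, and bytesNeededStale, bytesNeeded, and copyText are made-up helpers. The point is only that the stale version still compiles cleanly after the migration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-in for TCHAR in a Unicode build (an assumption made so
// that the sketch compiles anywhere, not the real <tchar.h> definition).
typedef wchar_t tchar_t;

// Left over from the ANSI code: bytes needed for `count` characters.
// It still compiles after the migration, which is exactly the problem.
inline std::size_t bytesNeededStale(std::size_t count) {
    return count * sizeof(char);
}

// The migrated version: scale by the size of the character type in use.
inline std::size_t bytesNeeded(std::size_t count) {
    return count * sizeof(tchar_t);
}

// Copies a string into a caller-supplied buffer whose size is given in BYTES.
// Returns false if the buffer is too small for the text plus its terminator.
inline bool copyText(tchar_t* dest, std::size_t destBytes, const tchar_t* src) {
    assert(dest != nullptr && src != nullptr);
    const std::size_t needed = bytesNeeded(std::wcslen(src) + 1);
    if (needed > destBytes) {
        return false;
    }
    std::memcpy(dest, src, needed);
    return true;
}
```

A caller that sizes its buffer with bytesNeededStale(8) gets 8 bytes, which is too small for an 8-character wide string, and nothing complains until the code actually runs.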

Life is just a lot easier if the global changes can be made up front so you can focus on the special cases....

Of course you are welcome to try it if you like -- just do Part 4 after skipping Parts 2 and 3....

Tomorrow, Part 7 will be going up to do more than just jabber about stuff like this post did!

>>As to why I prepared for much more dire experiences, the Bootstrap EXE sample project was a pretty tame one, with a reasonably small number of changes to make beyond datatypes. Some cases are not quite as clean as that and can have many more -- some project you may apply the same plan to could be a lot more brutal in terms of number of errors...<<

I just did this: I was converting OGRE ( http://www.ogre3d.org ) to TCHAR (well, half of it so far). One problem I ran into is that, several times, they used char* as a byte buffer. A wholesale replace of char with OgreChar (or TCHAR) was not a good idea.

Writing a VS macro to properly encase all strings with "OgreText()" (or "_TEXT()") was a good idea though.

I actually used match whole word only and match case.

The problem is when char* is used as a byte buffer, for example when writing *data* to and from a file, or to a memory stream (like D3D's ID3D10Buffer). In this case it's required to be a certain number of bytes, so I had to make a byte typedef.
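A rough sketch of that split, under stated assumptions: Byte, TextChar, TextString, packHeader, and textBytes are hypothetical names made up for illustration, not OGRE's actual types, and TextChar is pinned to wchar_t so that the sketch compiles anywhere.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned char Byte;   // hypothetical name for raw data, always 1 byte
typedef wchar_t TextChar;     // hypothetical stand-in for OgreChar/TCHAR (Unicode build)
typedef std::basic_string<TextChar> TextString;

// Raw data: the layout is a fixed number of bytes and must not change when
// the character type changes. Writes two 16-bit values in little-endian order.
inline std::vector<Byte> packHeader(unsigned short width, unsigned short height) {
    std::vector<Byte> out;
    out.push_back(static_cast<Byte>(width & 0xFF));
    out.push_back(static_cast<Byte>(width >> 8));
    out.push_back(static_cast<Byte>(height & 0xFF));
    out.push_back(static_cast<Byte>(height >> 8));
    return out;
}

// Text: the storage size follows the character type, so it is computed from
// TextChar rather than assumed to be one byte per character.
inline std::size_t textBytes(const TextString& s) {
    return s.size() * sizeof(TextChar);
}
```

With the two uses carrying different type names, a later search and replace on the text type cannot silently resize a data buffer.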

Also, Ogre uses "GetProcAddress(module, SymbolName.c_str())". When SymbolName is a std::wstring, I needed to make a change: in this case, WideCharToMultiByte it into a UTF8 string and then pass that to GetProcAddress. (Probably not a good idea, but when I export a function called 殺す from VS and use GetProcAddress on the UTF8 string, it works.)
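For anyone who wants to see the shape of that conversion, here is a sketch. To keep it compilable off Windows it uses a hand-rolled encoder, toUtf8, which is a hypothetical helper standing in for the WideCharToMultiByte call with CP_UTF8 described above. It is not what the OGRE change actually looks like.

```cpp
#include <cstddef>
#include <string>

// Encodes a wide string as UTF-8. Handles both 16-bit wchar_t (surrogate
// pairs are combined) and 32-bit wchar_t. Unpaired surrogates are not
// validated; a production version would want to reject or replace them.
inline std::string toUtf8(const std::wstring& wide) {
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            const unsigned long low = static_cast<unsigned long>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // the low surrogate has been consumed
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

On Windows the call site would then read something like GetProcAddress(module, toUtf8(SymbolName).c_str()), since GetProcAddress takes a narrow string for the symbol name.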

Indeed. This was before you started your series. Your advice might have changed things. :)

Like I said, wholesale replace of "char" with "OgreChar" was probably not the best of ideas.

Building a VS IDE Macro that fixes all quoted strings that aren't after a #include, nor already fixed, was a good idea.

Although I might want to modify it to also ignore Assert(i == 0 && "The assertion failed for such and such a reason"), because the resulting message, Assertion "i == 0 && _TEXT("The assertion failed for such and such a reason")" failed, is a bit... odd.
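The rules described here (wrap quoted strings, but skip #include lines, already-wrapped strings, and Assert lines) can be sketched as a small line-based function. This is not the VS IDE macro itself, which is not shown in this thread; wrapLiterals and skipLiteral are hypothetical names, and the sketch deliberately ignores comments, L"..." literals, and strings that span lines.

```cpp
#include <cstddef>
#include <string>

// Returns the index one past the closing quote of the literal starting at
// `start` (which must be the opening quote), honoring backslash escapes.
static std::size_t skipLiteral(const std::string& s, std::size_t start) {
    const char quote = s[start];
    std::size_t i = start + 1;
    while (i < s.size() && s[i] != quote) {
        i += (s[i] == '\\' && i + 1 < s.size()) ? 2 : 1;
    }
    return i < s.size() ? i + 1 : s.size();
}

// Wraps every string literal on the line in wrapper(...), e.g. _TEXT("x").
// Lines that start with #include or contain "Assert(" are returned unchanged.
inline std::string wrapLiterals(const std::string& line,
                                const std::string& wrapper = "_TEXT") {
    const std::size_t first = line.find_first_not_of(" \t");
    if (first != std::string::npos && line.compare(first, 8, "#include") == 0) {
        return line;
    }
    if (line.find("Assert(") != std::string::npos) {
        return line;
    }

    const std::string open = wrapper + "(";
    std::string out;
    std::size_t i = 0;
    while (i < line.size()) {
        if (line[i] == '\'') {
            // Character literal such as '"': copy it through untouched.
            const std::size_t end = skipLiteral(line, i);
            out.append(line, i, end - i);
            i = end;
        } else if (line[i] == '"') {
            const std::size_t end = skipLiteral(line, i);
            // Already wrapped if the output so far ends with "wrapper(".
            const bool wrapped = out.size() >= open.size() &&
                out.compare(out.size() - open.size(), open.size(), open) == 0;
            if (!wrapped) out += open;
            out.append(line, i, end - i);
            if (!wrapped) out += ")";
            i = end;
        } else {
            out += line[i++];
        }
    }
    return out;
}
```

Passing "OgreText" as the second argument gives the OGRE flavor of the same rewrite. Running it a second time over its own output changes nothing, which makes it safe to reapply after fixing things by hand.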

But OGRE is a case where reviewing takes a large amount of time, because there are a _lot_ of chars that should have been OgreChars, and far fewer that should have been bytes.