Unicode INI function; Unicode INI file?

by Michael S. Kaplan, published on 2006/09/15 03:11 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/09/15/754992.aspx

Jake asked me via the Contacting Me... link:

I wanted to store my application settings in an .INI file so migration of settings would be easier than the registry allows. I have some Unicode strings to store, but I was not worried since I was using WritePrivateProfileStringW. But the file that is created wasn't Unicode!

The documentation claims it would make Unicode files, but it seems to be wrong.

So, what's the point of calling a Unicode function if it is not going to write a Unicode file?

Ah, this takes me back. It is an issue that I first noticed way back when I was writing the INI file function wrappers for MSLU.

Disclaimer: this entire post, as well as my original investigation, is based entirely on my own investigations into the behavior of functions, and any discussion of motives is pure supposition on my part to try to explain the behavior.

It probably dates all the way back to when WritePrivateProfileStringW and WritePrivateProfileStringA were added to NT.

To be perfectly honest, sometimes INI files might be Unicode, and they didn't want to refuse to read them. One day that might be more common, after all.

But the developers probably figured that most INI files would not be Unicode, at least not any time soon.

Now obviously, for 99.9% of the population the parameter encoding is bound to whether the app is Unicode or not, and the prototypes for the two functions enforce it for everyone who explicitly chooses a version to call. So it would be stupid not to base the parameter encoding on the function that is called (if there is any type mismatch the app will not even compile, and if the wrong type is passed at runtime you have either a very small string or a buffer overrun waiting to happen!).

Similarly, it would be a bad idea to let the function being called decide the encoding of a file being created, since that would mean that the act of converting your app to Unicode would break the reading of possibly important settings.

And obviously, linking the file content's encoding to what function is called is a bad idea, too, and just as likely to break converted applications.

But a hint of what to do came out of all of these principles of what not to do. I mean, if a file is already Unicode then writing anything but Unicode would be stupid. And if a file is already ANSI, it would be idiotic to write anything but ANSI. And if the file does not exist then the most common choice can be made.

So the plan for the behavior was easy:

Keep the parameters and the content separate;
If the file being written to is a Unicode file, then write Unicode, and
If the file being written to is either ANSI or non-existent, to be created, then write ANSI.
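
The three rules above can be sketched in portable C. This is only an illustration of the decision, not the actual system code: the names ini_write_encoding and choose_ini_encoding are hypothetical, and the sketch assumes a "Unicode file" can be recognized by a leading UTF-16 byte order mark, which may be simpler than whatever check the real function performs. Note that the sketch takes no parameter describing the caller's string type, since the first rule keeps the parameters and the content separate.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names for this sketch; not part of any Windows API. */
typedef enum { INI_WRITE_ANSI, INI_WRITE_UTF16 } ini_write_encoding;

/* Decide how new data would be written, following the rules above.
   Assumption: "is a Unicode file" is approximated by a leading UTF-16
   byte order mark (FF FE or FE FF). The real detection may differ. */
ini_write_encoding choose_ini_encoding(const char *path)
{
    unsigned char bom[2];
    size_t n;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return INI_WRITE_ANSI;   /* non-existent: it will be created as ANSI */

    n = fread(bom, 1, sizeof bom, f);
    fclose(f);

    if (n == 2 && ((bom[0] == 0xFF && bom[1] == 0xFE) ||
                   (bom[0] == 0xFE && bom[1] == 0xFF)))
        return INI_WRITE_UTF16;  /* already Unicode: keep writing Unicode */

    return INI_WRITE_ANSI;       /* already ANSI: keep writing ANSI */
}
```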

Then all they had to do was document the behavior so people would know about it!

Of course, that never really seems to have happened. The text in the documentation is not entirely clear, as Jake noticed. The WritePrivateProfileString docs say:

lpFileName

[in] The name of the initialization file.

If the file was created using Unicode characters, the function writes Unicode characters to the file. Otherwise, the function writes ANSI characters.

Now I have almost no idea what this text is trying to say, but I am 100% sure that it is wrong. You can easily verify this: save a file as Unicode in Notepad and you will see that it is kept that way, whereas if the function creates the file, it never creates a Unicode file, even if the name has Unicode characters in it.

So Jake, you have your simple fix now -- create a two byte file, containing the BOM, and then let WritePrivateProfileString handle the rest.
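
A minimal sketch of that fix in C follows. The helper names create_bom_only_file and write_unicode_setting, the path C:\Temp\settings.ini, and the section and key names are all hypothetical and exist only for illustration. The helper truncates whatever is at the path, so it should only be used when the INI file does not exist yet. The Windows-only part is guarded so the helper still compiles elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: create a file holding only the UTF-16 LE byte
   order mark (FF FE). This truncates any existing file at that path.
   Returns 0 on success, -1 on failure. */
int create_bom_only_file(const char *path)
{
    static const unsigned char bom[2] = { 0xFF, 0xFE };
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    ok = (fwrite(bom, 1, sizeof bom, f) == sizeof bom);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

#ifdef _WIN32
/* Windows only: seed the file with a BOM, then let the profile API
   append to it. The path, section, and key are placeholders. A full
   path is used because a bare file name is looked up in the Windows
   directory. */
int write_unicode_setting(void)
{
    if (create_bom_only_file("C:\\Temp\\settings.ini") != 0)
        return -1;
    return WritePrivateProfileStringW(L"Section", L"Key", L"\x0edc",
                                      L"C:\\Temp\\settings.ini") ? 0 : -1;
}
#endif
```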

Just for fun, you can even reverse the BOM bytes and WritePrivateProfileString will write to it as a UTF-16 BE (Big Endian) file!

Now, before you fans of that other Unicode format ask: if it is a UTF-8 file, it will still be a UTF-8 file when the call finishes, but none of the data is written in UTF-8, so any character outside of CP_ACP will become a question mark. This is likely to be bad for the file's contents if the app, key, or value name strings contain non-ACP characters....

Makes for an interesting historical note, in any case. And a fascinating study in forensic appcompat analysis that would probably still be buried had Jake not asked the question!

(If any old-timers are reading this and can either confirm or correct any of my suppositions, I would of course appreciate it!)

This post brought to you by ໜ (U+0edc, a.k.a. LAO HO NO)

# Dean Harding on 15 Sep 2006 3:33 AM:

I don't think the docs are "wrong", they're just worded funny.

> If the file was created using Unicode characters

This doesn't mean "if the filename contains Unicode characters" it means "if the file already contains Unicode characters" - where "already contains Unicode characters" actually means "starts with a UTF-16 BOM" (or does it? Maybe WritePrivateProfileString calls IsTextUnicode, and the BOM is just a big hint for that?)

# Raymond Chen - MSFT on 15 Sep 2006 2:06 PM:

Dean Harding is correct on all counts. It means "If the INI file already exists and appears to be Unicode text" and the way the code determines this is through our favorite dodgy API - IsTextUnicode. (The BOM serves as a big hint.)

# Michael S. Kaplan on 15 Sep 2006 2:24 PM:

But how clear is the text, really? And how much sense does it make to put it in the text describing the parameter rather than more general remarks?

I am willing to say that it must just be me and that to everyone else the topic is entirely clear, if that's what people think. But is it really true that (given how it all works) the text could not be improved at all? :-)

# Raymond Chen - MSFT on 15 Sep 2006 4:35 PM:

The text is misplaced. It should be in the Remarks, not in the parameters.

# Mohsen Farjami on 19 Nov 2011 10:42 PM:
