My Spidey senses blame the rogue text editor

by Michael S. Kaplan, published on 2008/05/16 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/05/16/8511203.aspx


Via the Contact link, someone with the handle of BP asked me:

How come MSKLC source files break if you edit them in a text editor although they are plain text files? At least deadkeys would be much easier to edit that way...

Regards,
/BP

My first reaction was to say that the claim wasn't true and that I have given instructions on how to work outside of MSKLC.EXE for specific features in several earlier posts, some of which directly advocate and describe the process of modifying .KLC files.

So of course it works! :-)

But the parser of these files is very unforgiving.

Well, it is forgiving in the sense that it will not crash if it has trouble parsing the file. But the build will fail if anything was modified in a way that keeps the file from parsing cleanly.

Believe it or not, this is almost identical to the format that was first used to build keyboards in Windows (NT-based versions) over a decade ago. And it was just as finicky then (and there was no tool to build the files either; they were all hand-generated, usually in Notepad!).

You can even look at the files in Notepad if you are curious. They are text files: Unicode text files in the UTF-16 LE encoding scheme, with a BOM.

Now BP never said what text editor was being used, but this might actually be the source of his problems, because the file is opened with a simple call to the StreamReader(string) overload, which takes no Encoding parameter. The encoding is guessed based on whether a BOM is there, and little else (the tool will open but not save non-Unicode files too, but there is no code in the tool that handles this case, so it is definitely all in the StreamReader).

(I don't know if this is documented or not, but it has been working for every version of MSKLC so far without requiring the StreamReader(string, bool) overload to be called, so I assume the one with just the file name calls this other one setting that parameter to true. YMMV though I tend to doubt it...).
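To make the guessing concrete, here is a rough Python sketch of BOM-based detection with a UTF-8 fallback (UTF-8 being the default encoding for that constructor). It is not the actual .NET implementation, just an illustration of the idea; the names `sniff_encoding` and `read_text` are made up for this example:

```python
import codecs

# BOMs to look for. The UTF-32 LE BOM (FF FE 00 00) starts with the
# UTF-16 LE BOM (FF FE), so the longer ones are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data, fallback="utf-8"):
    """Return (encoding, bom_length) guessed from a leading BOM, else the fallback."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return fallback, 0


def read_text(path):
    """Read a whole file, decoding it with whatever sniff_encoding guesses."""
    with open(path, "rb") as f:
        data = f.read()
    encoding, skip = sniff_encoding(data)
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return data[skip:].decode(encoding, errors="replace")
```

With this kind of logic, a UTF-16 LE file saved without its BOM gets decoded as UTF-8, so plain ASCII text like `KBD` comes back with a NUL character after every letter. A strict parser has little chance with that.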

So, the primary guess of my psychic powers is that the "text editor" is not saving the file out with a BOM, with a secondary guess that a change that breaks the format is happening....
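If the primary guess is right, one possible repair is to rewrite the file with the BOM put back. A minimal Python sketch, assuming you know which encoding the editor actually used when saving; the function name `add_missing_bom`, its `assumed_encoding` parameter, and the file name in the usage line are all invented for illustration, and none of this is part of MSKLC:

```python
import codecs


def add_missing_bom(path, assumed_encoding="utf-16-le"):
    """Rewrite the file as UTF-16 LE with a BOM if it does not already start with one.

    assumed_encoding is the encoding the editor is believed to have used when
    saving; it is the caller's guess and is not detected here.
    Returns True if the file was rewritten, False if it was left alone.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF16_LE):
        return False  # already has the expected FF FE signature

    text = data.decode(assumed_encoding)
    # Drop a decoded BOM character (for example from a UTF-8 BOM) so that
    # it is not written out a second time.
    if text.startswith("\ufeff"):
        text = text[1:]

    # Write bytes directly so that existing line endings are left untouched.
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE)
        f.write(text.encode("utf-16-le"))
    return True
```

For a file that an editor saved as UTF-8, the call would look like `add_missing_bom("layout.klc", assumed_encoding="utf-8")`. Work on a copy, since a wrong guess about the encoding will mangle the text.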

This blog brought to you by ⌨ (U+2328, aka KEYBOARD)


Michael Madsen on 16 May 2008 12:44 PM:

Just for reference, using Reflector shows that StreamReader(string) does indeed call StreamReader(string, true), which in turn calls StreamReader(string, Encoding.UTF8, true, 0x400) (.NET 2.0, .NET 1.1 does the same, but without the detour through StreamReader(string, bool)).

StreamReader will use UTF-8 as a fallback if there's no BOM, due to the second parameter in the "final" call.

Wyatt on 16 May 2008 2:26 PM:

You mean UTF-16, not UTF-6, right?  Or is there a 6 bit encoding that I'm not aware of?

Michael S. Kaplan on 16 May 2008 3:43 PM:

Yes, I meant UTF-16. :-)



referenced by

2008/05/19 Everyone seems averse to the BOM these days; Should we blame TSA? :-)
