by Michael S. Kaplan, published on 2006/09/10 11:53 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/09/10/748699.aspx
So, the question that Dixon asked was:
Can you tell me how Windows XP encodes its file and directory names? Is it already UTF-8?
(I assume we are talking about NTFS here)
It is definitely not in UTF-8.
Furthermore, it is not in UCS-2, since you can have a filename with a supplementary character in it.
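To make the UCS-2 point concrete, here is a minimal Python sketch that encodes a supplementary character as UTF-16 and lists the resulting 16-bit code units. The helper name `utf16_units` is made up for this illustration, and nothing here touches NTFS itself; it only shows that such a character needs two code units (a surrogate pair), which UCS-2 has no way to express.

```python
import struct

def utf16_units(text: str) -> list[int]:
    """Return the 16-bit UTF-16 code units that encode `text`."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack(f"<{len(data) // 2}H", data))

# U+10485 lies outside the Basic Multilingual Plane,
# so it cannot fit in a single 16-bit value.
units = utf16_units("\U00010485")
print([f"{u:04X}" for u in units])  # prints ['D801', 'DC85']
```

A file name containing this character therefore occupies two unsigned short values for what the user sees as one letter.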
And it isn't in UTF-16, since it allows any sequence of unsigned short values, not just sequences that encode valid Unicode characters.
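As a rough illustration of the difference between "any sequence of unsigned shorts" and real UTF-16, the following Python sketch checks whether a list of 16-bit values is well-formed UTF-16. The function name `is_well_formed_utf16` is hypothetical, and the check covers only surrogate pairing, which is the structural rule UTF-16 places on code unit sequences; it is not a description of what NTFS or Windows actually does.

```python
def is_well_formed_utf16(units: list[int]) -> bool:
    """True if every surrogate in `units` belongs to a high+low pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed immediately by a low one.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no high surrogate before it.
            return False
        else:
            i += 1
    return True

print(is_well_formed_utf16([0xD801, 0xDC85]))  # paired surrogates
print(is_well_formed_utf16([0x0041, 0xD801]))  # trailing lone high surrogate
```

The first call returns `True` and the second returns `False`. A name store that accepts arbitrary unsigned short values would take both sequences, which is why it cannot strictly be called UTF-16.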
So in one sense you could call it UTF-16 Plus, since it basically adds a whole bunch of characters. It is obviously less cool than actually using UTF-16, though, so perhaps it would be better to think of it as more of a UTF-16 Minus?
Or even better, we just keep in mind that it isn't really a true Unicode encoding, just one that supports a lot of Unicode's characteristics, features, and properties, while not really having a larger understanding of it...
This post brought to you by 𐒅 (U+10485, a.k.a. OSMANYA LETTER KHA)
# Adam on 10 Sep 2006 6:54 PM:
# Michael S. Kaplan on 10 Sep 2006 8:53 PM:
# Tom Gewecke on 11 Sep 2006 11:23 AM:
# Michael S. Kaplan on 11 Sep 2006 12:32 PM:
# Tom Gewecke on 11 Sep 2006 2:09 PM:
# Matt Seitz on 30 Nov 2006 2:47 AM:
If NTFS allows any unsigned short, would it be more accurate to say that NTFS does not do any encoding? Should one instead say it is the Win32 subsystem which encodes and decodes characters as UTF-16, and then stores them in and reads them from the raw NTFS file name buffer?
# Michael S. Kaplan on 30 Nov 2006 3:34 AM:
Well, perhaps. I doubt I could convince anyone to update the documentation to say it that way, though. :-)
referenced by
2006/12/05 Validation of Unicode text is growing up
2006/09/24 NTFS and Unicode?