by Michael S. Kaplan, published on 2006/09/24 15:40 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/09/24/769540.aspx
A recent question I received via email from a colleague who preferred to remain anonymous on the blog:
Hope everything is going well with you first of all...
May I ask for your help on an NTFS technical question? I'm currently involved in a discussion of some CIFS/NTFS compatibility issues, and I'm wondering which Windows release was the first to support UTF-16 and characters beyond the BMP?
According to http://en.wikipedia.org/wiki/NTFS, it is Windows 2000, but I'm not quite sure if that's official or really correct.
Would you please let me know if you have the info handy or point me to one of the public documents available at Microsoft web sites? (I was trying to do web search but I wasn't really able to find the info from www.microsoft.com...)
Thanks very much in advance for your help and hope this isn't a trade secret that I'm asking for...
Well, since as far as I know I don't know any trade secrets about NTFS, we are probably safe on that count, at least! Just to make sure, I'll stick to stuff that anyone can verify themselves if they want.... :-)
Of course, there is the info I just put up in this blog post for starters, and I'll go a step further and make it clear that you could use high surrogate and low surrogate code units in NT even before they were actually defined (since none of the current or past incarnations of NT disallows unassigned code points).
The Wikipedia article is really quite misleading on this score with its text:
File names are stored in Unicode (encoded as UTF-16, although limited to the Basic Multilingual Plane in early versions before Windows 2000).
Well, I'll point out that whoever wrote this bit either confused NTFS with Active Directory (which actually was limited on this point until Windows XP/Server 2003, which is when surrogate code units first received weight) or simply doesn't understand NTFS and did not test creating such files on NT 4.0 or earlier.
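For anyone who wants to run that kind of test themselves, here is a minimal sketch in Python (my choice of language for illustration, nothing NTFS-specific about it). It shows how a character beyond the BMP becomes a high/low surrogate pair in UTF-16, and then tries to create a file with such a name in a scratch directory. The helper names `to_surrogate_pair`, `utf16_code_units`, and `try_create` are made up for this example, and whether the file creation succeeds depends on the file system and OS you run it on, so the script only reports what happened rather than assuming an outcome.

```python
import os
import tempfile


def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def try_create(name):
    """Hypothetical helper: create a file called name in a scratch directory
    and report whether the same name comes back from a directory listing."""
    with tempfile.TemporaryDirectory() as scratch:
        try:
            with open(os.path.join(scratch, name), "w"):
                pass
            return name in os.listdir(scratch)
        except (OSError, UnicodeError):
            return False


if __name__ == "__main__":
    name = "test_\U00010400.txt"  # U+10400 lies beyond the BMP
    print([hex(u) for u in utf16_code_units("\U00010400")])
    print("created and listed:", try_create(name))
```

The arithmetic is the standard UTF-16 one: subtract 0x10000, then split the remaining 20 bits across the two surrogate ranges. The file system never has to know what the pair means for the name to be stored.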
In my ideal world, a future version of NTFS would actually (optionally) take into account both characters defined in Unicode and also Unicode normalization, but as far as I know there isn't anyone planning such a thing yet.
So if I absolutely had to describe NTFS in terms of a Unicode version, I'd say it uses a very early version of Unicode and allows anything it believes to be an unassigned code point, for the sake of forward compatibility. :-)
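To make the distinction concrete, here is a toy sketch in Python contrasting a strict "well-formed UTF-16" check with the kind of permissive policy described above. Both functions are hypothetical illustrations of the idea, not NTFS's actual validation logic: the permissive one deliberately ignores reserved characters, length limits, and everything else a real file system checks.

```python
def is_well_formed_utf16(units):
    """True if every surrogate in the 16-bit sequence is correctly paired."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed immediately by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


def opaque_name_ok(units):
    """Toy permissive policy (an assumption for illustration only):
    accept any sequence of 16-bit values without consulting Unicode."""
    return all(0 <= unit <= 0xFFFF for unit in units)


if __name__ == "__main__":
    paired = [0x0041, 0xD801, 0xDC00]  # "A" followed by a surrogate pair
    lone = [0x0041, 0xD801]            # "A" followed by an unpaired high surrogate
    for units in (paired, lone):
        print([hex(u) for u in units],
              "well-formed:", is_well_formed_utf16(units),
              "opaque policy accepts:", opaque_name_ok(units))
```

A policy like `opaque_name_ok` needs no table of assigned characters, which is why it keeps working as new versions of Unicode assign more code points.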
This post brought to you by / (U+002f, a.k.a. SOLIDUS)