by Michael S. Kaplan, published on 2005/11/02 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/11/02/487994.aspx
The Suggestion Box seemed to be building up some topics, so I thought I would clear a few of them out here....
Back on May 15th of this year, QFlash asked me:
How to write a txt file as Unicode in .NET?
It is actually pretty straightforward.... if you create a new StreamWriter object, then one of the constructors takes a filename, an append flag, and an Encoding object.
So if you want to create a UTF-16 text file named IamUnicode.txt, you can just use:
// Needs: using System.IO; using System.Text;
// false = overwrite rather than append
StreamWriter sw = new StreamWriter("c:\\IamUnicode.txt", false, Encoding.Unicode);
// Write something to the file
sw.WriteLine("something");
sw.Flush();
sw.Close();
and that's it. (Some people do not call the Flush() method, and since Close() flushes the writer anyway it is probably not needed, but it just feels safer to me to know that I have done it before I close.)
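For anyone who wants to confirm that the file really came out as UTF-16, here is a minimal sketch of the same idea wrapped in a using block, followed by a check of the first two bytes. The class name, the WriteUtf16 helper, and the temp-folder path are all made up for illustration; only StreamWriter, Encoding.Unicode, Path, and File.ReadAllBytes come from the framework.

```csharp
using System;
using System.IO;
using System.Text;

class UnicodeFileDemo
{
    // Hypothetical helper: writes one line of text to 'path' as UTF-16 little-endian.
    public static void WriteUtf16(string path, string text)
    {
        // false = overwrite rather than append
        using (StreamWriter sw = new StreamWriter(path, false, Encoding.Unicode))
        {
            sw.WriteLine(text);
        } // leaving the using block disposes the writer, which flushes and closes it
    }

    static void Main()
    {
        // Assumed location: the temp folder, to avoid needing write access to c:\
        string path = Path.Combine(Path.GetTempPath(), "IamUnicode.txt");
        WriteUtf16(path, "something");

        // Show the first two bytes of the file in hex
        byte[] bytes = File.ReadAllBytes(path);
        Console.WriteLine("{0:X2} {1:X2}", bytes[0], bytes[1]);
    }
}
```

The two bytes printed should be FF FE, the UTF-16 little-endian byte order mark that StreamWriter emits at the start of a new file when given Encoding.Unicode.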
Now, on a somewhat harder note, Martin Kochanski asked me the following back in March:
Before Unicode was as widely used as it is now, users of languages with diacritics had to manage with ASCII (or, if they were lucky, with Latin-1) and whole dialects of character usage grew up as a result. This was especially the case with informal communications such as chats and bulletin boards.
To give the example I know best: Polish needs acute accents on c, s, and z, a dot on the z, tails under a and e, and a line through the lowercase "l", to mention just a few.
Sometimes the accents were left out when they could be inferred, and some adjustments were trivial (e.g. represent acute accent with a following apostrophe) but what was really inspiring was that people worked out that some letters that weren't used in Polish, such as q, v and x, could be co-opted and given consistent meanings in Polish completely unrelated to what they normally mean in Latin scripts: thus if x equalled z-dot (I can't remember whether this was one of the specific equivalences) then a Polish speaker would quickly learn to read x as z-dot without hesitation and to press the x key when he wanted to type z-dot.
The spontaneous evolution of such dialect character sets (the convergent evolution resulting from a strong selection towards mutual comprehensibility) has always struck me as a rather inspiring episode, because "bottom-up", driven by need, and not created by committees. The trouble is that once the need disappears, so do the dialects. I'm hoping that someone somewhere is interested enough in the electronic equivalent of "oral history" to be able to capture and codify these ephemeral character sets before they are forgotten even by the people who used them; and it struck me that some of the people who read this blog might have an interest in this bit of history too.
Now this is a fascinating topic, but one that I have to admit I know just about nothing about. Does anyone know of a place where knowledge of all these kinds of de facto standards might be kept?
Any leads might be interesting or useful....
Another one -- Per Bergland asked just this last August:
I can't understand why after so long time there's still no support for Unicode .cmd/.bat files in cmd.exe.
Since I often use Swedish åäö in my file paths, I have to either resort to firing up edit.exe in a command prompt window (aka DOS Window) or first create a Unicode version A.txt from which I can easily create B.txt by "type"-ing it:
type A.txt >B.txt (unless of course I started the prompt using cmd /u). So it's not as if cmd.exe is totally Unicode-unaware. Why not batch file support?
The problem (well, one of the problems, as there are many) with cmd.exe is that there is a lot of backcompat fear surrounding changes to it -- because almost any change that does happen can lead to breaks.
But Per, have you checked out Monad? It is indeed the next generation in the console, and it will support Unicode scripts....
I do not know of any plans to rev. cmd.exe in Vista to support this though; it has to keep running as is, but a major feature like this one is simply a bit too much, I think.
This post brought to you by "আ" (U+0986, a.k.a. BENGALI LETTER AA)