by Michael S. Kaplan, published on 2008/08/22 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/08/22/8887005.aspx
Long time readers may recall blogs from the past like Is Excel CSV misusing NLS functionality? and Excel to Led Zeppelin -- No 'in through the out door'.
Yet another CSV weirdness that I thought might be of interest to SiaO readers came up the other day:
Hello,
I am trying to write some data that are Unicode into a CSV file, but I am running into some encoding issue, Excel cannot open the file.
I am not using any transform, I am just reading some Unicode value from an XML file and write to a file by applying the comma delimited file format.
I was wondering if somebody would have some experience and would know how to create a Unicode CSV file in Vbscript.
So far I can only save my file as ANSI.
Thank you very much.
Regular SiaO hero Paul Dempsey pointed out one suggestion for a workaround:
A quick read on Scripting.FileSystemObject tells me I can do this:
Dim fso, MyFile
Set fso = CreateObject("Scripting.FileSystemObject")
Set MyFile = fso.CreateTextFile("file.csv", True, True)
MyFile.WriteLine "v1Ⓐ,v2Ⓑ,v3Ⓒ"
MyFile.Close
And one of my favorite support engineers Malcolm Stewart (going back at least a decade!) pointed out the main issue that led to the original problem:
Excel has a bug where it won't parse UNICODE CSV files that are comma delimited, though they will parse TAB delimited files. We have filed a bug with them on this before, but they have rejected the fix.
Now on the one hand this bothers me, given the whole point of CSV is Comma Separated Values and all.
But then again it seems like you can't swing a cat around here without running into problems with CSV from the point of view of applications that will respect under preferences.
In such a case, the functionality limitation of a product that leads to a misuse of language such as "Unicode CSV values cannot be comma separated" is not only kind of amusing.
Though it is obviously that.
Perhaps even a bit ironic if one ignores past behavior -- certainly it would not be expected by most people to be the case.
But in its own way, this kind of bug is being consistent with the existing pattern!
After all, if we are
is there a compelling precedent to respect the use of language in general? :-)
This blog brought to you by U (U+0055, aka LATIN CAPITAL LETTER U)
# Andrew West on 22 Aug 2008 11:49 AM:
The other lame Unicode feature of Excel 2007 that I discovered the other day is that if you have some supra-BMP data, when you edit it in the formula bar it is treated as string of 16-bit characters, so that two arrow presses are required to get past each character, and you can select and manipulate the individual high and low surrogates that make up a supra-BMP character, leaving unpaired high or low surrogates in the formula bar and on the clipboard if you are not careful.
# Micah Dembinsky on 22 Aug 2008 12:08 PM:
Perfect timing on the post. I had been struggling with this issue that seems to defy logic and was just about to place a call to MSDN support.