by Michael S. Kaplan, published on 2005/02/20 20:51 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/02/20/377116.aspx
Jason (an SDE/T somewhere in Windows) posed the following question yesterday afternoon:
I am writing a script to test localized Windows QFE package INFs. These are stored as ANSI files. I am using a Unicode XML file for storing my comparison strings. My script must run on localized Windows builds with various default code pages, which affects how the string comes out when I read it from the ANSI file. The test runs fine for all languages on an English box, but on, say, a Chinese box the string comes out differently and my test breaks unnecessarily.
Is there a way to always get the same string out of this file?
Obviously this is a problem. VBScript mostly assumes the default code page and the File System Object assumes that code page, UTF-8, or UTF-16 depending on how you set it. In this case, Jason is looking for code page 1252 to always be used.
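To make the problem concrete, here is a minimal sketch of reading the file with the File System Object. The path is a placeholder, and the constants are the documented Tristate values as I remember them. The point is that the ANSI setting has no way to name a particular code page, so the resulting string depends on the machine's default:

```vbscript
Option Explicit

Const ForReading = 1
Const TristateFalse = 0     ' open as ANSI, i.e. the system default code page
Const TristateTrue = -1     ' open as Unicode (UTF-16)

Dim fso, ts, text
Set fso = CreateObject("Scripting.FileSystemObject")

' Placeholder path. With TristateFalse the bytes are interpreted using
' whatever the default code page is on the machine running the script.
Set ts = fso.OpenTextFile("C:\test\sample.inf", ForReading, False, TristateFalse)
text = ts.ReadAll
ts.Close

Set ts = Nothing
Set fso = Nothing
```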
Incidentally, as I hinted at in 'How does it detect invalid characters?', it can sometimes fail on other code pages as well (basically any time there are slots that do not have a mapping). But I did suggest a workaround:
Maybe there is something clever you could do with ADODB.Stream, its LoadFromFile method, its Charset property, and its ReadText method?
I did not intend to be mysterious; I just was not sure here. I vaguely remembered someone suggesting ADODB.Stream in a similar situation and did not want to over-promise a solution. But sure enough, he posted back today that it worked:
I’ve included my code below in case anyone else wants to see how it’s done. It’s quite simple once you know which object to use. The code below will load an ANSI file to the same character set displayed in EN notepad, allowing me to copy and paste the characters from Notepad into my Unicode test data file and always read the same thing from the INF no matter what language I am running on:
' load using windows-1252 character set
dim oStr, WorkingBuffer
set oStr = CreateObject("ADODB.Stream")
oStr.CharSet = "windows-1252" ' code page of the inf files
oStr.Open
oStr.LoadFromFile FileName
WorkingBuffer = oStr.ReadText
set oStr = nothing
I never would have thought to use ADODB objects for this :P
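For anyone who wants to reuse this, the same steps can be wrapped in a small function. This is only a sketch: the function name, its parameters, and the sample path are hypothetical, and the stream type is set explicitly only to make the intent obvious:

```vbscript
Option Explicit

Const adTypeText = 2    ' ADODB StreamTypeEnum value for text streams

' Hypothetical helper: read a whole file as text, decoding it with the
' given charset name rather than the system default code page.
Function ReadTextWithCharset(FileName, CharsetName)
    Dim oStr
    Set oStr = CreateObject("ADODB.Stream")
    oStr.Type = adTypeText
    oStr.CharSet = CharsetName
    oStr.Open
    oStr.LoadFromFile FileName
    ReadTextWithCharset = oStr.ReadText    ' no argument: read to end of stream
    oStr.Close
    Set oStr = Nothing
End Function

' Usage (the path is a placeholder):
Dim WorkingBuffer
WorkingBuffer = ReadTextWithCharset("C:\test\sample.inf", "windows-1252")
```

Because the charset is passed in by name, the result should not depend on the default code page of the machine the script runs on.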
After he posted that, I went to find the reference in my archives, something made surprisingly easy by the fact that I pretty much never deal with ADODB streams for any other purpose. It was David Copenhaver in the microsoft.public.vb.general.discussion newsgroup, who posted the following code:
Private Sub t2UTF(Path As String)
    Dim bob As ADODB.Stream
    Set bob = New ADODB.Stream
    bob.Open
    bob.LoadFromFile Path 'Loads a File
    bob.Charset = "UTF-8" 'sets the stream encoding to UTF-8
    bob.SaveToFile Path, adSaveCreateOverWrite 'Save File
    bob.Close
    Set bob = Nothing
End Sub
So, sorry to make you figure it out yourself, Jason (I should have looked in the archives first!).
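Putting the two ideas together, here is a hedged VBScript sketch of converting a file from one charset to another with two streams: one decodes the source file, the other encodes the text on the way out. The sub name, its parameters, and the paths are hypothetical, and this is a variation of my own rather than David's or Jason's code:

```vbscript
Option Explicit

Const adTypeText = 2             ' ADODB StreamTypeEnum: text stream
Const adSaveCreateOverWrite = 2  ' ADODB SaveOptionsEnum: overwrite existing file

' Hypothetical helper: decode SourcePath using SourceCharset, then write
' the same text to TargetPath encoded as TargetCharset.
Sub ConvertCharset(SourcePath, SourceCharset, TargetPath, TargetCharset)
    Dim src, dst, text

    Set src = CreateObject("ADODB.Stream")
    src.Type = adTypeText
    src.Charset = SourceCharset
    src.Open
    src.LoadFromFile SourcePath
    text = src.ReadText          ' the decoded text as a VBScript string
    src.Close

    Set dst = CreateObject("ADODB.Stream")
    dst.Type = adTypeText
    dst.Charset = TargetCharset
    dst.Open
    dst.WriteText text           ' encoded into TargetCharset inside the stream
    dst.SaveToFile TargetPath, adSaveCreateOverWrite
    dst.Close

    Set src = Nothing
    Set dst = Nothing
End Sub

' Usage (placeholder paths):
ConvertCharset "C:\test\sample.inf", "windows-1252", "C:\test\sample-utf8.inf", "utf-8"
```

As far as I know, the UTF-8 output written this way starts with a byte order mark, which may or may not be what you want.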
The obvious question at this point is why they are using a String for the Charset property when what is being dealt with is code pages. I honestly have no clue, but I'll give them the benefit of the doubt and assume it is to make the object easier to use with HTML files and their charset property.
It is amazing where you can find support for international features....
This post brought to you by "𐐐" (U+10410, a.k.a. DESERET CAPITAL LETTER H)