Getting the length in bytes

by Michael S. Kaplan, published on 2006/06/23 02:35 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/06/22/643781.aspx


Overheard in microsoft.public.win32.programmer.international:

Hi Guys,

Sorry for posting this. I have gone through lots of postings here and unable to get an answer. Here is my problem.

I am using Windows XP Pro English.

I have a VB dll that passes strings to COM DLL (written in VC++). VC DLL converts the string to ANSI before writing to a file. The VC DLL has function called AddData that expects 2 parameters - One is the string to write and second parameter is length of the string.

Sample Code -
s="Write this to file"
MyFile.AddData(s,len(s))

Result
Contents of file - "Write this to file"

Well, everything was fine till the application was supposed to write Chinese characters as well. I first tried to do the same on my machine and could not generate any Chinese characters (only ???). I was then asked to configure my machine to display Chinese characters (the way Chinese guys are doing it). This is what I did -

1. Install East Asian language support.
2. Added Chinese (PRC) language to input/keyboard language (with US keyboard layout being default)
3. Changed the non-unicode language from default to Chinese (PRC)

Sample code -
s="Write 测试this to file"
MyFile.AddData(s,len(s))

Result
Contents of file - "Write 测试this to fi"

The number of characters lost is the same as the number of Chinese characters in the string.

My Analysis -
Len returns the number of characters in a string and not the number of bytes that will be written to the file.

Limitations
I cannot change the VC COM dll as we do not own it.

My questions -
1. Why is the string length not reported correctly?
2. How can I get the actual length of the string?

Cheers
Siva

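The problem here is that strings in VB are Unicode, and the Len() function returns a count of UTF-16 code units in that Unicode string, not the number of bytes the string occupies once it has been converted to ANSI. In the Chinese (PRC) code page each of these Chinese characters takes two bytes, so the converted string is longer than Len() reports by exactly the number of Chinese characters, and that many bytes get cut off the end.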
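The truncation can be reproduced outside VB. What follows is a minimal Python sketch, used purely to illustrate the arithmetic. It assumes the DLL converts with code page 936 (the ANSI code page for Chinese (PRC)) and then writes exactly as many bytes as the length parameter says; the post does not show the DLL's internals, so `add_data` is a hypothetical stand-in. Python's `len` counts code points rather than UTF-16 code units, but the two counts agree for the characters used here.

```python
# Hypothetical stand-in for the DLL's AddData (its real behavior is assumed):
# convert the Unicode string to the ANSI code page, then keep `length` bytes.
def add_data(text: str, length: int, codepage: str = "cp936") -> bytes:
    ansi = text.encode(codepage)
    return ansi[:length]

s = "Write 测试this to file"

print(len(s))                  # 20 characters
print(len(s.encode("cp936")))  # 22 bytes after conversion

# Passing the character count as the byte count drops the last two bytes.
written = add_data(s, len(s))
print(written.decode("cp936"))  # Write 测试this to fi
```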

To work around this, if you want the length in bytes of the string in the default system code page, you need to try something different, such as:

s = "Write 测试this to file"
MyFile.AddData(s, LenB(StrConv(s, vbFromUnicode)))

This will convert the string using the default system locale's code page, and pass the length of that string in bytes to the AddData method....
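The same calculation, convert first and then count bytes, can be sketched in Python for comparison. The helper name `ansi_byte_length` is made up for this illustration, and the code page is hard-coded to 936 as an assumption so the sketch runs anywhere; on Windows, Python's "mbcs" codec would follow the system's ANSI code page instead.

```python
def ansi_byte_length(text: str, codepage: str = "cp936") -> int:
    """Byte length of text after conversion to the given ANSI code page."""
    return len(text.encode(codepage))

plain = "Write this to file"
mixed = "Write 测试this to file"

# For pure ASCII text the byte count equals the character count.
print(ansi_byte_length(plain))  # 18

# Each Chinese character adds one extra byte in code page 936.
print(ansi_byte_length(mixed))  # 22
```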

Note the dependence on the default system locale: the length will not be correct if the conversion done in the VC++ COM DLL uses anything other than CP_ACP, the default system code page. In that case, you just need to make sure that the same code page is used in both the VB and the VC++ components....
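To see why the two sides have to agree, here is a small Python sketch (an illustration only, not the VB or VC++ code) that converts the same string with three different encodings. The choice of code pages 936 and 1252 plus UTF-8 is an assumption made for the example. The byte length differs in each case, and the Western European code page cannot represent the Chinese characters at all, which matches the "???" seen before the system locale was changed.

```python
s = "Write 测试this to file"

# cp936: Chinese (PRC) ANSI code page; cp1252: Western European ANSI code page.
as_936 = s.encode("cp936")
as_1252 = s.encode("cp1252", errors="replace")  # unmappable characters become "?"
as_utf8 = s.encode("utf-8")

# The same string yields three different byte lengths.
print(len(as_936), len(as_1252), len(as_utf8))  # 22 20 24

# With the wrong code page the Chinese characters are lost entirely.
print(as_1252.decode("cp1252"))  # Write ??this to file
```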

 

This post brought to you by 测 (U+6d4b, a CJK Ideograph)

