Not everyone does the right thing for Romanian

by Michael S. Kaplan, published on 2007/05/01 22:59 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/05/01/2367135.aspx

Regular reader Cristian Secară asks:

Hi Michael,

Maybe you have a clue on a specific characters ș & ț issue ...

Romanian Windows Vista users noticed that when trying to type messages in Yahoo Messenger using (also) characters ș and/or ț (you know, those infamous 0x0218 ... 0x021B), a question mark is displayed instead (in place of each ș and/or ț character, not in place of the whole text). If, instead of typing directly, one writes the same thing in another Unicode app, let's say Wordpad, and then copies and pastes the text into YM, the text is correct (everything, including sending and displaying on the receiving system).

Until recently I thought this is a Yahoo Messenger issue, but now I am no longer convinced about that. I found more applications with the same behaviour: NetObjects Fusion (checked by myself), Pegasus Mail and Skype (reported by others).

Sidenote: I am using some applications that were ported from the Linux world, applications running on top of GTK+ (for example Sylpheed e-mail client). On all those applications, when running in win32 environment, a menu bar which contains strings containing one of characters ș and/or ț will either display a blank menu bar, or will display a question mark in place of respective ș and/or ț characters. The same applications will behave correctly when running in native environment systems.

I have no idea if there is any relation between the two situations described. Given the fact that in WinXP and Win2000, in spite of the fact that both systems claim to be Unicode aware, there are several known problems with ș and ț characters, problems which until now were fixed only on Vista, I am wondering if there may be still some other issues as well, which are difficult to identify/find/fix.

Cristi

Sorry, Cristi -- this is an application problem. The OS supports these characters, along with the rest of Unicode. The problems here come only from applications that do non-Unicode things (like text input routines that limit text to code pages or other subsets) or that use fonts which do not contain the characters.
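To make the "limit text to code pages" point concrete, here is a minimal sketch in Python. It assumes the code page involved is Windows-1250 (a typical ANSI code page for a Romanian system locale) and uses Python's `cp1250` codec as a stand-in for whatever conversion the application performs; the helper name `through_code_page` is made up for illustration. Windows-1250 contains the older cedilla forms of these letters but not the comma-below ones, so the latter cannot survive the round trip.

```python
import unicodedata

COMMA_BELOW = "\u0218\u0219\u021A\u021B"  # Ș ș Ț ț (the letters from the question)
CEDILLA = "\u015E\u015F\u0162\u0163"      # Ş ş Ţ ţ (the older look-alikes)


def through_code_page(text, code_page="cp1250"):
    """Round-trip text through a legacy code page (hypothetical helper).

    Characters the code page cannot represent are replaced with '?',
    which is an assumption about how the application handles them.
    """
    return text.encode(code_page, errors="replace").decode(code_page)


for ch in COMMA_BELOW + CEDILLA:
    result = through_code_page(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {ch!r} becomes {result!r}")
```

Whether a real application produces a question mark or some other substitute depends on how it calls the conversion, so treat the `'?'` here as one plausible outcome rather than the only one.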

In the case of the newer Romanian characters, the situation requires applications that support Unicode in more than just clipboard operations; if Yahoo Messenger or NetObjects Fusion or Pegasus Mail or Skype have problems here, then you can complain to the makers of the apps that are doing the wrong thing or you can just find better programs that fully support Unicode in all of their operations for your chosen language.

(Keep in mind that even though there was that update for Romanian in Arial, Times New Roman, and Verdana, the characters were already in other fonts like Tahoma. So even without the update, one could write an application that would display these characters if one truly wanted to!)

This principle applies to Microsoft products as well -- I am not going to be an apologist here. As an example, there have been occasions in the past when I have had to honestly recommend against using SQL Server or Jet due to limitations within them, unless the people asking were willing to go through solutions like the one I suggested in the Extending collation support series I did back in 2005.

This post brought to you by Ș (U+0218, a.k.a. LATIN CAPITAL LETTER S WITH COMMA BELOW)

Marc Durdin on 2 May 2007 7:40 AM:

Cristi, our tool 'Keyman Desktop' allows you to input Unicode characters (including these specific Romanian letters) into Yahoo Messenger and many other applications that don't support Unicode input. Have a look at: http://www.tavultesoft.com/keyman/ and http://www.tavultesoft.com/kb/index.php?KnowledgeBaseID=18

Please contact us at sales@tavultesoft.com if you'd like more information.

Alan McFarlane on 2 May 2007 10:58 AM:

Err, should this be obvious to me... Why does copy&paste work, but typing directly not? Hmm something in the UI code path isn't Unicode but the clipboard read is, or uses a Windows-X codepage?...

Michael S. Kaplan on 2 May 2007 12:48 PM:

The clipboard supports both CF_TEXT and CF_UNICODETEXT, so whatever the app asks for, it can get.

The app decides how it wants to process input though, and if it takes ANSI input or if it converts it due to the way that the input is handled, then the app has the freedom to do what it wants....
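A rough model of that asymmetry, as a Python sketch rather than real Windows code: one function stands for a paste path that requests CF_UNICODETEXT and keeps the text as is, the other for a typing path that narrows the text to an ANSI code page. The function names, the choice of `cp1250`, and the `'?'` substitution are all assumptions made for illustration.

```python
ANSI_CODE_PAGE = "cp1250"  # assumed ANSI code page for a Romanian system locale


def paste_unicode(clipboard_text):
    """Model of an app that asks the clipboard for CF_UNICODETEXT.

    Nothing is converted, so every character arrives intact.
    """
    return clipboard_text


def type_via_ansi(keystrokes):
    """Model of an input path that narrows text to the ANSI code page.

    Characters missing from the code page are replaced with '?'.
    """
    narrowed = keystrokes.encode(ANSI_CODE_PAGE, errors="replace")
    return narrowed.decode(ANSI_CODE_PAGE)


message = "mul\u021Bumesc \u0219i la revedere"  # "mulțumesc și la revedere"

print("pasted:", paste_unicode(message))
print("typed: ", type_via_ansi(message))
```

Under these assumptions the pasted text keeps ț and ș while the typed text loses them, which matches the behaviour described in the original question.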

Mihai on 2 May 2007 12:55 PM:

Using "psychic debugging" (to quote Raymond) this is another case of "Double Secret ANSI"

(and this answer just duplicates Michael's explanation on ANSI input/Unicode clipboard :-)

Mihai on 2 May 2007 12:56 PM:

Now I realize another thing: it can be a "non Unicode RTF in a Unicode application" (explained here http://www.mihai-nita.net/article.php?artID=20050709b)

Michael S. Kaplan on 3 May 2007 9:22 PM:

It may not be as smart as "double secret ANSI" -- instead it might just be CP_ACP handling....

Alan McFarlane on 8 May 2007 6:04 AM:

OK, thanks. So not necessarily just a simple mistake, possibly a bigger mistake too. :-,(

Alan


referenced by

2007/09/27 Wouldn't you bet, Fret (aka You've got 50 ways to fix your characters)
