Putting the camel's nose in Building 24

by Michael S. Kaplan, published on 2007/06/07 14:24 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/06/07/3146122.aspx


There is an old expression, usually attributed to an Arab (without citation) that goes something like:

Do not allow a camel to put his nose under the edge of your tent, for soon you will have a camel in your tent.

As a phrase it is appealing since it suggests a phenomenon that we all know of, whether we call it "give them an inch, they'll take a mile" or mutter about "slippery slopes".

Of course, never having been up close and personal with a camel for any length of time, I can't speak to the veracity of the phrase. It does seem a bit unfair to camels, and when you really think about the whole situation it may not even sound all that likely.

And yet when a situation happened to me just recently I was thinking of neither slopes that were slippery nor mile taking/inch giving. I had an image of that camel that Steve Zahn was riding in the movie Sahara, somehow nosing his way into Steve's trailer between takes because Steve let it stick its nose in.

The situation?

Well, it was that whole issue with Romania, the one I talked about previously.

Originally people were quite happy with the final results of the update for Romanian and Bulgarian in the core fonts.

The request was made, it was met, then they asked for a bit more, and then that was met. Then Office had yet another font and the request was met there too.

The camel was much further into building 24 (the Windows International building) by now than just its nose, clearly. But the lobby had enough room so it was no big deal, I guess. At least we were helping some customers, and people seemed pretty happy.

But it didn't take long before they noticed some of the problems trying to use these new letters if they were running in XP. Like in SQL Server. Or Jet. Or Access. Or the various other Office applications.

As predicted in "Be careful what you wish for (just in case it comes true!)" aka "When a Cedilla needs to be a Comma Below (and vice versa)", people recognized (without necessarily understanding the base cause) that these characters don't work quite right on downlevel platforms.

As you probably know if you are a regular reader, it is yet another case of the jury giving at least part of a string no weight, the part involving these characters.

Now in Vista in an effort to make the old equal the new, the following equivalences were made for sorting:

ș (U+0219, LATIN SMALL LETTER S WITH COMMA BELOW) ≡ ş (U+015f, LATIN SMALL LETTER S WITH CEDILLA)

Ș (U+0218, LATIN CAPITAL LETTER S WITH COMMA BELOW) ≡ Ş (U+015e, LATIN CAPITAL LETTER S WITH CEDILLA)

ț (U+021b, LATIN SMALL LETTER T WITH COMMA BELOW) ≡ ţ (U+0163, LATIN SMALL LETTER T WITH CEDILLA)

Ț (U+021a, LATIN CAPITAL LETTER T WITH COMMA BELOW) ≡ Ţ (U+0162, LATIN CAPITAL LETTER T WITH CEDILLA) 

Which is great for Vista, sure.

But what to do with the downlevel case?

Well, luckily there is an interesting workaround which will give answers that, while not being perfect in XP, will lead to perfect results on Vista. The hint for what to do comes from "If you decompose those city elders, you might be able to sort them out!", and it is because of the other set of equivalences that exist in Vista:

ș (U+0219, LATIN SMALL LETTER S WITH COMMA BELOW) ≡ ș (U+0073 U+0326, LATIN SMALL LETTER S + COMBINING COMMA BELOW)

Ș (U+0218, LATIN CAPITAL LETTER S WITH COMMA BELOW) ≡ Ș (U+0053 U+0326, LATIN CAPITAL LETTER S + COMBINING COMMA BELOW)

ț (U+021b, LATIN SMALL LETTER T WITH COMMA BELOW) ≡ ț (U+0074 U+0326, LATIN SMALL LETTER T + COMBINING COMMA BELOW)

Ț (U+021a, LATIN CAPITAL LETTER T WITH COMMA BELOW) ≡ Ț (U+0054 U+0326, LATIN CAPITAL LETTER T + COMBINING COMMA BELOW)

If instead of using the first set of items in these four canonical equivalences one uses the second set, one will get results that will be correct in Vista and almost correct downlevel. Which is a much better situation (especially since it gives people a very clear upgrade path).
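For anyone who wants to try the decomposed forms on existing text, here is a minimal sketch in Python. It is not something from the original workaround, just one way to produce the second set of forms: the function name decompose_comma_below is made up for illustration, and it relies on the standard unicodedata module, whose NFD normalization maps each of the four precomposed letters to its base letter plus U+0326. It only rewrites the text; it says nothing about how XP or Vista will actually sort the result.

```python
import unicodedata

# The four precomposed comma-below letters listed above: ș Ș ț Ț
COMMA_BELOW_LETTERS = "\u0219\u0218\u021b\u021a"


def decompose_comma_below(text: str) -> str:
    """Replace only the four comma-below letters with base letter + U+0326.

    Every other character (including the cedilla forms and letters such
    as U+0103) is left exactly as it was.
    """
    return "".join(
        unicodedata.normalize("NFD", ch) if ch in COMMA_BELOW_LETTERS else ch
        for ch in text
    )


if __name__ == "__main__":
    sample = "Timi\u0219oara \u0219i Constan\u021ba"
    decomposed = decompose_comma_below(sample)
    # Show the code points so the change is visible even if a font hides it.
    print(" ".join(f"U+{ord(ch):04X}" for ch in decomposed))
```

Restricting the change to the four letters, rather than normalizing the whole string, is a deliberate choice in this sketch: it keeps the rest of the text in whatever form it already had.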

If you have Vista, you could even use MSKLC 1.4 to create your own custom version of any or all of the three Romanian keyboard layouts to use these decomposed forms -- and then install them on your XP machine too if you want (installing them on your Vista machine will help with the documents you create in Vista and send to people running XP).

So now, with my succession of blog entries on Romanian (with that one Every Character Has a Story post still pending!) the camel is way inside of Building 24 but at least there is a way to feed it even when it is still using an XP bridle (do camels have bridles? I can probably let the analogy stand, in any case).

 

This post brought to you by ̦ (U+0326, a.k.a. COMBINING COMMA BELOW)


Cristian Secară on 7 Jun 2007 7:30 PM:

Hm.

Maybe if, back in 1987, when the ISO-8859-2 standard was born, Romania had been represented by qualified representatives (I suspect there were none, either not qualified or not present at all), perhaps the comma-vs-cedilla-under-s-and-t issue would never have existed.

Maybe if, back in 1998, when the ISO-8859-2 standard was revised, [international] people had listened to the Romanian representatives who asked for a correction/solution (either by eliminating the reference to the Romanian language from that standard, or by modifying the standard), perhaps the comma-vs-cedilla-under-s-and-t issue would already have been solved somehow (at that time there was no need to downlevel to anything, since no true Romanian data existed in large amounts).

Maybe if, sometimes during 1999 and 2001, Microsoft Romania would have properly informed about all those things, including about a fresh – at that time – Romanian keyboard standard which preceded the recent revised one, perhaps today there was no need for the camel to put his nose under the edge of the tent *).

But this is history. What now ?

On the other hand I didn't know that the keyboard layout driver (the kbd*.dll) allows for combining characters in one keystroke. Is that possible ?

*) we also have something similar, it sounds approximately like "if you give [for help] a finger to someone, then he will take all your hand" :)

Cristi

oresama on 7 Jun 2007 8:33 PM:

You could try to point the camel's nose towards the door.

Japanese has an old expression "pointing the horse's nose" (餞, uma no hanamuke). When one is about to depart on a trip, you point the horse's nose towards the destination in hopes of a safe journey. In a related sense, it can also be interpreted as seeing someone off.

Michael S. Kaplan on 7 Jun 2007 9:12 PM:

Hey Cristi --- you can put up to FOUR UTF-16 code values in one keystroke. So two is easy. :-)

Your comment hints at 12% of the issues in that Every character has a story post that is still pending!

Mihai on 8 Jun 2007 12:51 PM:

My personal guess is that 8859-2 also tried to cover Turkish (which uses s with cedilla). When it was discovered that 8859-2 was in fact not enough for Turkey, they got their own ISO code page (ISO 8859-9 in 1989), and Romanian was left with the cedilla form. But this is my own guess :-)

And let's not forget that in '87 Romania was still behind the iron curtain, so it was probably not represented at all. Or, if represented, it was definitely not based on qualifications.

Mihai on 8 Jun 2007 12:53 PM:

Since the article is about Romanian, let's forget the camel.

You can use a Romanian saying: "offer one a finger, and he will take the whole hand"

(now, the exact saying starts with "give one a finger," but in English this has connotations that are missing in Romanian :-)

ddebug on 5 Aug 2007 5:46 PM:

A great proverb, yeah. Romanians understand it perfectly, and Japanese brought a very practical conclusion: turn the animal's nose OUT of the tent, and show him the way!


referenced by

2011/08/24 The history of messing up Romanian on computers

2007/12/07 If it isn't really Tibetan, could it pinch hit for Burmese?
