by Michael S. Kaplan, published on 2006/01/19 06:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/19/514718.aspx
Earlier today in the post Just when you think you know a function... I talked about the secret way to use two U+200f (RIGHT-TO-LEFT MARK) characters in the MessageBox function to put the behavior of the MB_RTLREADING flag in the hands of localizers, where it may often belong.
While I was talking to people about that post, I got a question about what U+200f was for when it was being used correctly, and what made me so sure that it was not dangerous to put two in a row that way.
I figured I should answer that question (since several native speakers of bidirectional languages gave me some helpful information too!).
The easiest way to explain it is to first look at how I talked about reading order in the post Sticky Keys vs. Reading Order. Basically, this 'Reading Order' setting allows you to set the context for the text before you even type it. It is non-destructive (in the sense that it does not alter the text in a harmful way) and easily changeable.
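A rough sketch of how an explicit reading order relates to UAX #9, written in Python as an illustration (the post itself contains no code): when no base direction is supplied, rules P2/P3 take it from the first strong character in the paragraph, defaulting to left-to-right, and the standard lets a higher-level protocol override that choice (rule HL1), which is roughly the role a reading-order setting plays. The `paragraph_direction` function and its `reading_order` parameter are hypothetical, not a Windows API, and the sketch ignores embeddings and isolates.

```python
import unicodedata


def paragraph_direction(text, reading_order=None):
    """Return 'L' or 'R' as the paragraph's base direction.

    reading_order is a hypothetical stand-in for an explicit setting;
    when given, it wins outright (like UAX #9 rule HL1). Otherwise apply
    a simplified P2/P3: the first strong character decides, and text
    with no strong character defaults to left-to-right.
    """
    if reading_order is not None:
        return reading_order
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "L"
        if cls in ("R", "AL"):
            return "R"
    return "L"


print(paragraph_direction("Hello"))                       # Latin text
print(paragraph_direction("\u05e9\u05dc\u05d5\u05dd"))    # Hebrew text
print(paragraph_direction("Hello", reading_order="R"))    # explicit setting
print(paragraph_direction("\u200fHello"))                 # leading RLM
```

The last two calls both come out right-to-left, but in different ways: the explicit setting supplies the context from outside and leaves the text alone, while the leading RLM gets the same effect by changing the text itself.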
Then you start typing. And now we will look at Unicode Standard Annex #9 - The Bidirectional Algorithm. It explains that every character has a Bidi class that says what directionality it has (if any) and how strong that directionality is.
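Python's standard `unicodedata` module exposes each character's Bidi class, which makes the classification easy to poke at. The grouping into strong, weak, neutral and explicit types below follows the table of character types in current versions of UAX #9 (which, since Unicode 6.3, also include the isolate classes); the `strength` helper is a made-up name for illustration.

```python
import unicodedata

# Bidi class groups from the UAX #9 table of character types.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def strength(bidi_class):
    """Group a Bidi class into strong, weak, neutral or explicit."""
    for label, group in (("strong", STRONG), ("weak", WEAK),
                         ("neutral", NEUTRAL), ("explicit", EXPLICIT)):
        if bidi_class in group:
            return label
    raise ValueError(f"unknown Bidi class {bidi_class!r}")


# Latin A, Hebrew alef, Arabic beh, a digit, a space, an exclamation mark.
for ch in ("A", "\u05d0", "\u0628", "1", " ", "!"):
    cls = unicodedata.bidirectional(ch)
    print(f"{unicodedata.name(ch):<24} {cls:<3} {strength(cls)}")
```

Only the three letters come out strong; the digit is weak, and the space and punctuation are neutral, which is what leaves them open to influence from their neighbors.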
Now most letters have what is known as a strong directionality, but that strength is very local and has very little effect on anything but the characters right around it. And this is where U+200e and U+200f come in: they are just as strong as (but no stronger than) one of those letters (Left-to-Right and Right-to-Left, respectively). As UAX #9 says:
2.4 Implicit Directional Marks
These characters are very light-weight codes. They act exactly like right-to-left or left-to-right characters, except that they do not display or have any other semantic effect. Their use is generally more convenient than the explicit embeddings or overrides since their scope is much more local.
RLM (Right-to-Left Mark): Right-to-left zero-width character
LRM (Left-to-Right Mark): Left-to-right zero-width character
There is no special mention of the implicit directional marks in the following algorithm. That is because their effect on bidirectional ordering is exactly the same as a corresponding strong directional character; the only difference is that they do not appear in the display.
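The quoted claim can be checked directly against the Unicode Character Database: each mark carries the same Bidi class as a strong letter (R for RLM, L for LRM), and what sets it apart is its general category, Cf (format character), rather than a letter category. A minimal check in Python, using a Hebrew alef and a Latin A as the comparison letters:

```python
import unicodedata

RLM = "\u200f"      # RIGHT-TO-LEFT MARK
ALEF = "\u05d0"     # HEBREW LETTER ALEF, a strong right-to-left letter
LRM = "\u200e"      # LEFT-TO-RIGHT MARK
LATIN_A = "A"       # LATIN CAPITAL LETTER A, a strong left-to-right letter

# Print code point, name, Bidi class and general category side by side.
for ch in (RLM, ALEF, LRM, LATIN_A):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<22}  "
          f"bidi={unicodedata.bidirectional(ch):<2}  "
          f"category={unicodedata.category(ch)}")
```

Each mark shares its Bidi class with the letter printed after it; only the general category differs (Cf versus Lo or Lu), which is why the algorithm needs no special rule for the marks.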
In fact, the only difference between them and the letters is that LRM and RLM are not visible. So two of the same mark in a row have no more effect than one: just like a second letter of the same direction, the second mark adds nothing of any significance.
And as the post "More on cursor support: the rest of the answer" showed, even a misplaced LRM, RLM, or random letter with strong directionality will not convince any other character with strong directionality to change its stripes. The only characters that have anything to fear are the weaker ones (the weak and neutral types), and as UAX #9 makes clear, those do exist. So it makes sense to put the marks in when you want to give an extra hint because you are not sure of the context.
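To make the point about weaker characters concrete, here is a deliberately simplified resolver in Python. It applies only UAX #9 rules N1 and N2 to a single paragraph with no embeddings, treats AL as R, and rejects digits and other weak types instead of resolving them; it is a sketch under those assumptions, not the full algorithm, and `resolve` and `visible_dirs` are made-up names. The classic case is English text ending in punctuation inside a right-to-left paragraph:

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
MARKS = {LRM, RLM}

# Strong classes map to a direction; AL (Arabic letter) is treated as R.
STRONG = {"L": "L", "R": "R", "AL": "R"}
# The only neutral classes this sketch accepts.
NEUTRAL = {"WS", "ON"}


def resolve(text, base):
    """Resolve each character to 'L' or 'R' in a paragraph whose
    embedding direction is `base`, using only rules N1 and N2."""
    types = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in STRONG:
            types.append(STRONG[cls])
        elif cls in NEUTRAL:
            types.append(None)  # neutral, resolved below
        else:
            raise ValueError(f"sketch does not handle Bidi class {cls} (U+{ord(ch):04X})")

    resolved = list(types)
    i = 0
    while i < len(types):
        if types[i] is not None:
            i += 1
            continue
        # Find the maximal run of neutrals starting at i.
        j = i
        while j < len(types) and types[j] is None:
            j += 1
        # The paragraph direction stands in for sos/eos at the edges.
        before = types[i - 1] if i > 0 else base
        after = types[j] if j < len(types) else base
        # N1: same direction on both sides wins; N2: otherwise use base.
        direction = before if before == after else base
        for k in range(i, j):
            resolved[k] = direction
        i = j
    return resolved


def visible_dirs(text, base):
    """Resolved directions of the characters that actually display."""
    return [d for ch, d in zip(text, resolve(text, base)) if ch not in MARKS]


for text in ("Hello!", "Hello!" + LRM, "Hello!" + LRM + LRM):
    print(repr(text), visible_dirs(text, "R"))
```

In the right-to-left paragraph, the `!` in plain "Hello!" sits between a left-to-right letter and the paragraph's right-to-left end, so N1 does not apply and N2 gives it the paragraph direction; it then displays on the far left, as "!Hello". One trailing LRM gives the exclamation mark a left-to-right neighbor on both sides, a second LRM changes nothing further, and the letters resolve to L in every case.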
I'll talk more about how functions use (and perhaps misuse?) this functionality soon...
This post brought to you by U+200e, LEFT-TO-RIGHT MARK
# Nick Lamb on 19 Jan 2006 7:39 AM:
# Michael S. Kaplan on 19 Jan 2006 7:43 AM:
# Nick Lamb on 19 Jan 2006 11:20 AM:
# Michael S. Kaplan on 19 Jan 2006 11:30 AM:
# Nick Lamb on 19 Jan 2006 12:18 PM:
# Michael S. Kaplan on 19 Jan 2006 12:53 PM:
# Michael S. Kaplan on 19 Jan 2006 6:58 PM:
# Michael S. Kaplan on 19 Jan 2006 8:02 PM:
referenced by
2007/01/06 Mixing it up with bidirectional text
2006/02/19 Making a mark in code windows
2006/01/21 Return of the Mark