Return of the Mark

by Michael S. Kaplan, published on 2006/01/21 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/21/515793.aspx


Well, the RIGHT-TO-LEFT MARK (and its cousin the LEFT-TO-RIGHT MARK), that is!

(apologies to those of you hoping for a clever play on words with Mark Morrison or his song 'Return of the Mack')

I talked about the former in "Just when you think you know a function..." and both of them in "What are directional marks -- chumps who point?".

I thought I'd give another example of when code will insert these controls. Our own member of the NLS API, the GetDateFormat function!

Now whether you call it in Windows or in Windows CE, there are two flags you can pass to it that relate to the directional marks:

DATE_LTRREADING Adds marks for left-to-right reading layout. This value cannot be used with DATE_RTLREADING.
DATE_RTLREADING Adds marks for right-to-left reading layout. This value cannot be used with DATE_LTRREADING.

It does not really say much about when/how those marks are added, does it?

Well, it actually litters them in there. Looking at the format string (either the one passed in the lpFormat parameter or the one gleaned from the NLS locale data), it will insert them in front of all of the following cases:

So in a long date or in a complex format you could end up inserting quite a few of them!
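To get a feel for how the marks pile up, here is a minimal Python sketch. It is a hypothetical model, not GetDateFormat's actual code: it assumes, purely for illustration, that a mark goes in front of every run of date-picture letters (d, M, y, g) outside single-quoted literals. `add_reading_marks` and `reveal` are made-up helper names, not part of the NLS API, and the real insertion rules may differ.

```python
import re

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# Assumed token rule: a quoted literal, or one run of a single picture letter.
_TOKEN = re.compile(r"'[^']*'|d+|M+|y+|g+")


def add_reading_marks(fmt: str, mark: str) -> str:
    """Insert `mark` before every date-picture run in `fmt` (hypothetical rule)."""
    def repl(m):
        tok = m.group(0)
        # Quoted literal text is copied through untouched.
        return tok if tok.startswith("'") else mark + tok
    return _TOKEN.sub(repl, fmt)


def reveal(s: str) -> str:
    """Replace the invisible marks with visible tags for debugging."""
    return s.replace(LRM, "<LRM>").replace(RLM, "<RLM>")


long_fmt = "dddd, MMMM dd, yyyy"
marked = add_reading_marks(long_fmt, RLM)
print(reveal(marked))     # <RLM>dddd, <RLM>MMMM <RLM>dd, <RLM>yyyy
print(marked.count(RLM))  # 4
```

Even under this simple assumed rule, an ordinary long date format picks up four invisible characters.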

At this point, you may be wondering (if you have not decided to stop reading this blog entry or even the whole blog since I seem as obsessed with these invisible characters as the nutbars who obsess about U+FEFF!) why it bothers to insert so many of these individual characters rather than just use one embedding or override character at the beginning (RLE/RLO or LRE/LRO) and one POP DIRECTIONAL FORMATTING (PDF) at the end.

I asked co-worker Yaniv this very question, and he laughed: once upon a time, when he was working on the Office team, they had started to do this very thing in Office when they found that Windows code page 1255 (Hebrew) and Windows code page 1256 (Arabic) contain the LRM/RLM but not the rest of the formatting codes. And according to Yaniv, it was Steven Sinofsky himself who talked about the bad user experience of typical non-Unicode applications showing tons of question marks in the text when the RLE/RLO/LRE/LRO/PDF characters were used.
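The code page point is easy to check. The Python sketch below round-trips characters through Python's cp1255 and cp1256 codecs, used here as stand-ins for the Windows ANSI code pages; `survives` and `to_ansi` are hypothetical helper names. In both code pages LRM and RLM map to bytes 0xFD and 0xFE, while RLE and PDF have no mapping, so a lossy conversion (modeled with `errors="replace"`) turns them into question marks.

```python
LRM, RLM = "\u200e", "\u200f"  # LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK
RLE, PDF = "\u202b", "\u202c"  # RIGHT-TO-LEFT EMBEDDING, POP DIRECTIONAL FORMATTING


def survives(ch: str, codepage: str) -> bool:
    """True if `ch` round-trips through the given code page unchanged."""
    try:
        return ch.encode(codepage).decode(codepage) == ch
    except UnicodeEncodeError:
        return False


def to_ansi(text: str, codepage: str) -> str:
    """Model a paste into a non-Unicode app: unmappable characters become '?'."""
    return text.encode(codepage, errors="replace").decode(codepage)


for cp in ("cp1255", "cp1256"):
    for name, ch in (("LRM", LRM), ("RLM", RLM), ("RLE", RLE), ("PDF", PDF)):
        print(cp, name, survives(ch, cp))

date = "21/01/2006"
print(to_ansi(RLE + date + PDF, "cp1255"))          # ?21/01/2006?
print(to_ansi(RLM + date, "cp1255") == RLM + date)  # True
```

The embedding version degrades into visible junk, while the RLM version comes through intact.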

(I do not know whether he was a VP then. I do remember him answering questions I had asked on internal aliases from time to time, so he clearly still takes an interest in product issues; he may well have been a VP when he mentioned the problem!)

So in the end, I think that is probably the right behavior -- a bunch of invisible characters trump two possibly visible ones, any day of the week. And even in a Unicode application, you never know when people will be using the clipboard to paste the text into a non-Unicode one. So it is better safe than sorry for as long as the Notepad default is ANSI....

Speaking of Notepad, I was having an email conversation with David Shiflet and Chris Walker the other day about the fact that Notepad was doing something interesting itself. It was trying to detect when the user was in an RTL locale (more on this issue another day!), and when it thought that was the case, on inserting a date/time value (via Edit|Time/Date... or by hitting F5) it would not only pass the appropriate flag to the GetDateFormat function, but also insert its own reading mark at the front of the format string it was passing!

I suppose this could occasionally cause a problem: combine the fact that two RLM marks might be inserted with this issue, and a Hebrew or Arabic localizer could accidentally mirror a MessageBox. But they were probably going to mirror it anyway, so it is really a minor, hard-to-discover problem (maybe worthy of a KB article if a user ever noticed it, though to date no one has, AFAIK, and the code has been around since at least 1996!).
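The doubled mark can be reproduced in a small model. In the Python sketch below, `getdateformat_marks` is a hypothetical stand-in for DATE_RTLREADING that puts an RLM in front of each run of date-picture letters outside quoted literals (an assumption; the real GetDateFormat rules may differ), and Notepad's behavior is modeled simply as an RLM prepended to the format string.

```python
import re

RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def getdateformat_marks(fmt: str) -> str:
    """Hypothetical DATE_RTLREADING model: RLM before each date-picture run.

    Quoted literals are copied through; this is an assumed rule, not the API.
    """
    return re.sub(
        r"'[^']*'|d+|M+|y+|g+",
        lambda m: m.group(0) if m.group(0).startswith("'") else RLM + m.group(0),
        fmt,
    )


def leading_marks(s: str) -> int:
    """Count the consecutive RLMs at the start of a string."""
    return len(s) - len(s.lstrip(RLM))


short_fmt = "dd/MM/yyyy"
plain = getdateformat_marks(short_fmt)          # flag only
notepad = getdateformat_marks(RLM + short_fmt)  # Notepad's prefix plus the flag
print(leading_marks(plain), leading_marks(notepad))  # 1 2
```

With the flag alone the string starts with one RLM; with Notepad's own prefix it starts with two in a row.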

(For those who were wondering, it was indeed that conversation with David and Chris that prompted all of the research into these issues -- so everyone should consider themselves warned about the danger of asking me questions!)

Ok, I think I have now exhausted both my knowledge of and my curiosity about bidirectional reading mark trivia, so I promise to quit talking about them so much. :-)


This post brought to you by U+202C (a.k.a. POP DIRECTIONAL FORMATTING)



referenced by

2006/10/06 What do those marks do again?

2006/02/08 They make 'em smarter than GetDateFormat
