The mythical nature of bidirectional support, and where the wheels come off the wagon

by Michael S. Kaplan, published on 2008/04/18 10:01 -04:00, original URI:

Content of Michael Kaplan's personal blog not approved by Microsoft (see disclaimer)!
Regular readers should keep in mind that all I said in "The End?" still applies; the allusion to the X-Files continues for people who understand such references....

The problem has its roots in "Mixing it up with bidirectional text" and "The Bug(s) Spotted, aka Design flaws are worse than bugs", two blog entries which talk about specific lamenesses with the bidirectional support within Windows.

I don't want to imply that there aren't more problems beyond these. Because to be perfectly honest, there are.

Microsoft is incredibly lame here, though, to be frank for a moment, only lame in a way that everyone else is too, right now. Including Unicode.

To illustrate, I'll need a sample bit of text.

Let's build up a path. :-)

We'll take a nice little English string:

NAME ‎(BIG)‎

And then we'll make another one in Hebrew, kind of a localized version of that string.

שם ‏(גדול)‏

It is really quite reasonable to hope one could take these chunks, create a path with them (one chunk per directory) and have everything come out right.

I mean a path like:

C:\NAME ‎(BIG)‎\שם ‏(גדול)‏\NAME ‎(BIG)‎\שם ‏(גדול)‏

may be a Destryian scenario, but at its root it's just a small valid scenario that you would really want to work.

Let's try it with no special decorative control characters and leave it to the whim of your browser:

C:\NAME (BIG)\שם (גדול)\NAME (BIG)\שם (גדול)

It didn't look right in any of the four browsers I tried (Safari, Firefox, Opera, and Internet Explorer).

How about in Notepad?

Well you can choose your means of failure there via the right-click menu:


Let's try it on the latest and greatest version of Windows, as a path:

Hmmmm. Not so great in the breadcrumb bar, huh? What if we click in the address bar space to get rid of the breadcrumb bar:

Still broken, those tokens. All of the English ones look fine, but the Hebrew ones are broken.

Maybe we can do better on a Hebrew user interface language.

We'll look at the breadcrumb bar again:

Well, good news and bad news here -- the Hebrew looks good now, but the English is broken!

Is the hope for

C:\NAME ‎(BIG)‎\שם ‏(גדול)‏\NAME ‎(BIG)‎\שם ‏(גדול)‏

such a fruitless one? So very unreasonable?

Turns out that if you are running on Windows, it is. :-(

Now obviously you can do some work here with U+200e (LEFT-TO-RIGHT MARK) and U+200f (RIGHT-TO-LEFT MARK) or other Bidi control characters to try to make this better, but this is something one wants to have happen behind the scenes, without requiring the user to add control characters to the string.

Especially a string where the intent is so obvious and easy to discern.... a slightly more complicated case than the one in "Mixing it up with bidirectional text" but not all that much more complicated, is it?

But it is by no means an easy problem for users to have to solve, so it really would be much better if the OS could do the heavy lifting here, rather than forcing it on everyone else.
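To make concrete what "doing some work" with U+200e and U+200f might look like, here is a minimal Python sketch. It is only one possible higher-level approach, not something Windows or any browser is known to do: the helper names (`first_strong_direction`, `mark_segment`, `mark_path`) are hypothetical, and the policy of bracketing each path segment with the mark that matches its first strong character is an assumption made for illustration. The result is meant for display only, since adding marks to a real path changes the path.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def first_strong_direction(text):
    """Return "L" or "R" for the first strong bidi character, or None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L"
        if bidi in ("R", "AL"):
            return "R"
    return None


def mark_segment(segment):
    """Bracket one segment with marks matching its first strong character.

    Hypothetical helper: segments with no strong character are left alone.
    """
    direction = first_strong_direction(segment)
    if direction is None:
        return segment
    mark = LRM if direction == "L" else RLM
    return mark + segment + mark


def mark_path(path, sep="\\"):
    """Apply mark_segment to every separator-delimited piece (display only)."""
    return sep.join(mark_segment(part) for part in path.split(sep))


# The Hebrew chunk from the post, written with escapes so the source
# itself is not subject to bidi reordering in an editor.
hebrew = "\u05e9\u05dd (\u05d2\u05d3\u05d5\u05dc)"
display = mark_path("C:\\NAME (BIG)\\" + hebrew)
print([f"U+{ord(c):04X}" for c in display if c in (LRM, RLM)])
```

Whether a given renderer then shows the marked-up string as intended is a separate question, which is rather the point: this is work the user or the application ends up doing by hand.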

Which is not to say there is some other operating system that magically does everything right here. Last time I checked, no one was doing so well in this space, and bidirectional support in these edge cases is kind of a myth for now....

Let's pause to do a little RCA (Root Cause Analysis) for the problems here. As a standard, the Bidirectional Algorithm sits several levels lower than what one needs to handle a mix of LTR and RTL scripts, and the various "clients" (be they application, operating system, browser, or other) more or less support the standard but do not provide a whole lot beyond it, other than sometimes providing that notion of a higher-level definition of default directionality. It does quite well with cases like Hebrew that actually have some LTR pieces within themselves, but there is no good way to handle LTR text from another script embedded within unless a bunch of other work happens. Work that no one really wants to provide. Remember what that one person said in response to that hack bug:

"The correct fix is to delete the test entirely. We are all-Unicode now. We don't need an old hack for Hebrew/Arabic Windows 95."

No one wants to do too much beyond Unicode even though plain Unicode alone (without making use of higher level protocols to place control characters) is insufficient for handling these cases....
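One way to see why the plain string is at the mercy of its surroundings is to look at the bidirectional class of each character. The short Python sketch below uses the standard library's `unicodedata.bidirectional`; the helper name `bidi_classes` is just an illustrative assumption. Only the letters carry a strong direction, while the spaces, parentheses, and backslashes are neutral or weak and get their direction from context.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# A fragment of the sample path, with the Hebrew written as escapes.
sample = "C:\\NAME (BIG)\\\u05e9\u05dd (\u05d2\u05d3\u05d5\u05dc)"

for ch, cls in zip(sample, bidi_classes(sample)):
    kind = "strong" if cls in ("L", "R", "AL") else "weak/neutral"
    print(f"U+{ord(ch):04X}  {cls:<3} {kind}")
```

The marks themselves are strong characters with no visible glyph (U+200e has class L, U+200f has class R), which is why placing one next to a parenthesis can pull it toward the intended direction.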

Note that this is also one of the reasons RTL IDN is so complicated and looks so broken most of the time.

It all amounts to "A place where everyone blows, equally."


This blog brought to you by U+200e and U+200f (aka LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK)

# Andrew Cook on 18 Apr 2008 1:59 PM:

At first I didn't quite get what the problem was until I loaded the Blog in Firefox.

MS Office Outlook 2007: "RTL what?"

# Michael S. Kaplan on 18 Apr 2008 2:26 PM:

Whole new meaning to black box testing, huh? :-)

# Jan Kučera on 23 Apr 2008 5:24 AM:

So when I want to store bidi in database, should I add the LTR/RTL marks correctly or just leave it as it is and hope the rendering system will get it eventually right... in (far) future?

# Jan Kučera on 23 Apr 2008 5:26 AM:

By the way, I was reading this and the next post in Pocket IE and although it just showed rectangles, it was in correct order.. :)

From some point of view, less is more...

# Michael S. Kaplan on 23 Apr 2008 11:00 AM:

Whether to add marks or rely on marks to be added for rendering purposes eventually becomes a personal choice, based on one's own cynicism about how long that is expected to take. :-)
