It used to be Windows doing it right, and Office following. But now...

by Michael S. Kaplan, published on 2010/07/23 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/07/23/10041679.aspx


I think I may be getting old.

More and more, I find myself responding to various engineering situations at Microsoft where I am saying (or at least thinking) "When I was doing that work...", if you know what I mean.

For example, there was a time when Cathy and I were on the NLS team and we were both heavily involved in Unicode.

Windows pretty much set the trends and did the work that other parts of Microsoft would then pick up.

Over the last several years (some of them while I was still on that team, some after), I have been blogging about the terrible job that both Microsoft and Unicode have been doing on core aspects of real-world use of the Unicode Bidirectional Algorithm. Like all of these and more:

As I said previously, the simple problem is best stated as:

The Unicode Bidirectional Algorithm cannot handle text from both left-to-right and right-to-left languages together in the same line of text.

That is it, right there.

And since this scenario and its underlying cause (all of those punctuation characters with neutral bidi types) come up so often in the real world, this is not a theoretical problem; it is a real one.
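To make the neutral-punctuation problem concrete, here is a minimal Python sketch of a heavily simplified version of the UBA's neutral-resolution rules (N1 and N2) for a single paragraph with no explicit embeddings. It is not the full algorithm and not how Uniscribe, Office, or RichEdit implement anything. The function names `strong_direction` and `resolve_directions` are hypothetical, and treating every bidi class other than L, R, and AL as neutral is an assumption that only holds for text without digits, combining marks, or bidi formatting characters. The class lookup uses the standard `unicodedata.bidirectional` function.

```python
import unicodedata


def strong_direction(ch):
    """Map a character's Unicode bidi class to 'L', 'R', or None (neutral)."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):  # Hebrew (R) and Arabic (AL) letters are strong RTL
        return "R"
    # Simplification (assumption): every other class (WS, ON, CS, ES, ET, ...)
    # is treated as neutral. Only reasonable for text without digits,
    # combining marks, or explicit bidi formatting characters.
    return None


def resolve_directions(text, paragraph_dir):
    """Resolve each character to 'L' or 'R' using simplified rules N1/N2."""
    types = [strong_direction(ch) for ch in text]
    result = list(types)
    i = 0
    while i < len(text):
        if types[i] is not None:
            i += 1
            continue
        # Find the end of this run of neutral characters.
        j = i
        while j < len(text) and types[j] is None:
            j += 1
        # At the paragraph edges, the paragraph direction stands in for the
        # missing strong neighbor (sos/eos).
        before = types[i - 1] if i > 0 else paragraph_dir
        after = types[j] if j < len(text) else paragraph_dir
        # N1: same strong direction on both sides -> neutrals take it.
        # N2: otherwise -> neutrals take the paragraph (embedding) direction.
        resolved = before if before == after else paragraph_dir
        for k in range(i, j):
            result[k] = resolved
        i = j
    return result


if __name__ == "__main__":
    # English text in a right-to-left paragraph.
    print("".join(resolve_directions("Hello, world!", "R")))  # LLLLLLLLLLLLR
```

In the right-to-left paragraph, the comma and space sit between two L letters and stay LTR, but the final "!" sits between a strong L and the end of the paragraph, so N2 hands it the RTL paragraph direction and it displays at the far left of the English run. That is exactly the kind of everyday breakage with mixed-direction text and neutral punctuation that real users keep running into.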

Now Unicode's Bidi Algorithm kind of wasn't very good in this scenario.

And Windows was equally not good, though with the excuse that they were conformant to the UBA.

Which we knew kind of sucked here.

And of course Office, which uses the Uniscribe component from Windows, sucked a bit too (as an application suite downlevel of Windows), but it was conformant to the UBA and doing the work to support what Windows did (or didn't).

Folks in Windows were comfortable with this, knowing that when Office did its own thing, it ended up with incomplete stories like the one in Oh (Saka to me, Saka to me, Saka to me, Saka to me) Whoa Babe (Just a little bit) A little respect (just a little bit).

Then, for Office 2007, one could say that they looked at that Bidi situation and decided enough was enough.

Even if Windows was okay with the idea of sucking with a good excuse (Unicode conformance), they decided to say screw that....

Murray Sargent described some of what they did in blogs like Tailoring the Unicode Bidi Algorithm and Bidi Paragraph with Parenthesized Text. Basically, they tackled this huge scenario in a way that neither Windows nor Unicode had managed.

And people have been noticing the difference (e.g. in blogs like It's a bug, it's always been a bug. In either direction...): Office and many of its applications and components now do it correctly, while most of Windows and .Net do not.

This all shipped in both Office 2007 and Office 2010, and was in beta before Vista shipped.

And at the most recent Unicode Technical Committee meeting, Murray Sargent brought up the problem with Unicode and suggested that they make a change to describe this exact option to better support real world usages of bidirectional text.

They were very interested, and are looking forward to the fully written up proposal at the next UTC meeting.

Soon enough, Microsoft Office and RichEdit and Unicode will be working correctly in this scenario.

And Uniscribe, GDI+, WPF, DWrite, Silverlight, and everyone else won't.

In fact, all of the above is true right now, except for Unicode.

The excuse of these components is that they had to stay conformant to the UBA.

I wonder what their response will be when Unicode is updated too?

I remember back when Windows was pretty much the one doing the right thing in such circumstances.

Now (less than half a decade later)?


carlos on 23 Jul 2010 7:58 AM:

In your bulleted list of four links near the start of the post, the first and third links point to the post editor rather than the posts.

Whoops! Fixed now....


referenced by

2010/08/08 Download languages for Windows
