by Michael S. Kaplan, published on 2012/01/16 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2012/01/16/10256752.aspx
So, the other day, I wrote "How to detect if a locale is Bidi, Windows 7/8 edition".
This is a topic I had covered a bunch of times over the years, in many prior blogs, from "How To [NOT] detect that a locale is bidi" to "How To detect that a culture is bidi" to "Cue the smarter version of GetDateFormat... ok, it's a wrap!" and so on.
Before LOCALE_IREADINGLAYOUT was introduced, most of the Win32 answer relied on the LOCALESIGNATURE.
More specifically, bits 123, 124, and 125 of the Unicode Subset Bitfields:
Bit | Meaning |
123 | Windows 2000 and later: Layout progress, horizontal from right to left |
124 | Windows 2000 and later: Layout progress, vertical before horizontal |
125 | Windows 2000 and later: Layout progress, vertical bottom to top |
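As a rough illustration of where those bits sit, here is a minimal sketch that reads bits 123 to 125 out of a 128-bit field stored as four 32-bit words, the way the lsUsb member of LOCALESIGNATURE is laid out. It assumes the usual convention that bit 0 is the least significant bit of the first word; the function names are made up for this example, and obtaining the signature from Windows is left out entirely.

```python
def usb_bit(ls_usb, bit):
    """Return bit `bit` (0..127) of a 128-bit field held as four 32-bit words.

    Assumption: bit 0 is the least significant bit of ls_usb[0].
    """
    if len(ls_usb) != 4:
        raise ValueError("expected four 32-bit words")
    if not 0 <= bit < 128:
        raise ValueError("bit index out of range")
    return (ls_usb[bit // 32] >> (bit % 32)) & 1


def layout_bits(ls_usb):
    """Return the layout-progress bits as a tuple (bit 123, bit 124, bit 125)."""
    return tuple(usb_bit(ls_usb, b) for b in (123, 124, 125))
```

Under that assumption, all three bits land in the last word, at positions 27, 28, and 29.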
The combinations of different values of these three bits make the description of almost any text directionality outside of Boustrophedon (or Rongo-Rongo) possible:
Bit 123 | Bit 124 | Bit 125 | Text Rendering Direction | IREADINGLAYOUT equivalent |
0 | 0 | 0 | LeftToRight, then TopToBottom | 0 |
1 | 0 | 0 | RightToLeft, then TopToBottom | 1 |
1 | 1 | 0 | TopToBottom, then RightToLeft | 2 |
0 | 1 | 0 | TopToBottom, then LeftToRight | 3 |
0 | 0 | 1 | LeftToRight, then BottomToTop | n/a |
1 | 0 | 1 | RightToLeft, then BottomToTop | n/a |
0 | 1 | 1 | BottomToTop, then LeftToRight | n/a |
1 | 1 | 1 | BottomToTop, then RightToLeft | n/a |
Now, as the last column hints, the four reading layout choices we support can all be derived entirely from the LOCALESIGNATURE bits.
The four additional rendering options that these bits can theoretically capture, but that the new flag cannot express, are not used as the primary rendering for any language we support.
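The derivation described above can be sketched as a simple lookup. The rows are copied straight from the table; the names LAYOUTS, reading_layout, and is_right_to_left are hypothetical, and treating only the value 1 as "right to left" is an assumption of this sketch rather than anything the platform promises.

```python
# (bit 123, bit 124, bit 125) -> (text rendering direction, IREADINGLAYOUT or None)
# None stands for the "n/a" rows that the reading layout value cannot express.
LAYOUTS = {
    (0, 0, 0): ("LeftToRight, then TopToBottom", 0),
    (1, 0, 0): ("RightToLeft, then TopToBottom", 1),
    (1, 1, 0): ("TopToBottom, then RightToLeft", 2),
    (0, 1, 0): ("TopToBottom, then LeftToRight", 3),
    (0, 0, 1): ("LeftToRight, then BottomToTop", None),
    (1, 0, 1): ("RightToLeft, then BottomToTop", None),
    (0, 1, 1): ("BottomToTop, then LeftToRight", None),
    (1, 1, 1): ("BottomToTop, then RightToLeft", None),
}


def reading_layout(bit123, bit124, bit125):
    """Return the IREADINGLAYOUT equivalent for the three bits, or None for n/a."""
    return LAYOUTS[(bit123, bit124, bit125)][1]


def is_right_to_left(bit123, bit124, bit125):
    """Assumption: only reading layout 1 counts as right to left."""
    return reading_layout(bit123, bit124, bit125) == 1
```

With one table as the only source, the reading layout value and the right-to-left answer cannot disagree with each other.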
A part of me wishes both
were entirely derived from these three bits, since doing that directly only satisfies my inner database developer that hates storing repetitive data in multiple places.
Of course in a mature society, there is room for disagreement, but in this case I'm reasonably certain that those who disagree with me here are wrong. :-)
Since had we done it sooner, it might have prevented us from shipping managed code bugs like the one described in "It's not right when IsRightToLeft is wrong", and native code bugs like the one described in "Double Secret ANSI, part 2 (the brokenest one yet, sorry 'bout that!)".
Because the best way to make sure the data is correct is to use the data.
Early and often, as both those bugs that made it to shipping products prove quite effectively (to our detriment at the time).
To be perfectly honest, I wish we would make this change even now, because we will always consider any difference between these three items a bug, and the best way to make sure that they don't fall out of sync is to use one source for all of them.
We could in theory make this change later this week to the data behind the properties.
Now, I am an owner of the data, but this would also mean changing code (in multiple products across multiple divisions). I can appeal to the owners to fix the long-term sync problem, though.
Before that, we can even fix the problem I mentioned the other day in "How to detect if a locale is Bidi, Windows 7/8 edition", where we stop returning results that are incorrect 99% of the time (claiming verticality for CJK and Mongolian), by default...
Technically, I could have used that idea and made this another part of the "The evolving Story of Locale Support" series, but I'm not confident that everyone will agree, so who knows whether we'll evolve that way, yet!
John Cowan on 16 Jan 2012 10:59 AM:
It's interesting that people don't seem to have adopted bottom-to-top writing anywhere. Even Ogham is only a marginal case: it's true that inscriptions are often written up the edge of a stone, but longer ones go up the stone, then left-to-right, then down another edge, much the way Latin script is written on an archway. Manuscript (or printed) Ogham is always left to right, top to bottom.
Van on 16 Jan 2012 5:58 PM:
John, if I remember correctly, Uighur Arabic is sometimes (more often historically) written vertically bottom-to-top, like a standard Arabic text turned 90 degrees clockwise. It was essentially a way of setting it in Chinese style columns while keeping the Arabic-form words looking right. That's the only example I can think of other than the marginal Ogham.
John Cowan on 23 Jan 2012 7:57 AM:
Van: Yes, I've seen examples of that. There is also the fact that titles on the spines of German and French (but not English) books run bottom to top. But these are all marginal cases, rotations of the basic direction to fit vertical columns of space. The basic four directions are:
Left to right text, top to bottom line progression (e.g. Latin)
Right to left text, top to bottom line progression (e.g. Arabic)
Top to bottom text, right to left line progression (e.g. vertical CJK)
Top to bottom text, left to right line progression (e.g. Mongolian)
Michael S. Kaplan on 23 Jan 2012 8:22 AM:
The last two simply aren't the default for computers on any platform, unfortunately. :-(