The PUA isn't complex enough

by Michael S. Kaplan, published on 2006/07/10 10:52 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/07/10/661427.aspx


Bob Richmond asks in the Suggestion Box:

UniScribe, Opentype, Unicode versions, and the PUA

As I understand it, Uniscribe divides text into runs and applies OpenType features on a script specific basis. If the script is unknown (for instance when characters belong to a new or upcoming version of Unicode not yet coded in Uniscribe, or belongs to a PUA), the features may not be applied. For example, if a font defines a ligature on E000 and E001 in BMP/PUA, this is ignored in Word 2003 etc. etc. By contrast, Adobe CS2 applies the ligature anyway, presumably on the principle that this is the intended behaviour of the font.

This naturally makes me ask the questions: Why doesn't UniScribe just apply the features on an unknown string? How can the user of an application such as Word make sense of this behavior? If I want to write Coptic (not in the Word language list yet), what behaviour should I expect? Are Vista/Office 2007 any different in these areas?

As far as I can tell, not a subject documented very well online so an interesting topic to address.

For me this is not just a curiosity. I'm working on implementation issues for Ancient Egyptian (cf. UTC Nov 2005) so have 1000+ characters whose OpenType features are ignored by most non-Adobe apps in Windows!

Bob is making some very good points here. I mean, it is clear if you look at this post and this one that there is a mild anti-PUA bias, even ignoring the issues that Bob raises.

But in the case of Uniscribe the issue is not quite as intentional as in the NLS case, and I am perhaps naive to believe that intent should actually be given some weight in deciding how bad someone or something is. :-)

I'll start by quoting a bit from Robert A. Heinlein's Stranger in a Strange Land, to act as something of a hint as to the issue here:

IN THE VOLANT LAND OF LAPUTA, according to the journal of Lemuel Gulliver recounting his Travels into Several Remote Nations of the World, no person of importance ever listened or spoke without the help of a servant, known as a "climenole" in Laputian – or "flapper" in rough English translation, as such a Servant's only duty was to flap the mouth and ears of his master with a dried bladder whenever, in the opinion of the servant, it was desirable for his master to speak or listen.
   Without the consent of his flapper it was impossible to gain the attention of any Laputian of the master class.

For some more hints on what is going on, you can look at Did he say shaping? It's not in the script!.

The central issue in that post is the fact that in some code paths the Latin script is being treated as complex, and in some others it is not.

The central issue in the case of the Private Use Area, on the other hand, is that it is not being treated as complex in any code path. Uniscribe's "flapper" is not considering it text that Uniscribe needs to see.
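
To make the ranges concrete: the Private Use Area in Bob's example is the BMP range U+E000 through U+F8FF, and Unicode also sets aside planes 15 and 16 for private use. All of these code points carry the general category Co. Here is a minimal Python sketch for spotting them; the function name is made up for illustration and is not anything Uniscribe or Windows exposes.

```python
import unicodedata

def is_private_use(ch):
    """Return True if ch is a private-use character (general category Co)."""
    return unicodedata.category(ch) == "Co"

# The two code points from Bob's ligature example, plus an ordinary letter.
for ch in ("\uE000", "\uE001", "A"):
    print(f"U+{ord(ch):04X}", is_private_use(ch))
```

A font is free to assign any meaning it likes to these code points, which is exactly why a script-driven engine has nothing to go on when it sees them.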

In the end, the result of text being considered a "complex script" on Windows is that Uniscribe will be used to render it, even when Uniscribe is not called directly.

Which means that if you have a program that calls Uniscribe directly, you can get what you want here (in fact, if you use the new OpenType-feature friendly functions, you can get the various optional OpenType features that Bob was thinking about supported!).

But as it is currently conceived, lots of "non-complex" text is not going to be going through Uniscribe....

This is a complex issue to address due to the complex nature of the relationship between GDI and Uniscribe:

1) LPK.DLL and similar Uniscribe 'flappers' (method described in part here) decide whether to call Uniscribe when someone calls GDI (thus sometimes GDI calls Uniscribe).

2) Uniscribe itself farms out the actual rendering job of the text it has processed back to GDI (thus sometimes Uniscribe calls GDI).
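
The gatekeeping in point 1 can be pictured with a toy model. This Python sketch is only an illustration: the actual criteria used by LPK.DLL are not described in this post, so the list of "complex" ranges (real Unicode block ranges, chosen arbitrarily) and the function names are assumptions.

```python
# Hypothetical stand-in for the flapper's notion of "complex" text.
# These are real Unicode block ranges, but the choice of blocks is an
# assumption made for this sketch, not the list Windows actually uses.
COMPLEX_RANGES = [
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0900, 0x097F),  # Devanagari
    (0x0E00, 0x0E7F),  # Thai
]

def needs_shaping_engine(text):
    """Flapper stand-in: True if any character falls in a 'complex' range."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in COMPLEX_RANGES)

def route(text):
    """Name the path a GDI-style text call would take in this toy model."""
    return "shaping engine" if needs_shaping_engine(text) else "simple path"

print(route("\u0645\u0631\u062D\u0628\u0627"))  # Arabic text
print(route("\uE000\uE001"))                    # the PUA pair from Bob's example
```

In this model the PUA pair never reaches the shaping engine, so a ligature defined on it in the font would never be looked up, which matches the behavior Bob describes.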

But the flapper is really there for a reason -- because even if a decision were made to treat all rendering as complex, there would be some pretty global performance and backcompat issues.

Which is not to say the problem is unsolvable; I'm just saying it is not trivial. And the decision to make the Uniscribe flappers a bit more open is one that is often debated, rather heavily.

In Vista, many more scripts are considered "complex" so there is effort to improve this situation. And it can only keep getting better over time....

 

This post brought to you by (U+e234, a.k.a. a Private Use Character)

