Short-sighted text processing #2: Getting hurt while playing on the bleeding edge

by Michael S. Kaplan, published on 2010/12/20 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/12/20/10107008.aspx


Previous blogs in this series:

Now I don't want to just totally knock Uniscribe.

Or any of the successor text stacks that were largely designed and implemented by the same people (when I say "Uniscribe" you can assume I'm talking about all of them unless I say otherwise).

Some very powerful things happen in regard to text rendering of languages and scripts throughout the world.

But there is a particular flaw of Uniscribe that I am going to talk about here today.

As I pointed out in part 1, you have to promise not to tell anyone you heard it from me.

Even if you already knew about it.

The problem is that it never goes the whole distance when it adds support for a script.

Well, that isn't really the problem.

The problem is that for anything even remotely complex, the applicable shaping engines of Uniscribe tend to build a little fence around the distance they do go, and although they did not go so far as to put razor blades on the edges, the people who need support "beyond the fence" would readily agree with the metaphor as they probably have the cuts to prove it.

Take the Thai issue I talked about in part 1 and remove the lame SCRIPT_PROPERTIES->fRejectInvalid stuff, since it isn't really there for the most part anyway.

I admit that Word's "sequence checking" as I blogged about in You're not the one out of sequence, and that's the Word is related to that, but since you can turn that feature off I am prepared to forgive them for their sins in this area.

The other part (the SCRIPT_LOGATTR->fInvalid part) is still there, and it leads to a lot of those dotted circles.

When Martin Hosken was talking about Sequence Checking in Thai & Lao a few years back, he was pointing out that there may be things that are illegal in Thai and Lao (the modern languages in use) yet perfectly acceptable in every other possible use of the Thai and Lao scripts, including a not even close to insignificant number of minority languages within Thailand and Laos and Southeast Asia.

These uses will be rife with small instances of dotted circles.
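To make the mechanism concrete, here is a minimal sketch of how a prescriptive sequence check ends up sprinkling dotted circles (U+25CC) into text. This is a toy model and not Uniscribe's actual logic: the function name `shape`, the `strict` flag, and the ordering rule ("one vowel mark, then one tone mark") are all assumptions made up for illustration, and the sample strings are not claims about any particular orthography.

```python
DOTTED_CIRCLE = "\u25CC"

def is_thai_consonant(ch):
    return "\u0E01" <= ch <= "\u0E2E"

def is_vowel_mark(ch):
    # Above/below marks: MAI HAN-AKAT and SARA I .. PHINTHU
    return ch == "\u0E31" or "\u0E34" <= ch <= "\u0E3A"

def is_tone_mark(ch):
    # MAI EK .. MAI CHATTAWA
    return "\u0E48" <= ch <= "\u0E4B"

def shape(text, strict=True):
    """Return text with a dotted circle inserted before each 'invalid' mark.

    strict=True  : hypothetical prescriptive rule, a vowel mark must directly
                   follow a consonant, and a tone mark must follow a consonant
                   or a vowel mark.
    strict=False : hypothetical permissive rule, a mark only needs something
                   to sit on (a consonant or an earlier mark).
    """
    out = []
    state = "none"  # kind of the previous character: none, base, vowel, tone
    for ch in text:
        if is_vowel_mark(ch):
            ok = (state == "base") if strict else (state != "none")
            new_state = "vowel"
        elif is_tone_mark(ch):
            ok = (state in ("base", "vowel")) if strict else (state != "none")
            new_state = "tone"
        else:
            # Not a mark: pass it through and remember whether it can be a base.
            out.append(ch)
            state = "base" if is_thai_consonant(ch) else "none"
            continue
        if not ok:
            out.append(DOTTED_CIRCLE)  # the fence going up
        out.append(ch)
        state = new_state
    return "".join(out)

expected_order = "\u0E01\u0E35\u0E48"  # consonant, vowel mark, tone mark
other_order = "\u0E01\u0E48\u0E35"     # consonant, tone mark, vowel mark

print(shape(expected_order) == expected_order)            # strict rule is happy
print(DOTTED_CIRCLE in shape(other_order))                # strict rule objects
print(shape(other_order, strict=False) == other_order)    # permissive rule does not
```

The point of the `strict` flag is that the two modes differ only in how much the engine presumes to know: the permissive one still catches a mark with nothing at all to attach to, but otherwise leaves the sequence for the font to deal with.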

These dotted circles may be true by the limited vision that Uniscribe brings to the task of determining what is or isn't illegal. And it is generally more impressive than how your UK English essay will look after you throw the US English spellchecker at it. But that is mainly because the spellchecker marks in red while Uniscribe does not.

One can essentially count on a terrible experience for many minority languages due to these overcorrecting "script mavens" that are the Uniscribe shaping engines. We can practically call them language specific given their tendency to treat as wrong text that has the nerve to fall outside of the small set of languages they are aimed at and whose principal benefit is that they have the most speakers and/or the most paying customers and/or the most government support.

And the South Asia connection is just due to the huge number of languages that Uniscribe does support now (within which sit the even huger number of minority languages that they do not support).

This applies to pretty much the entire world -- and it can even lead to people like Andrew West, who in some cases would be huge fans of all that Uniscribe can do, to write blogs like Prototyping Tangut IMEs, or Why Windows 7 Sucks. Because the situation is common: before support is there things work fine, but then once some support is added a whole bunch of stuff stops working.

At that point, the fences are up. And minority languages and other cases not explicitly handled are stuck on the wrong side of them.

Now what is really needed is for the Uniscribe shaping engines to be a little bit less prescriptive, a little more willing to grant that the people authoring fonts may know exactly what the hell they are doing, especially in these edge and not-so-edge cases where these shaping engines don't have a clue. And couldn't find one even if Steve and that dog Blue were there to dig the clues up....


Andrew West on 22 Dec 2010 5:12 AM:

"Now what is really needed is for the Uniscribe shaping engines to be a little bit less prescriptive, a little more willing to grant that the people authoring fonts may know exactly what the hell they are doing, especially in these edge and not-so-edge cases where these shaping engines don't have a clue."

That's what I've been arguing for years.  It's great to read statements like that from you, but it seems to me that Uniscribe is getting more and more prescriptive as it expands the list of languages it supports, so I hope people from the Uniscribe team will read and reflect on this post.

Michael S. Kaplan on 22 Dec 2010 9:54 AM:

I don't know that it is specifically getting more prescriptive, but there are several forces at work. I'll talk about this in a future part of this series. Stay tuned! :-)


referenced by

2011/01/06 Short-sighted text processing #6: OpenType and Apple and OpenType

2011/01/05 Short-sighted text processing #5: PU[A]! That pad THAI is pretty spicy....

2011/01/04 Short-sighted text processing #4: Squeezing every bit of text you possibly can out of MacOffice 2011

2010/12/30 Short-sighted text processing #3: The Protocols of the EDIT for i18n
