When the lines get too complex, the characters may go off script

by Michael S. Kaplan, published on 2008/07/26 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/07/26/8773862.aspx

Regular readers might remember my blog entitled Did he say shaping? It's not in the script!, where I showed some of the consequences of the Latin script being in that strange place of sometimes being considered complex and other times not, particularly when a font such as Segoe Script is used.

You know, a font where the complex shaping can be rather all-encompassing and can affect every letter.

It might be fun to put together a tool to assess the CPC (complexity per character) of a font, and use it to compare various fonts. Just a random thought....

Anyway, the other day I had someone send me mail via the Contact link about the Segoe Script font:

I was wondering if you had ever noticed the OpenType contextual scripting error in Segoe Script?

If you type in [lowercase] č as an initial or medial character, it shows up as ć, which causes issues for words that start with č. And not to be out done, ç is rendered as č. I was wondering about your opinion on such a bug, that can cause an interesting headache for a good portion of Europeans, and N&S Americans?

Interesting, very interesting.

(I say someone since he or she did not leave a name!)

Now of course I never throw away sample applications, so running the string through the Edit Control Sample from Did he say shaping? It's not in the script! shows that this nameless person was quite right.

We'll take the following string:

č ç čç çč

basically various combinations of U+010d (LATIN SMALL LETTER C WITH CARON) and U+00e7 (LATIN SMALL LETTER C WITH CEDILLA) to see when the appearances change as suggested:

č ---> ć (U+0107, a.k.a. LATIN SMALL LETTER C WITH ACUTE)

ç ---> č (U+010d, a.k.a. LATIN SMALL LETTER C WITH CARON)

and run it through our application that shows both the complex and non-complex views of the situation:

Yep, I'd say that the reported bug indeed can be seen in all of those cases that Uniscribe does its "complexification" thing....
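
If you want to double-check exactly which code points are in play here, a minimal Python sketch along these lines will do it. To be clear, it does no rendering and knows nothing about Uniscribe or the font; the REPORTED table is just my transcription of the substitutions described in the mail, and the names describe and report are made up for this illustration.

```python
import unicodedata

# The test string from this post: U+010D and U+00E7, alone and in both orders.
TEST_STRING = "\u010d \u00e7 \u010d\u00e7 \u00e7\u010d"

# Assumption: the substitutions as reported in the mail
# (code point typed -> code point that appeared to be displayed).
REPORTED = {0x010D: 0x0107, 0x00E7: 0x010D}


def describe(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def report(text):
    """List each non-space character with the reported look-alike, if any."""
    lines = []
    for ch in text:
        if ch.isspace():
            continue
        line = describe(ch)
        shown = REPORTED.get(ord(ch))
        if shown is not None:
            line += " ---> " + describe(chr(shown))
        lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(report(TEST_STRING)))
```

Running it prints one line per letter in the test string, which makes it easy to confirm that what was typed really is the caron and the cedilla and not something that merely looks like them.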

I'll forward this on to the folks who manage this font so they can do whatever they need to do.

This can indeed be difficult for situations when you need those characters in your language, though of course one only needs to take the lesson from the average person's handwriting to realize that if the worst problem you have is minor things like this, you probably don't have too much to worry about. So this is kind of a big deal that isn't too big of a deal.

I certainly would not start planning on how to try to force the "non-complex" path just to be sure you get the right characters or anything like that. :-)


If you don't know what four Unicode characters would sponsor this blog, you might have been paying too much attention!
