And how exactly do you justify those frigging kashidas?

by Michael S. Kaplan, published on 2010/08/31 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/08/31/10056140.aspx


Over in the Suggestion Box, DJPN asked:

Hi,

could you describe the correct way, or maybe some ways, of doing justification of text containing Arabic characters.  For instance, if I use ScriptJustify (using my home grown uniscribe-using library) and pass iMinKashida>0 then it occasionally goes wrong when there are characters that have combining diacritic marks.  I note that MS Word doesn't justify using kashidas.

Now I have discussed kashidas before, in blogs like On character justification (in *both* senses) and You've got to be kashidding me. Generally speaking, unless you are doing all the work yourself, the ScriptJustify function is the way to "Kashidize" your text.

However, the iMinKashida parameter, documented only as:

Minimum width of a kashida glyph to generate.

is really just part of the story.

In fact if you look at the remarks, it notes that:

This function provides a simple implementation of multilingual justification. It establishes the amount of adjustment to make at each glyph position on the line. It interprets the SCRIPT_VISATTR array generated by a call to ScriptShape, giving top priority to kashida. The function uses interword spacing if no kashida points are available. It uses intercharacter spacing if no interword points are available.

Note: Sophisticated text formatters might generate their own delta dx array by combining formatter-specific features with the information retrieved by ScriptShape in the SCRIPT_VISATTR array.

The application should pass the justified advance widths generated by ScriptJustify to ScriptTextOut in the piJustify parameter.

ScriptJustify creates a justified array containing updated advance widths for each glyph. When an advance width for a glyph is increased, the extra width is rendered to the right of the glyph, with a white space or, for Arabic text, a kashida.

Note: Kashida insertion occurs to the right of the glyph to justify visually. Microsoft Word and Microsoft PowerPoint use this concept. Any change in the kashida placement algorithm should accompany a change in the corresponding ScriptTextOut handler for a particular script, for example, the Arabic TextOut justification handler.
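
To make the priority order in those remarks a bit more concrete, here is a minimal sketch in plain C. It is not the actual ScriptJustify algorithm, whose internals are not documented. The function justify_sketch, the CLASS_* values, and the way min_kashida limits how many kashida points get used are all assumptions made up for illustration. It only mirrors the stated order: kashida points first, then interword points, then intercharacter points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical justification classes for this sketch (these are NOT the
   real SCRIPT_JUSTIFY values from usp10.h). */
enum { CLASS_NONE, CLASS_CHARACTER, CLASS_BLANK, CLASS_KASHIDA };

/* Count the glyphs whose class equals `want`. */
static int count_class(const unsigned char *cls, int n, int want)
{
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (cls[i] == want)
            ++count;
    return count;
}

/* Add `extra` units to the first `use` glyphs of class `want`, as evenly
   as integer division allows; leftover units go to the earliest ones. */
static void spread(const unsigned char *cls, int n, int want,
                   int use, int extra, int *widths)
{
    int share = extra / use, rem = extra % use, seen = 0;
    for (int i = 0; i < n && seen < use; ++i) {
        if (cls[i] != want)
            continue;
        widths[i] += share + (seen < rem ? 1 : 0);
        ++seen;
    }
}

/* Copies `advance` into `justified`, then hands out `extra` units to the
   highest-priority class that has any points. Returns the class used.
   Assumption: a kashida point is only used if it can receive at least
   `min_kashida` units. */
int justify_sketch(const unsigned char *cls, const int *advance, int n,
                   int extra, int min_kashida, int *justified)
{
    static const int fallback[] = { CLASS_BLANK, CLASS_CHARACTER };

    assert(n >= 0 && cls != NULL && advance != NULL && justified != NULL);
    for (int i = 0; i < n; ++i)
        justified[i] = advance[i];
    if (extra <= 0)
        return CLASS_NONE;

    /* Top priority: kashida points, capped by the minimum kashida width. */
    int kashidas = count_class(cls, n, CLASS_KASHIDA);
    if (min_kashida > 0 && kashidas > extra / min_kashida)
        kashidas = extra / min_kashida;
    if (kashidas > 0) {
        spread(cls, n, CLASS_KASHIDA, kashidas, extra, justified);
        return CLASS_KASHIDA;
    }

    /* Otherwise interword points, then intercharacter points. */
    for (size_t f = 0; f < sizeof fallback / sizeof fallback[0]; ++f) {
        int points = count_class(cls, n, fallback[f]);
        if (points > 0) {
            spread(cls, n, fallback[f], points, extra, justified);
            return fallback[f];
        }
    }
    return CLASS_NONE;
}
```

In real code the classes would come from the SCRIPT_VISATTR array that ScriptShape fills in, and the resulting widths would be what you hand to ScriptTextOut in piJustify.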

If you modify the SCRIPT_VISATTR array that you are going to pass in here, you can quite easily control whether you want and/or expect to see kashidas placed at a given glyph, by deciding whether psva->uJustification is set to SCRIPT_JUSTIFY_ARABIC_KASHIDA or to one of the other interesting values in the SCRIPT_JUSTIFY enumeration. For the particular case of diacritics, psva->fDiacritic tells you whether a glyph is a diacritic. What you do here can get pretty intricate!
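
As one possible customization (a sketch of a policy, not a recommendation and certainly not what Word does), you could walk the array before calling ScriptJustify and take away the kashida opportunity from any glyph that is a diacritic or sits right next to one. The VisAttr struct and the two constants below are hypothetical stand-ins that only mirror the fields discussed here, so that the snippet compiles anywhere. On Windows you would use the real SCRIPT_VISATTR and SCRIPT_JUSTIFY definitions from usp10.h instead, and the helper name drop_kashida_next_to_diacritics is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SCRIPT_VISATTR fields used in this sketch
   (hypothetical mirror type; real code includes <usp10.h>). */
typedef struct {
    unsigned uJustification : 4;  /* justification class of the glyph */
    unsigned fClusterStart  : 1;  /* first glyph of its cluster */
    unsigned fDiacritic     : 1;  /* glyph is a diacritic */
    unsigned fZeroWidth     : 1;  /* blank, zero-width glyph */
} VisAttr;

/* Stand-ins for SCRIPT_JUSTIFY_NONE and SCRIPT_JUSTIFY_ARABIC_KASHIDA. */
enum { JUSTIFY_NONE = 0, JUSTIFY_ARABIC_KASHIDA = 8 };

/* Clears the kashida class on every glyph that is a diacritic or has a
   diacritic as an immediate neighbor. Both neighbors are checked because
   the glyph array may be in visual (right-to-left) order.
   Returns the number of kashida points removed. */
int drop_kashida_next_to_diacritics(VisAttr *sva, int cGlyphs)
{
    int removed = 0;

    assert(cGlyphs >= 0 && (sva != NULL || cGlyphs == 0));
    for (int i = 0; i < cGlyphs; ++i) {
        if (sva[i].uJustification != JUSTIFY_ARABIC_KASHIDA)
            continue;
        int near_diacritic =
            sva[i].fDiacritic ||
            (i > 0 && sva[i - 1].fDiacritic) ||
            (i + 1 < cGlyphs && sva[i + 1].fDiacritic);
        if (near_diacritic) {
            sva[i].uJustification = JUSTIFY_NONE;
            ++removed;
        }
    }
    return removed;
}
```

If every kashida point on a line gets removed this way, the documented fallback to interword and then intercharacter spacing is what would kick in, so good sample text with heavily vowelled words is worth having on hand.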

Now I admit this isn't very well documented. About all I could find online is docs like Uniscribe: The Missing Documentation & Examples, a decent brain dump of some of the work in Google Chrome to support Uniscribe. It kind of ignores this kind of issue in mixed text, though, and its specific suggestions indicate that not very much customization is happening there.

Doing this one yourself can be complicated. Though if you want to try it?

Going in with clear goals and good sample text to try different customizations out with could allow you to do some fairly impressive work that the default results might not do very well with....

Now once again this would be a great place for a Uniscribe sample. Though this one can be quite complicated to accomplish; if anyone makes the attempt and wants to discuss what they are finding, let me know and I'll write it up in the future.


John Cowan on 31 Aug 2010 7:16 AM:

Nothing like making a typo in the title.

Michael S. Kaplan on 31 Aug 2010 8:55 AM:

There is having the nerve to fix it after several thousand views. :-)

What kind of stupid spell checker doesn't check the title?!?

Brett Wilson on 31 Aug 2010 7:22 PM:

Thanks for noticing my Chrome Uniscribe page! I enjoy reading your blog.

When implementing Uniscribe support in Chrome, I found it to be somewhat of an odd match. It seems designed to support rather complicated applications using only Uniscribe for text layout. But applications with very advanced text layout needs are typically designed to handle a lot of these cases themselves. And any promise it had to support really interesting and complex cases is mostly killed by the general lack of documentation & examples (as everybody notes).

Your comment about not being super advanced in terms of combining text is correct here. WebKit is handling a lot of this at a higher level. Justified text, for example, is done mostly at a higher level and in the "complex script" layer the job is mostly to just use up the space allocated to you. There are likely to be some sub-optimal cases of justification, but it seems to work quite well in general. I think the fanciest Uniscribe we did was adjusting the letter-spacing as controlled by the CSS letter-spacing attribute. At least when I wrote it, we were the only browser that did this correctly for things like Hebrew with vowel points. Of course, nobody does this because it looks stupid, so it ends up being a hollow victory.

I wonder if there are *any* applications that really use this stuff to its full potential? I (and I suspect many other programmers) would prefer something that makes it really easy to just get all the information out of the system, and maybe did a bit less mysterious stuff under the covers "for us." Even though most of the information is available, it seems designed for "living in Uniscribe" rather than dropping into an advanced text layout engine like a browser. And something advanced like a browser seems like it would be the only use case for some of the more advanced stuff.

Extreme example of this: Safari on Windows resorts to rendering Uniscribe to a WMF, then scanning the text version of that to see which fallback fonts were used for rendering certain text, because Uniscribe is helpfully "handling it for you." Chrome's fallback code is somewhat different and doesn't do this.

PS: One thing I'm still confused about is what you're supposed to do with pABC from ScriptPlace (I talk about this on my page). Any pointers?

