Customizing shaping behavior in Uniscribe

by Michael S. Kaplan, published on 2005/12/19 05:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/19/505309.aspx


The Uniscribe API set is very powerful, much more powerful than most people need in terms of all the things you can possibly do.

To give an example, the other day James Brown asked the following in the microsoft.public.win32.programmer.international newsgroup:

Hi,

I have the following two Arabic Unicode characters in my "paragraph":

U+0647
U+064a

Drawn together as one "run" with the Uniscribe API (i.e. ScriptShape/ScriptTextOut etc.) they appear fine (i.e. the same as Notepad/Wordpad shows them).

However I wish to draw the first character in a different colour to the second. I am building my own "style run" list and merging this with the output from ScriptItemize to create two runs which I then pass to ScriptShape/Place/TextOut.

However the two characters are drawn "separately" and don't join together cursively like when they are drawn as "one". I know this must be possible because notepad/wordpad has no problems drawing "joined together" characters side-by-side in different colours....so how do I convince Uniscribe that my two "split runs" really used to be a single run, and to generate the glyphs as if they were still joined together??

James

A very good question. Now of course by default ه (U+0647, a.k.a. ARABIC LETTER HEH) and ي (U+064a, a.k.a ARABIC LETTER YEH) would combine and look like the following (YMMV depending on the browser you use):

هي

But if you try to display the two parts with two different colors then they may or may not shape the same way:

هي

(Note that in IE they seem to shape exactly the same way as I see them in Notepad and many other places. Perhaps James is using different fonts or different product versions?)

Let's take another string, like the word "Arabic" translated into Arabic, and give it a try:

العربية    -->    العربية

Hmmm.... that works too. I wonder what the HTML looks like? Hmmm:

<P><FONT color=#ffa500>ا</FONT><FONT color=#ff1493>ل</FONT><FONT color=#800080>ع</FONT><FONT color=#0000ff>ر</FONT><FONT color=#800080>ب</FONT><FONT color=#008000>ي</FONT><FONT color=#a52a2a>ة</FONT></p>

Ok, it looks like in a lot of these cases Uniscribe is smart enough to shape independently of those various differences. Makes me wonder what changing the font name would do:

العربية

Ok, that breaks things down in IE. I knew I could break the string if I kept trying! Now here is the HTML:

<P><FONT color=#ffa500>ا</FONT><FONT face="Times New Roman" color=#ff1493>ل</FONT><FONT color=#800080>ع</FONT><FONT face="Times New Roman" color=#0000ff>ر</FONT><FONT color=#800080>ب</FONT><FONT face="Times New Roman" color=#008000>ي</FONT><FONT color=#a52a2a>ة</FONT></P>

But on their own, both fonts can shape with no problems:

العربية   -->   العربية

Now it was a mere 15 minutes later that James posted a possible answer for the problem he was seeing:

ok I have discovered the fLinkAfter and fLinkBefore members of the SCRIPT_ANALYSIS structure. By modifying these two members to create a "link" between the two runs I can achieve the result that I believe is correct.

So my question is, is this the correct method? Whenever I split a SCRIPT_ITEM in half (due to colour/style attribute merging), can I simply modify the fLinkAfter/fLinkBefore to "rejoin" the two halves of the split SCRIPT_ITEM, and safely do this for all scripts I might encounter??

thanks,
James

Indeed, looking at these two members of the SCRIPT_ANALYSIS structure, they do look like the documented and supported way to make this work when it is not working by default. Now it is not completely clear if it will work every time; in fact I am sure there are some style changes that will either break the shaping behavior no matter how the flags are set, or leave the characters not completely lined up (size changes might cause the latter kind of problem).

These particular settings are not available in GDI+ or in the TextRenderer class (the Uniscribe wrapper in .NET 2.0 I have discussed previously); for the time being this kind of capability is only in Uniscribe, but it is definitely there when it is possible....

 

This post brought to you by "ي" (U+064a, a.k.a ARABIC LETTER YEH)


# Surge on 19 Dec 2005 5:33 AM:

the coloured characters don't join in firefox 1.5 :)

# Michael S. Kaplan on 19 Dec 2005 6:02 AM:

Somehow you knew it would end up that way. :-(

# Michael S. Kaplan on 19 Dec 2005 7:37 AM:

Everything looks as I described on IE6 on XP SP2 and on IE7 on a Vista build from late last week.

I don't think there is any doubt that the IE behavior is better here (and technically it is making better use of the standard since bidi rendering is supposed to be independent of color-type changes, and not affected negatively by such things!)....

# silverpie on 19 Dec 2005 10:51 AM:

In theory, adding ZWJ on each side of the color change should also force the characters to link.
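
A minimal sketch of that suggestion, assuming the text is split at a character index before being handed to the renderer as two coloured pieces. The helper name `split_with_zwj` is hypothetical, and whether a given renderer honours the joiners is not something this thread confirms. It also only makes sense where the two letters would have joined anyway.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def split_with_zwj(text, index):
    """Split text at index and put a ZWJ on each side of the split.

    Hypothetical helper: the trailing ZWJ asks the left piece to shape
    as if something joins after it, and the leading ZWJ asks the right
    piece to shape as if something joins before it.
    """
    return text[:index] + ZWJ, ZWJ + text[index:]


# HEH + YEH, split between the two letters for separate colouring.
first, second = split_with_zwj("\u0647\u064a", 1)
```
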

# Mihai on 19 Dec 2005 12:47 PM:

Although it might be easy to blame Firefox on this one, I would not.

This feels a bit like one of those "more powerful than most people need" features, so why bother documenting it properly?
See also "The lack of Uniscribe samples" (http://blogs.msdn.com/michkap/archive/2005/12/06/500485.aspx)

The "Supporting Multilanguage Text Layout and Complex Scripts with Windows 2000" sample has the same problem as Firefox.

I did not check the code in "I18N with VB", but I really think we need some decent samples+documents on how to use Uniscribe.
It looks like people need a bit more than what is available.

# Maurits on 19 Dec 2005 1:01 PM:

I suppose this is where separation of content and style breaks down...

Suppose instead of <font> tags they were <span> tags:

<span class="first">ل</span><span class="second">ع</span>

And let's suppose there was a style sheet:
span.first { color: red; }
span.second { color: blue; }

Fine, they should link.

But what if you changed the style sheet to:
span.first { font-size: 14pt; }
span.second { font-size: 30pt; }

That would cause the links to not line up.

Or what if you did

span.first { display: none; }
span.second { display: block; }

This would cause the first letter to disappear and the second to have a line break before it...

In principle the decision to break (or not to break) could be made after the styles have been applied, I suppose.

# TheMatt on 19 Dec 2005 1:04 PM:

Huh...my Firefox 1.07 on Fedora shows the exact same for both. I wonder why it isn't joining the letters...

# Matthew W. Jackson on 19 Dec 2005 1:45 PM:

Firefox for Windows uses Uniscribe, but obviously this is platform-specific code. I don't know if anybody has written anything similar for Linux yet, but if such a thing exists Firefox is obviously not using it.

What's really funny is how Firefox on Windows behaves when you highlight one character of the run (it splits into the individual runs, whereas Internet Explorer keeps them together).

I imagine this sort of thing is very difficult to get right, so I'm not completely surprised by the behavior of Firefox.

I *am* disappointed that the Uniscribe wrapper in .NET does not provide more options, since I've had nothing but trouble when trying to manually format rich-text in pure managed code, and things still have not improved much in that regard. Displaying one big string is easy, but anything more than that is a pain.

# James Brown on 19 Dec 2005 4:53 PM:

Hi Michael,
thanks for your efforts on this - it's very much appreciated! I'm reposting here what I posted back on usenet as a reply to you:

The example you give here is exactly the application I had in mind (colourising spans of text). I was a little unclear in my original posting when I said:

"notepad/wordpad has no problems drawing "joined together" characters side-by-side in different colours."

What I meant by this was, when you select one of the characters I mentioned in a regular EDIT/RichEdit control, the characters do not change shape even though the one is being drawn in a different colour to represent the selection. This is the behaviour I am aiming for in my custom editor - I wasn't implying that notepad/IE was doing anything wrong, rather they were doing it right but I couldn't figure out how to emulate their behaviour based on the info in MSDN.

I have found a solution though:

The fLinkAfter/fLinkBefore was a red herring. They are not designed for the purpose I had in mind. Rather, I believe they should be used for simulating a "zero-width-joiner" which may not be present in the original text stream. In fact after some experimentation I discovered that using these attributes inappropriately can result in incorrect shaping behaviour. For example, there could be two Arabic characters side-by-side which do not normally cursively join, but when split using this method (and the fLinkAfter/Before attributes applied) they do join together, which is of course incorrect. If I could figure out how to put a small HTML sample here I would :-)

Now here is the basic set of steps for drawing a line of text with Uniscribe (no word-wrapping here for simplicity):

1. ScriptItemize
2. Merge with application-defined "style" runs to produce finer-grained items.
3. ScriptLayout

Then for each item/run:

4. ScriptShape
5. ScriptPlace
6. ScriptTextOut

The MSDN documentation is quite unclear about step #2. It implies that you should merge *any* application-styled runs of text with the items produced from ScriptItemize. What the docs really mean by "different styles" is in fact different *fonts*. Splitting/merging colour-information (i.e. for the purpose of selection highlighting/syntax colouring) at this stage is *wrong*.
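
As a rough, platform-independent illustration of step #2 (this is a model of the merging logic only, not a call into Uniscribe), the sketch below splits itemized ranges where the font changes and deliberately ignores colour changes. The function name `shaping_runs` and the tuple layouts are assumptions made for the example, and the style runs are assumed to be sorted, contiguous, and to cover the whole text.

```python
def shaping_runs(items, style_runs):
    """Split itemized ranges only where the font changes.

    items:      list of (start, end) character ranges, standing in for
                the ranges reported by ScriptItemize.
    style_runs: sorted, contiguous list of (start, end, font, colour).
    Returns a list of (start, end, font) runs to shape and place.
    """
    # A boundary matters for shaping only if the font differs across it.
    font_breaks = set()
    for prev, cur in zip(style_runs, style_runs[1:]):
        if prev[2] != cur[2]:
            font_breaks.add(cur[0])

    def font_at(pos):
        for start, end, font, _colour in style_runs:
            if start <= pos < end:
                return font
        raise ValueError(f"no style run covers position {pos}")

    runs = []
    for start, end in items:
        inner = sorted(b for b in font_breaks if start < b < end)
        cuts = [start] + inner + [end]
        for a, b in zip(cuts, cuts[1:]):
            runs.append((a, b, font_at(a)))
    return runs


# Seven letters, each in its own colour but all in one font: one run.
rainbow = [(i, i + 1, "Tahoma", c) for i, c in enumerate("abcdefg")]
one_run = shaping_runs([(0, 7)], rainbow)

# A font change in the middle does force a split.
two_fonts = [(0, 3, "Tahoma", "red"), (3, 7, "Times New Roman", "red")]
two_runs = shaping_runs([(0, 7)], two_fonts)
```
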

To apply colouring to the text, the individual items should be *further* split *prior* to calling ScriptTextOut. (i.e. at step #6). So there is an extra "merging/styling" step that is required which is not mentioned in the docs.

At this point we are working at the glyph level rather than with characters. Instead of passing an entire "run" (SCRIPT_ANALYSIS + glyph buffer) to ScriptTextOut, this run should be split into the clusters identified in the SCRIPT_VISATTR array - and each *glyph cluster* passed individually to ScriptTextOut. Some optimization could occur here so that glyph clusters could be kept together if they share the same colour.
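
The cluster splitting and the same-colour optimization can be sketched without touching the API at all. This is only a model, under these assumptions: `cluster_start` is a per-glyph list of flags standing in for the fClusterStart bit of each SCRIPT_VISATTR entry, `cluster_colours` holds one colour per cluster in glyph order, and the name `colour_spans` is made up for the example. Each resulting tuple would correspond to one ScriptTextOut call over that slice of the glyph buffer.

```python
def colour_spans(cluster_start, cluster_colours):
    """Group glyph clusters into (first_glyph, glyph_count, colour) spans.

    Adjacent clusters that share a colour are coalesced so they can be
    drawn with a single call.
    """
    starts = [i for i, flag in enumerate(cluster_start) if flag]
    if cluster_start and not cluster_start[0]:
        raise ValueError("first glyph must start a cluster")
    if len(starts) != len(cluster_colours):
        raise ValueError("need exactly one colour per cluster")

    # Cluster k covers glyphs bounds[k] .. bounds[k + 1] - 1.
    bounds = starts + [len(cluster_start)]
    spans = []
    for first, nxt, colour in zip(bounds, bounds[1:], cluster_colours):
        count = nxt - first
        if spans and spans[-1][2] == colour:
            prev_first, prev_count, _ = spans[-1]
            spans[-1] = (prev_first, prev_count + count, colour)
        else:
            spans.append((first, count, colour))
    return spans


# Four glyphs in three clusters: the first cluster has two glyphs.
spans = colour_spans([True, False, True, True], ["red", "red", "blue"])
```

Here the two red clusters collapse into one three-glyph span followed by a one-glyph blue span. In a real renderer the x position of each span would come from summing the advance widths of the glyphs drawn before it.
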

I have also noticed some interesting behaviour in the Script_xx APIs using WinDbg:

Notepad (or rather the EDIT control) uses the "basic" ScriptString API. This API is really neat because it lets you draw a line/paragraph of text with a range of that text optionally "selected" (i.e. drawn in the system highlight colours). ScriptStringOut achieves this *very* strangely. It draws the whole line of text normally (using the regular Script API). It then draws the exact same text a second time, in the same position, over the original text, but this time in a different colour, with a clipping-rectangle applied so that only the "selected" text gets drawn second time around. This allows a very fine level of selection (it is working at the pixel level), and is the reason that the selection-highlight + caret can move into the middle of grapheme clusters. It's a real bodge, and is also the reason why the ScriptString API flickers so horribly. (Double-buffering using memory-DCs solves this of course.)
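
To make the draw-twice idea concrete, here is a toy one-dimensional model of it. It is not what ScriptStringOut does internally in any literal sense: `ink` is an assumed list saying which pixel columns the shaped text covers, and `paint` and `draw_with_selection` are hypothetical names. The point is only that the shaping never changes between the two passes, and that the clip can land anywhere, including mid-glyph.

```python
def paint(canvas, ink, colour, clip=None):
    """Colour every inked column, optionally limited to clip=(left, right)."""
    left, right = clip if clip is not None else (0, len(canvas))
    for x in range(max(left, 0), min(right, len(canvas))):
        if ink[x]:
            canvas[x] = colour


def draw_with_selection(ink, sel_left, sel_right):
    """Paint the whole line, then repaint the same ink clipped to the selection."""
    canvas = ["."] * len(ink)
    paint(canvas, ink, "N")                         # pass 1: normal colour
    paint(canvas, ink, "S", (sel_left, sel_right))  # pass 2: clipped overlay
    return "".join(canvas)


# Two "glyphs" of ink separated by a gap; the selection covers columns 1-3.
row = draw_with_selection([True, True, False, True, True, True], 1, 4)
```

The result is `NS.SNN`: the selection colour starts partway through the first glyph and stops partway through the second, which mirrors the pixel-level selection described above.
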

Wordpad on the other hand (or the RichEdit it uses) does things differently. It uses the ScriptTextOut API directly along with ScriptItemize etc, and does things as I have just described - applying colour formatting in between calling ScriptPlace and ScriptTextOut, by breaking each run of text into the separate glyph-clusters and drawing them manually with multiple calls to ScriptTextOut. I have my Neatpad editor drawing text this way now and it works very well.

Note that the Internet Explorer control does the same thing. So does the VS.NET IDE editor.

James

# Dean Harding on 19 Dec 2005 5:37 PM:

Firefox doesn't use Uniscribe on Windows (there's a bugzilla bug that's been open since the dawn of time on it: https://bugzilla.mozilla.org/show_bug.cgi?id=218887)

You can get a patched version of Firefox which uses Uniscribe, but they say it's not going to be integrated with Mozilla, since the Mozilla people are planning to re-architect the rendering engine to use Cairo. In the meantime, the patched versions are here: http://blacksapphire.com/firefox-rtl/

I think it's probably better for the mozilla people to use Cairo/Pango anyway (rather than Uniscribe), since that would increase the user-base of people using those libraries which would in turn increase demand for support for other languages from those libraries, which can only be a good thing. I mean, Uniscribe is great and all, but there's nothing like a little healthy competition, eh? :)

# Nick Lamb on 19 Dec 2005 8:23 PM:

Right, the Gecko engine (Firefox, Mozilla etc.) currently does a lot of work that it is poorly equipped for, rather than leaving it to the experts who write OS text rendering code. This is an example where that breaks down visibly. Let's hope Dean Harding is right that Cairo integration improves this aspect of Gecko.

Παν語 (Pango) can get this right, although I couldn't think of any applications I had which would demonstrate this, so I had to spend a few minutes throwing something together and testing it. I used a basic GTK+ label widget, and set the markup inline, then tried some variations on that theme to check that various sensible approaches work. I tested on a rather battered old Fedora Core 4 machine. Neither Pango markup strings nor externally applied markup which change the color of one or more characters affect the shaper. You get a nice smooth Arabic word, in a rainbow of colors in each case.

[Disclaimer, I cannot actually read Arabic, but it looks right in so far as I'm qualified to judge]

OpenOffice.org gets this right on FC4 too, though I don't know if it uses Pango or whether the support is independent. But I was able to paste today's blog examples into OO.o 2.0 on the same FC4 system and see them rendered as expected rather than as in Mozilla/Firefox.

So, James Brown's desired effect is achieved easily in Pango/Linux. I cannot confirm whether this works with Pango on Win32, but it should, especially since Uniscribe can be selected as a backend for Pango/Win32.
