by Michael S. Kaplan, published on 2009/06/08 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2009/06/08/9707348.aspx
The other day, Wayne Shu asked:
Hello Michael:
I have visited your blog, and know that you are an expert in Windows Uniscribe, here I have some questions about Uniscribe to ask you.
now I am using Uniscribe to do a program to generate glyphs from a arbitrary unicode string.
I use ScriptItemize, ScriptShape, ScriptPlace functions, but it does not work correctly for some font,
for example the font "Arial Unicode MS", the unicode string is "ཀྲུང་ཧྭ་མི་དམངས་སྤྱི་མཐུན་རྒྱལ་ཁབ།" (Tibetan version of "People's Republic of China")
ScriptShape always return USP_E_SCRIPT_NOT_IN_FONT for this string,
but I have checked "Arial Unicode MS" using Character Map, "Arial Unicode MS" do support Tibetan characters.
why ScriptShape return USP_E_SCRIPT_NOT_IN_FONT here?
another question about function ExtTextOut.
for the same font "Arial Unicode MS", and string "ཀྲུང་ཧྭ་མི་དམངས་སྤྱི་མཐུན་རྒྱལ་ཁབ།", I have tried to use ExtTextOut to display it directly, ExtTextOut can display it correctly.
even for some characters that "Arial Unicode MS" does not support, for example sinhala characters, ExtTextOut still can display these character correctly,
why? as I know ExtTextOut has indirectly invoke Uniscribe for displaying texts. but they behave so differently, does ExtTextOut do some font back internally?
Thanks and forgive my poor english.
--
Best Regards.
Wayne Shu
Now when you stack his English up against my typos, I think he does pretty well, but that's just my opinion. :-)
Anyway, it would be easy to dismiss the whole question with a quick reference to a previous post of mine, such as Arial Unicode MS effectively [bites|sucks|blows].
But the case of Tibetan is by its nature more complicated: it takes the relatively minor issue of a few characters in Bengali and pushes it to an extreme.
It is best to think of what Uniscribe does as a careful dance between the data inside the font and its own knowledge of Unicode in general and of certain scripts in particular. This is not universally true, but it generally holds for anything requiring complex script processing (how strongly it holds varies by script, and is described in the documentation Microsoft provides for each script).
In the case of Tibetan, the need for a font with the correct supporting data is crucial.
To some this may seem unfair: this is not just different parts of Microsoft failing to talk to each other; this is the same part of Microsoft (the Typography team) talking to itself! But in truth it is not unfair, since (and this is a gross oversimplification that I might get into more specifically another day) a lot of the font "data" describes the look of specific combinations, which amounts to replacing two or more separate Unicode code points with a particular glyph that fits better with the surrounding text and often looks little like the original code points would on their own. How could Uniscribe alone contain such data without knowing what is in the font? In such cases the split between shaping engine and font makes a lot of sense.
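To make the "several code points, one shape" point concrete, here is a small Python sketch that groups Tibetan text into base-plus-marks clusters. It is only an illustration, not how Uniscribe works internally, and it assumes a simplified rule: any character in the vowel-sign range (U+0F71 through U+0F84) or the subjoined-consonant range (U+0F90 through U+0FBC) attaches to the character before it. Real Tibetan shaping has more cases than this. The first syllable of Wayne's string, ཀྲུང་, shows the idea: KA, subjoined RA and VOWEL SIGN U form one stacked unit, and what that stack looks like has to come from the font's own data.

```python
from __future__ import annotations


def is_tibetan_combining(ch: str) -> bool:
    """Simplified check (an assumption, not a complete list of Tibetan marks):
    vowel signs U+0F71..U+0F84 and subjoined consonants U+0F90..U+0FBC."""
    cp = ord(ch)
    return 0x0F71 <= cp <= 0x0F84 or 0x0F90 <= cp <= 0x0FBC


def tibetan_clusters(text: str) -> list[str]:
    """Attach each combining character to the preceding cluster; every other
    character starts a new cluster."""
    clusters: list[str] = []
    for ch in text:
        if clusters and is_tibetan_combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    # KA + subjoined RA + VOWEL SIGN U + NGA + TSHEG (the syllable "krung")
    syllable = "\u0F40\u0FB2\u0F74\u0F44\u0F0B"
    for cluster in tibetan_clusters(syllable):
        print(" ".join(f"U+{ord(c):04X}" for c in cluster))
```

Running it prints three clusters, the first of which (U+0F40 U+0FB2 U+0F74) is three code points that a suitable font renders as a single stack.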
And in this particular case (Arial Unicode MS and Tibetan) the answer is easy: the data does not exist in the font at all! It has the individual glyphs, but a lot of the rest of the data simply ain't there.
Thus if you pick a font like Arial Unicode MS to do such work, you are never going to get the best result....
As for the other question: when one calls ExtTextOutW, one gets some of the higher-level Uniscribe functionality, such as per-script font substitution (something the lower-level functions like ScriptShape will never do). As a bonus, the substituted fonts will generally be better choices than Arial Unicode MS, so one tends to see better results whenever a font that really supports the script is installed.
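The difference between the two code paths can be modeled in a few lines. The Python sketch below is a hypothetical model, not Windows code: `Font`, `ScriptNotInFont`, `shape_run`, `shape_with_fallback` and the fallback font names are all invented for illustration, and the per-font script sets are assumptions based only on what this post says about Arial Unicode MS. It separates "the font has glyphs for the script" (what Character Map shows) from "the font has the shaping data the script needs" (what the shaper requires). A direct low-level call fails on the second check, while the higher-level path swaps in a font chosen per script.

```python
from __future__ import annotations

from dataclasses import dataclass


class ScriptNotInFont(Exception):
    """Hypothetical stand-in for a USP_E_SCRIPT_NOT_IN_FONT-style failure."""


@dataclass(frozen=True)
class Font:
    name: str
    glyph_scripts: frozenset[str]    # scripts the font has glyphs for
    shaping_scripts: frozenset[str]  # scripts it also has shaping data for


def shape_run(font: Font, script: str) -> str:
    """Low-level model: no fallback, so a font without shaping data fails."""
    if script not in font.shaping_scripts:
        raise ScriptNotInFont(f"{font.name} lacks {script} shaping data")
    return font.name


def shape_with_fallback(font: Font, script: str, fallback: dict[str, Font]) -> str:
    """Higher-level model: on failure, retry with a font picked per script."""
    try:
        return shape_run(font, script)
    except ScriptNotInFont:
        substitute = fallback.get(script)
        if substitute is None:
            raise
        return shape_run(substitute, script)


# Assumed data: Tibetan glyphs present but no Tibetan shaping data; no Sinhala.
arial_unicode = Font("Arial Unicode MS",
                     glyph_scripts=frozenset({"Latin", "Tibetan"}),
                     shaping_scripts=frozenset({"Latin"}))

# Hypothetical per-script fallback table (font names are placeholders).
fallback = {
    "Tibetan": Font("HypotheticalTibetanFont", frozenset({"Tibetan"}), frozenset({"Tibetan"})),
    "Sinhala": Font("HypotheticalSinhalaFont", frozenset({"Sinhala"}), frozenset({"Sinhala"})),
}

if __name__ == "__main__":
    print("Tibetan" in arial_unicode.glyph_scripts)                   # True
    print(shape_with_fallback(arial_unicode, "Tibetan", fallback))  # HypotheticalTibetanFont
```

The design point the model is meant to capture is that an application calling the low-level functions directly has to own the "pick another font" decision itself.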
This post brought to you by ུ (U+0F74, aka TIBETAN VOWEL SIGN U)
Wayne Shu on 9 Jun 2009 10:29 AM:
Thank you, Michael, you have made a wonderful explanation about my questions. thanks!