by Michael S. Kaplan, published on 2007/07/07 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/07/07/3738133.aspx
Kristen brought up a very good question yesterday that one of her testers asked her:
This happened on WinXP and Vista
Set locale to Chinese (Taiwan)
On XP - install the needed CHT pkg for surrogate support
- create a new Groove notepad record
- enter Chinese characters and then surrogate pairs
D841DD8C or D842DF63 on winXP
- save the record
- print the record
Actual results: surrogate pair records are 'fainter' and thinner than the regular Chinese text
Expected result: would assume the characters should print the same.
Actually, there is a bit of a misunderstanding about what is expected here....
To describe what I mean, I will start with the CJK ideographs in the Basic Multilingual Plane (BMP)
You may remember the GDI font link chain I described back in Font substitution and linking #3. I did not give the whole list, but I showed the order of the CJK fonts themselves, and how it varied with different default system locale settings:
- Japanese will default to using MS UI Gothic (fallback to PMingLiU, then SimSun, then Gulim)
- Korean will default to using Gulim (fallback to PMingLiU, then MS UI Gothic, then SimSun)
- Simplified Chinese will default to using SimSun (fallback to PMingLiU, then MS UI Gothic, then Batang)
- Traditional Chinese will default to using PMingLiU (fallback to SimSun, then MS Mincho, then Batang)
Now let's take the situation where the default system locale is Japanese, and the first two fonts are MS UI Gothic and PMingLiU. Looking at the two fonts in Character Map shows that the first font clearly has a different set of ideographs in that first block, if for no other reason than that the two sets of ideographs in the visible block are not identical and end on two different ideographs:
Now looking at the two fonts, they clearly show different styles for the ideographs. That means that any time you have a Japanese system locale and use MS UI Gothic, any text that is not contained in that font will be drawn from a linked font and will look like it has a different style.
And the same problem exists between the fonts that contain the CJK ideographs from Extension B and the fonts that contain those in the BMP, where differences in style can exist just as easily. And you can end up with those different styles in mixed text.
Note that in the Extension B case, it is not GDI font linking that does the work; it is Uniscribe font fallback. But the end result is the same any time the two fonts do not have identical styles....
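The effect described above can be sketched with a toy model in Python. This is not how GDI font linking or Uniscribe font fallback is actually implemented; it only assumes that each character is drawn with the first font in a chain that contains it. The chain order is the Japanese one listed earlier, while the coverage sets and the function names are hypothetical and do not reflect what the real fonts contain.

```python
from itertools import groupby

# Chain order for a Japanese default system locale, as listed in the post.
JAPANESE_CHAIN = ["MS UI Gothic", "PMingLiU", "SimSun", "Gulim"]

# HYPOTHETICAL coverage data for illustration only. Real fonts cover
# thousands of ideographs and these sets do not describe them.
TOY_COVERAGE = {
    "MS UI Gothic": {"日", "本", "語"},
    "PMingLiU": {"日", "本", "語", "們"},
}


def pick_font(ch, chain, coverage):
    """Return the first font in the chain whose coverage includes ch."""
    for font in chain:
        if ch in coverage.get(font, set()):
            return font
    return None  # no font in the chain covers it (the .notdef case)


def split_runs(text, chain, coverage):
    """Group consecutive characters that end up in the same font."""
    keyed = ((pick_font(ch, chain, coverage), ch) for ch in text)
    return [
        (font, "".join(ch for _, ch in group))
        for font, group in groupby(keyed, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    for font, run in split_runs("日本們語", JAPANESE_CHAIN, TOY_COVERAGE):
        print(font, run)
```

With the toy data, the one character missing from the first font lands in its own run with a different font, which is exactly where a reader would see the style change in the middle of the text.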
Now note that this will not always be the case, and sometimes the fonts may be intentionally designed to have like styles. But when they are not, it isn't always going to be unexpected! :-)
This post brought to you by 𠖌 (U+2058c, the CJK Extension B ideograph represented by U+d841 U+dd8c in UTF-16)
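For anyone who wants to check the arithmetic behind that surrogate pair, here is a minimal Python sketch of the standard UTF-16 surrogate calculation. The function names are made up for illustration and are not part of any Windows API.

```python
def surrogates_to_code_point(high, low):
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def code_point_to_surrogates(cp):
    """Split a supplementary-plane code point into its surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    cp = surrogates_to_code_point(0xD841, 0xDD8C)
    print(f"U+{cp:X}")  # the ideograph that sponsors this post
    print([f"{unit:04X}" for unit in code_point_to_surrogates(cp)])
```

The same calculation maps the other pair from the tester's report, D842 DF63, to U+20B63.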
# Kemp on 7 Jul 2007 8:16 AM:
Would this not still be considered a bug and unexpected from the viewpoint of a user? Something the system is doing that they don't necessarily know about or care about and certainly can't fix themselves (font fallback) is affecting something they very much do care about (the rendering of text).
# Kemp on 7 Jul 2007 8:47 AM:
Sorry for double posting, I've just realised that my comment could be misinterpreted. I don't mean that font fallback itself should be considered a bug, I'm referring to the overall behaviour, i.e. different styles of character being shown where they wouldn't be expected to by someone who didn't know all this about fonts, such as your average end user.
# Michael S. Kaplan on 7 Jul 2007 10:05 AM:
That is true, Kemp -- though in most situations wouldn't the user know there is something special about these characters?
Rather than considering this a bug, it is more of a limitation of the feature of both font linking and font fallback, due to the problems in trying to make two different fonts work together.
As grating as it may be to some, I am sure it is better than a .notdef glyph, at least. :-)
# Kemp on 7 Jul 2007 2:44 PM:
Indeed, a character of the right type showing (even if of a different style) will always be better than a blank space or the classic box :-)
Just to be clear about this, in what sort of situation would this occur "in the wild"? Obviously you can engineer cases that force the fallback as given in the original question, but would this always only occur in situations where someone should be prepared for it, or is it something that could strike at any time, such as using a word processor? Not using those languages myself I'm not completely clear on how commonly the "fallback" characters would have to be used.
# Michael S. Kaplan on 7 Jul 2007 4:13 PM:
It would not be expected to be common -- you have to combine text that is typical for the language with text that usually would not be....
# Kemp on 7 Jul 2007 6:31 PM:
Ah, in that case it'll almost always be caused by people who should be reading the documentation I guess.
I don't think I have much more here, so thanks for another informative post. I don't work with localisation and suchlike myself (thus the slight cluelessness), but I do find the topics you cover rather interesting and I'm gradually learning what is good practice and what isn't in preparation for the day I'm thrown in the deep end ;-)
# Michael S. Kaplan on 7 Jul 2007 7:23 PM:
Something many people tell me, I definitely take it as a compliment!
I should also mention that there is also the case of browsing web sites or reading documents that are in another language, of course. :-)
Ideally every CJK font would cover (via a collection or whatever) all 70,000+ ideographs, though in practice that is usually not practical. So we make these small sacrifices for the greater good....
# Nick Lamb on 8 Jul 2007 5:12 PM:
"As grating a it may be to some, I am sure it is better than a .notdef glyph, at least. :-)"
I'm glad to see that you've changed your mind about that Michael. There may be hope for you yet.
# Michael S. Kaplan on 8 Jul 2007 5:25 PM:
I never thought the .notdef was a good solution -- I just don't like "solutions" to that problem like megafonts. :-)
# Mihai on 9 Jul 2007 2:28 PM:
I would consider it a bug for another reason: Character Map is not consistent.
Select "Angsana New" (or Arial, or "Lucida Sans" or whatever) and using "Character Set: Unicode" you will notice that only the glyphs that exist in the font are shown.
So I guess the expectancy is that Character Map does not do any font fallback/linking/substitution.
# Mihai on 9 Jul 2007 2:29 PM:
expectancy => expectation :-)