by Michael S. Kaplan, published on 2007/03/18 17:59 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/03/18/1909475.aspx
Earlier today (in There was an order for letters, iroha was it's name-oh!) I talked about a specific pangram that has an interesting educational functionality, even in modern times.
Maybe for computers too, and maybe not -- the evidence isn't in just yet. :-)
But there are cases where a good pangram is exactly what you might want on a computer when you think about the core purpose of a pangram -- to show each and every letter in the alphabet in as short a string as possible.
How about in the sample text used by the Windows font viewer?
If you look at the fonts folder in Windows XP or Server 2003 and just double click on any font you will get a nice dialog with a nice pangram in it:
while other fonts would show a different pangram:
As far as I know, no one has ever described the rules by which the Fontview.EXE decides what string to display. Mainly, people seem to rely on knowing that certain fonts show certain strings, and they leave it at that. It is not a particularly interesting algorithm, basically going something like this:
if the Thread Locale is CJK:
    if the LPLOGFONT->lfCharset is SYMBOL_CHARSET, ANSI_CHARSET, DEFAULT_CHARSET, or OEM_CHARSET:
        The quick brown fox jumps over the lazy dog. 1234567890
    else
        Jackdaws love my big sphinx of quartz. 1234567890
else
    if the font claims to support the CP_ACP of the system:
        The quick brown fox jumps over the lazy dog. 1234567890
    else
        Jackdaws love my big sphinx of quartz. 1234567890
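For anyone who prefers to read it as code, here is a minimal Python sketch of the decision described above. It is not the Fontview.EXE source; the function name and its three parameters are hypothetical stand-ins for the thread locale check, the lfCharset value, and the "font claims to support CP_ACP" check. The charset numbers are the usual wingdi.h values.

```python
# Charset constants as defined in wingdi.h.
ANSI_CHARSET = 0
DEFAULT_CHARSET = 1
SYMBOL_CHARSET = 2
OEM_CHARSET = 255

FOX = "The quick brown fox jumps over the lazy dog. 1234567890"
JACKDAWS = "Jackdaws love my big sphinx of quartz. 1234567890"

# Charsets that get the fox string when the thread locale is CJK.
GENERIC_CHARSETS = {SYMBOL_CHARSET, ANSI_CHARSET, DEFAULT_CHARSET, OEM_CHARSET}


def pick_sample_text(thread_locale_is_cjk, lf_charset, font_supports_acp):
    """Return the sample string, following the rules listed in the post.

    All three parameters are assumptions made for this sketch:
    the caller is expected to have worked them out already.
    """
    if thread_locale_is_cjk:
        # CJK thread locale: decide on the font's declared charset only.
        return FOX if lf_charset in GENERIC_CHARSETS else JACKDAWS
    # Non-CJK thread locale: decide on system code page support only.
    return FOX if font_supports_acp else JACKDAWS
```

Note that nothing in either branch looks at which scripts the font actually covers, which is the heart of the complaint.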
Not an especially brilliant algorithm, and both strings are actually in Message Compiler resources, which means they could be localized (though note that the above algorithm means that localization might make the situation worse here, not better). On top of that, what do you do when you have a font with no Latin letters in it? By this algorithm, it will just get another Latin script string, which will still have to use font linking to find the glyphs to display.
Plus, if you are trying to understand at a glance what each font is for, how does this honestly help?
I am sure that the algorithm could be lamer than this, but offhand I can't think of how. :-(
In Vista, the Jackdaw was put on the endangered species list and that string is no longer available. The string is still localizable, so maybe it is actually being modified in different language versions, though this is of limited use and still kind of lame since the string should be FONT driven, not UI language driven.
So now let's think about how we could do a better job here!
How would you proceed with the task of deciding the best possible way to produce the optimal default string, one that not only shows off the font's best characteristics but does so in the language that the user is most likely to be able to understand (if there is one, of course)?
That just screams out for an interview question, any time the candidate claims knowledge of Win32 Text/GDI!
Anyone want to take a stab here at the algorithm they would try and use? I'll post my thoughts tomorrow.....
This post brought to you by ᘺ (U+163a, a.k.a. CANADIAN SYLLABICS CARRIER TLU)
# Dean Harding on 18 Mar 2007 8:01 PM:
That Jackdaws one always sounded a bit... wrong... to me ;)
As far as an algorithm goes, I guess the simplest method would be to start with a list of pangrams in various languages (so you have an English one, a Hebrew one, a Russian one, etc) then for the font you're trying to display, work out the subset of those pangrams that can be completely displayed by the font (without substitution). From that subset, choose one that matches the local system's language (if available) otherwise, it doesn't really matter which one you choose...
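A rough Python sketch of the approach suggested in this comment might look like the following. Everything here is an assumption for illustration: the function names are made up, the pangram table is supplied by the caller, and the font's coverage is modeled as a plain set of code points (on Windows that set would have to come from the font itself, which is not shown).

```python
def displayable(text, font_codepoints):
    """True if every non-whitespace character of text is covered by the font."""
    return all(ch.isspace() or ord(ch) in font_codepoints for ch in text)


def choose_pangram(pangrams, font_codepoints, system_language):
    """Pick a pangram the font can show without substitution.

    pangrams: dict mapping a language tag to a pangram string (hypothetical table).
    font_codepoints: set of integer code points the font has glyphs for.
    system_language: language tag of the local system.
    Returns None when the font cannot fully display any pangram.
    """
    # Keep only the pangrams that the font covers completely.
    candidates = {
        lang: text
        for lang, text in pangrams.items()
        if displayable(text, font_codepoints)
    }
    # Prefer the system's language when it is among the candidates.
    if system_language in candidates:
        return candidates[system_language]
    # Otherwise any candidate will do; take the first one in table order.
    return next(iter(candidates.values()), None)
```

The "it doesn't really matter which one" case is resolved here by table order, so whoever builds the table decides the fallback.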
# jon on 18 Mar 2007 8:17 PM:
Maybe I'm missing something, but does it really matter? What makes the Jackdaws one inherently better than the Fox one in those cases? Either string will give you an idea of what the font looks like which surely is the point of the quick preview.
# Ben Cooke on 18 Mar 2007 8:53 PM:
Coincidentally, I was wondering this — that is, why some fonts talk about the fox while others talk about my sphinx of quartz — this last Friday, while I was browsing through a library of fonts my company's designer had licenced.
# Michael S. Kaplan on 18 Mar 2007 10:33 PM:
Hi Jon -- the problem here is that showing a sentence that has nothing to do with what the font provides is not very useful, no matter what string is used!
Both strings fail the usefulness test for most of the fonts provided by Windows. :-)
# gabr on 19 Mar 2007 3:35 AM:
In Slovenian Windows XP, the 'quick brown fox' string is localised into
"V kožuščku hudobnega fanta stopiclja mizar in kliče 0619872345"
(a bit of trivia - at the time Windows XP was localised area code for Ljubljana, Slovenia's capital, was 061; however phone numbers were only 6 places long then so the number at the end was not a real phone number)
National characters in the text above will probably finish all messed up so here's a screenshot: http://www.gabrijelcic.org/pisave.png.
# Michael S. Kaplan on 19 Mar 2007 3:55 AM:
No, they seem to have showed up well. :-)
That pangram article appears to have two Slovenian pangrams:
And the Slovenian article has a variation on the one they put in the box....
# Carl on 19 Mar 2007 7:21 PM:
You might be interested to know what your Distinguished Competition do. FontBook.app (introduced in 10.3 if memory serves) displays different things depending on the language of the font. Japanese fonts get a series beginning, "あのイーハトーヴゥの," with the alphabet and numerals at the end. Chinese fonts begin with a simplified character version of "漢体." Korean fonts have some hangul, some hanja, the alphabet. Arabic fonts have arabic letters. English fonts have the alphabet upper case then lower case plus the numbers. I don't know which Indian language "Gurjarati MT" is for, but it shows those letters and numerals plus the western arabic numerals.
There's also an option of typing yourself to see how it will look in the specified font or seeing the entire repertoire of the font (which for Asian fonts can take a while since they have so many characters…).
# Markus Freericks on 20 Mar 2007 2:35 PM:
I don't get it. Why are there two different pangrams for the same language (English)? The character repertoire of the two seems to be the same, and having two different pangrams for the same language seems user-unfriendly, since a common use case of the font viewer is the comparison of two fonts -- which obviously works best when the sample texts are the same.
Or am I missing something glaringly obvious?
# Skopsko on 7 Apr 2009 4:39 PM:
Is it possible to have a custom Unicode multi-language sentence in the FontViewer? Where does it read the localized strings from?