You got your latins in my hebrew! No, you got your hebrew in my latins!

by Michael S. Kaplan, published on 2006/10/11 14:31 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/10/11/817290.aspx


Oren Novotny asks:

When will Segoe UI have support for Hebrew characters? This is important because Outlook 2007 uses Segoe UI as its main font. The problem is that emails with Hebrew in the subjects and sender's name come up with ??? instead of the character -- because the glyphs are missing.

I don't want MUI for OL; I want it in English. I just want to be able to read the subject of Hebrew emails...

--Oren

Well, there are actually several things going on here, and I found out several other interesting things when I asked one of my colleagues (Yaron Hezroni, who is actually Soren's boss!) whose Hebrew is better than mine (which is not hard, given how rusty my Hebrew is!). So this post is going to dig in and talk about a whole bunch of things....

First there are those question marks in the subject line and in the From line. These are actually not due to the font; they are due to the store containing the mail being in the older (pre-Unicode) format. Outlook updated the .PST format in Outlook 2003 to be able to fully support Unicode, but it requires creating a new .PST file, etc. If you are not using the new format, you'll get question marks no matter what the font is whenever the characters are not on the system default code page.
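To see why the font never gets a chance here, this is a minimal sketch of what a lossy code page conversion does to Hebrew text. It is not Outlook's actual code path; the function name `through_code_page` is made up for illustration, and cp1252 is only assumed as a stand-in for a non-Hebrew system default code page.

```python
def through_code_page(text: str, code_page: str) -> str:
    """Round-trip text through a legacy code page, the way a
    pre-Unicode store might (hypothetical helper, for illustration)."""
    # errors="replace" substitutes "?" for anything the code page lacks
    return text.encode(code_page, errors="replace").decode(code_page)

# Shin, Lamed, Vav, Final Mem, followed by some Latin text
subject = "\u05e9\u05dc\u05d5\u05dd from Oren"

# cp1252 (Western European) has no Hebrew letters, so they are lost
lossy = through_code_page(subject, "cp1252")
print(lossy)

# cp1255 (Hebrew) does have them, so the text survives intact
kept = through_code_page(subject, "cp1255")
print(kept == subject)
```

Once the letters have been replaced with literal question marks, the original characters are gone, so no choice of font can bring them back.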

The next thing that was interesting is the idea of adding Hebrew to Segoe UI. I talked to some of my typography cohorts down the hall and they have verified my suspicion that there are no current plans to do this. Instead, if Segoe UI is specified then the built-in Uniscribe font fallback (discussed previously) will be used here to get the glyphs from another font as it is doing the rest of its complex script processing. It just so happens that the font used for the fallback is Microsoft Sans Serif, rather than Tahoma, for reasons about which I am not entirely clear. There are actually many people (including Yaron) who prefer Tahoma here, though Microsoft Sans Serif is not too bad of a choice there for most people, so it does not make a bad font fallback choice.

The real question that comes up when one builds a font like Tahoma that actually combines so many scripts (Latin, Greek, Cyrillic, Arabic, Hebrew, and Thai) is how all of the scripts work together -- how multilingual text that crosses these script boundaries will look in terms of size, baseline, visual characteristics, and so forth. Some hard compromises have to be made (I'll talk about some of these issues for Arabic and Thai another day, and concentrate on Hebrew today!). Hebrew does fairly well in both Tahoma and Microsoft Sans Serif here, so there is not too much of a problem.

Of course the main reason someone like Oren might desire a Segoe UI that included Hebrew was to make sure that such a connection could be made; Segoe UI is neither Tahoma nor Microsoft Sans Serif. Because of that, Hebrew script glyphs that are designed for it are not an unreasonable ask here. And the answer is that Vista actually has a font that has (for the most part) the basic Latins from Segoe UI and its own Hebrew glyphs -- that font is called Gisha.

Now Gisha is an interesting word in Hebrew, it basically represents what an English speaker might think of as the noun form of the word approach, like the way one might approach a person, or maybe the way a typographer might approach the issue of how to apply the font style of Segoe UI with Hebrew glyphs. :-)

It is also an ironic name since, as it turned out, the folks in the Microsoft subsidiary in Israel and lots of native speakers of Hebrew did not really care for the approach taken with Gisha. The descriptions of why this was the case were not always entirely coherent, but usually people either did not see the need for a change, or pointed to specific issues like how it seemed a bit more stark/barren, or how it was more "boxy" (like a Volvo of modern Hebrew script fonts?), or the way the size/baseline issues were handled, or how the ס (Samech) and the ם (Mem Sofit) were not as properly differentiated due to the lack of some of the curves seen in other fonts, and so on. So while it is still readable, it was just not the ideal approach to the generic UI or document font to use as a default for all Hebrew text when a specific font was not explicitly chosen. It's not that it was terrible, but it just wasn't the first (or second, or tenth) choice that people had.

(the approach to transliteration in Unicode names that caused the Samech to be called Samekh probably deserves its own post!)
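For the curious, the Unicode names of the two letters mentioned above can be looked up directly. This is just a small sketch using Python's standard `unicodedata` module; the helper name `describe` is my own invention for the example.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and formal Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# U+05E1 is the letter the post calls Samech,
# U+05DD is the one it calls Mem Sofit
for letter in ("\u05e1", "\u05dd"):
    print(describe(letter))
```

The formal names use "SAMEKH" and "FINAL MEM" rather than the "Samech" and "Mem Sofit" spellings a Hebrew speaker might reach for first.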

I'll pause to show a nice picture of a whole bunch of fonts showing the basic Hebrew consonants. This includes all of the fonts from Vista, a few from Office 12, and even a few others from the Hebrew version of Office 11 so I could show some Hebrew script fonts.

I put the size I had to use to try to get the consonants to be about the same size (to make the comparisons easier in terms of style). Of course this hides the size issue but I figure if someone loves a font they can always jack up the size themselves, right? :-)

If you are a native Hebrew reader/writer/speaker, you will likely have your own preferences -- for what you would want in a Torah, or an email, or a Word document, or an IM window, or anywhere really:

 

Interestingly, from a typography standpoint many folks were pleased with Gisha and praised its clean lines and the work to try to push the Segoe UI approach to the Hebrew script. Though in another perhaps ironic turn, there is no "i" as in "with" sound in Hebrew, meaning that one would have to pronounce it as "ee" like in "see". The Hebrew is גישה which is also the Hebrew word used for the term Geisha. Now when one talks about clean lines and such in that context, a whole new and possibly HR-unfriendly approach to typography is seen!

But how to add multiple scripts into the mix remains a difficult problem (many of the above fonts do not have the Latins in them, and I did not show the ones that did in the document above), even for scripts for which the problem seems like it ought to be easier....

 

This post brought to you by ס (U+05e1, a.k.a. HEBREW LETTER SAMEKH, s.k.a. samech)


# Dean Harding on 11 Oct 2006 7:36 PM:

> First there are those question marks in the subject line and in the From line. These are actually not due to the font; they are due to the store containing the mail being in the older (pre-Unicode) format.

It took me a while to figure that one out as well. It's not like I can speak anything other than English, but I've had a few email exchanges with people in China and it was a bit confusing seeing email from two or three different people all of whom had the name '???'!

They certainly took a while, though, only going fully-Unicode in 2003!

# Oren Novotny on 11 Oct 2006 10:40 PM:

Thanks for the informative post!

One thing though is that Office 2007 does not include all of those new LH fonts like Gisha, but it does include Segoe UI.

That means that on an XP machine running Office 2007, the mappings do not work and ??? shows up. In this particular case, there is no PST as it's using an Exchange 2003 SP2 mailbox.

I suppose upgrading to Vista will solve the problem as the fonts will be there, but for now it seems that Hebrew won't work right in Office 2007 on XP. That's very disappointing.

--Oren

# Michael S. Kaplan on 11 Oct 2006 11:56 PM:

The question marks are not due to fonts, and Gisha is never mapped by Uniscribe (as I said!). They map to Microsoft Sans Serif, which has all of the Hebrew characters.

Your store that has your Inbox has to be configured to support Unicode. That is the missing link here, and can work just fine in XP (it works for me on XP SP2 with both Outlook 2003 and Outlook 2007).

# Dean Harding on 12 Oct 2006 1:38 AM:

Oren: Contact your Exchange admin if the store is hosted by Exchange.

Question marks only show up when converting to a code page that doesn't support a character. The 'missing glyph' symbol is a box; if it were a font problem, you'd get a box, not a question mark.

# rosiris on 1 Apr 2008 4:23 PM:

this is the best website because i'm jewish and i really need to know all the font sizes, baruchu (Ba- ru-ku) thanks and one question how do u get that word pad language??????????????????? bye or shalom!!!!!!!!


referenced by

2008/08/01 My Gut[tman] instinct is...

2007/09/02 Acronyms vs. initialisms, across languages

2006/10/12 It's LIFO (last-in, first-out) in Hebrew
