Short-sighted text processing #4: Squeezing every bit of text you possibly can out of MacOffice 2011

by Michael S. Kaplan, published on 2011/01/04 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/01/04/10110938.aspx


Previous blogs in this series:

Let's change tracks for a bit, and head over to Mac OS X and Office 2011. It can't always be about Uniscribe, can it?

Oh, wait.

I guess it can! :-)

Yes, I finally got MacOffice 2011 installed (see here for more about those travails).

Now the text processing story on OS X and on Mac Office is an interesting one, more because of the differences between OT (OpenType) and AAT (Apple Advanced Typography) on the Apple platform than because of the general philosophical Microsoft vs. Apple (the pixel grid vs. the font designer) differences.

Now almost every font maker I have talked to likes AAT better than OT, and some of them can even articulate reasons for this. In fact, that issue might make a good blog on its own in the future. But for now I will focus only on the upshot for Tamil support.

I'll mention Tom Gewecke's question from the Suggestion Box:

Do you still have a MacBook?  As you probably know, Office for Mac is notorious for its inability to do RTL, Indic, and numerous other scripts which Office for Windows and all other Mac apps can handle.  But I think that the new Office 2011 may be able to do Tamil, which I know is a favorite of yours.  It is listed in the keyboard support Help and I think the InaMaithai font is recognized.  I'm not able to test this myself, as there is no free trial and I don't use Office, but you might want to see if it really does correct Tamil.  That would be something of a breakthrough.

Office 2008 claimed to do Tamil when it came out, but as far as I know that was incorrect, it could never recognize the right font and generated boxes or other garbage.

There are also a few interesting blog comments elsewhere, like MS Office 2011 Language Support, where Tom specifically questions whether Tamil works at all, and this product support forum page where, as far as I can tell, someone is claiming it works while others have problems.

For now, let's just go in and try to make everything work.

First I'll add some fonts to Mac OS X, in their own special group, to support Tamil (one is already there, two others I will add):

The three fonts shown here in Font Book are:

Now there are no cases where Latha worked 100% of the time, but one case where it worked some of the time (suggesting some OpenType support). There is one case where InaiMathi did not work at all, suggesting that app has no AAT support. There is just one case where only Anjal Munivar worked completely, which suggests that supporting both OT and AAT simultaneously gives some benefit, though no font shipping in OS X and/or Mac Office does this at present, for Indic at least.

There are some dangers in this kind of indirect speculation about support but good and bad results both make for a compelling case for what works (and doesn't), even if not for the backstory behind what works (and doesn't).

Obviously, the OT support between Latha and Anjal Munivar may not be exactly the same (there are sometimes different ways to solve the same problem in OpenType, with the downside being that they may not all be supported fully -- this is especially true here!).
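What a font itself carries, at least, does not have to be inferred indirectly. The following is a minimal sketch in Python that reads the sfnt table directory of a font file and reports whether it contains OpenType layout tables, AAT layout tables, or both. The function names are hypothetical, the choice of GSUB/GPOS and morx/mort as the deciding tables is an assumption of this sketch, and it only shows what the font offers, not what any application does with it.

```python
import struct

# Tables treated as evidence of each layout technology (an assumption of this sketch).
OPENTYPE_TABLES = {"GSUB", "GPOS"}
AAT_TABLES = {"morx", "mort"}


def table_tags(data):
    """Return the set of table tags in a single-font sfnt table directory.

    Font collections (files starting with 'ttcf') are not handled here.
    """
    # Header: 4-byte sfnt version, uint16 numTables, then three uint16 search fields.
    _version, num_tables = struct.unpack_from(">4sH", data, 0)
    tags = set()
    for i in range(num_tables):
        # Each table record is 16 bytes and begins with its 4-byte tag.
        (tag,) = struct.unpack_from(">4s", data, 12 + 16 * i)
        tags.add(tag.decode("latin-1"))
    return tags


def layout_technologies(data):
    """Report which layout technologies the font's tables point to."""
    tags = table_tags(data)
    found = set()
    if tags & OPENTYPE_TABLES:
        found.add("OpenType")
    if tags & AAT_TABLES:
        found.add("AAT")
    return found
```

To try it on a real font, read the file as bytes (for example `layout_technologies(open(path, "rb").read())`, where `path` points at whichever font file you want to check).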

Let me also say that Muthu Nedumaran's help -- both to Apple (to whom he provided the input methods) and to me (to whom he provided several of the bug reports you'll be seeing below, and the font that works nearly everywhere) -- has been invaluable, and highly appreciated. Thanks, Muthu!

Now let's look at each Office application:

Wow, interesting! Even though the font is in Font Book, only Outlook puts it in the font list of all the Mac Office 2011 applications! The rest show my custom group but only with the two fonts that have AAT support! Or maybe the filtering is done for some other reason.

The big font list has the same story; my custom font group is just for your benefit, as it makes it easier to see results all at once.

Okay, let's look at actual support in the apps of Tamil, a script that requires some amount of glyph reordering and glyph substitution (relatively speaking, a smaller amount than some other Indic scripts).

I will use the Anjal keyboard, which also ships with OS X and which I wish shipped in Windows (and would if not for some unrelated short-sighted text issues that maybe I can discuss another day!).

It is inconceivable that the Mac Office test team would have been unable to do this kind of testing before the product shipped....

Let's take some text and try things out with as many fonts as each application will let us try:

Now this is all pretty fascinating -- especially the way Word seems to only do OpenType (yet blocks the OpenType only font) while Excel and PowerPoint seem to only do AAT, while Outlook seems comfortable sort of doing both.

Though that Outlook support? It is deceptive; I'll explain why in a moment.

First we'll try a little matrix of support for all applications:

Application    AAT support    OpenType support
Word           NO             MOSTLY
PowerPoint     YES            NO
Excel          YES            NO
Outlook        YES            KINDA
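The matrix also shows why a font carrying both technologies comes out ahead. As a rough sketch in Python (the dictionary simply transcribes the matrix; the function name is hypothetical, and counting MOSTLY and KINDA as "some support" is a simplification):

```python
# The support matrix above, transcribed as data.
SUPPORT = {
    "Word":       {"AAT": "NO",  "OpenType": "MOSTLY"},
    "PowerPoint": {"AAT": "YES", "OpenType": "NO"},
    "Excel":      {"AAT": "YES", "OpenType": "NO"},
    "Outlook":    {"AAT": "YES", "OpenType": "KINDA"},
}


def apps_with_some_support(font_technologies):
    """List the apps where at least one of the font's technologies is not a flat NO."""
    return [
        app
        for app, row in SUPPORT.items()
        if any(row[tech] != "NO" for tech in font_technologies)
    ]


print(apps_with_some_support({"AAT"}))
print(apps_with_some_support({"OpenType"}))
print(apps_with_some_support({"AAT", "OpenType"}))
```

Only the last call, for a font with both AAT and OpenType tables, covers all four applications, and even then "covered" can mean partial support. It also says nothing about whether an application lists the font in its font menu in the first place.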

Here you can see some broken text in an Outlook message, which is what you get if you try to type my name (Michael) in Tamil in Outlook using the Latha font (click on the image to see how it looks when typed one character at a time):

 To repro it, you would just type <SPACE><m><a><i><k><k><a><l><SPACE>. If you look at the first two letters of that string on top, they are not properly reordered in the case where Latha is used.
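For readers less familiar with Tamil, here is a rough sketch in Python of what "reordered" means. Tamil text is stored in logical order (consonant first, then vowel sign), but a few vowel signs are drawn to the left of their consonant, so something in the font or the shaping engine has to move them. The sketch assumes the first two letters of the string are stored as U+0BAE TAMIL LETTER MA followed by U+0BC8 TAMIL VOWEL SIGN AI; that is an assumption, since the post does not list the code points. The function is a toy that only handles a vowel sign directly after a consonant, not a real shaping engine.

```python
import unicodedata

# Tamil vowel signs that are drawn to the LEFT of their consonant.
PRE_BASE_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}  # E, EE, AI


def is_tamil_consonant(ch):
    # The Tamil consonant letters sit in U+0B95..U+0BB9.
    return "\u0b95" <= ch <= "\u0bb9"


def visual_order(text):
    """Approximate display order for Tamil text stored in logical order.

    NFD splits the two-part vowel signs (O, OO, AU) into a left piece and a
    right piece; each left piece is then moved in front of its consonant.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in PRE_BASE_SIGNS and out and is_tamil_consonant(out[-1]):
            out.insert(len(out) - 1, ch)  # place the sign before the consonant
        else:
            out.append(ch)
    return "".join(out)


stored = "\u0bae\u0bc8"  # MA + VOWEL SIGN AI, logical order
shown = visual_order(stored)
print([f"U+{ord(c):04X}" for c in shown])  # ['U+0BC8', 'U+0BAE']
```

When this step is skipped, the vowel sign is drawn after the consonant instead of before it, which is the kind of breakage described for the Latha case.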

So OpenType is supported here, somewhat. Though it appears to be OpenType support minus

which, as any OpenType expert will tell you, really is not all that supportive yet (these are again some core tables, especially GSUB).

Given that OpenType support is incomplete in OS X itself and MacOffice 2011 has that "Uniscribe framework component", I assume this bug is mostly on the Microsoft side, with some strange negotiation between the two potential OpenType support sides, unless it is just MacOffice's fault entirely, of course. This might also help explain some of the differences between Outlook and Word.

I have to wonder, though. These tables obviously have to work for other scripts that are supported, or otherwise there is no support at all. So why would they go to the trouble to add shaping engines that don't even support the easy stuff that the font maker can specify before the full support is available? This seems like a bad decision, whoever is ignoring these crucial pieces of OpenType like GSUB/GPOS that can go a long way for font authors in the "fake it til you make it" department....

To be honest, this is one of the reasons that font makers can prefer AAT to OpenType: they don't have to be so much at the mercy of what isn't in the shaping engine(s) yet.

And of course there is the fact that Word doesn't support AAT in the meantime; this seems very short-sighted. As does the very poor job the clipboard does when you move between these different apps that may be using different fonts. It suggests that there is no good notion of font fallback or interop there, either.

Overall, it is really unfortunate that it is possible to get these languages working, but that there has not been enough interop work to get one around the rough spots. Some of that interop work would have allowed Mac Office 2011 to fail more gracefully while support is not fully there yet (one of the most crucial points of the Office suite is interop between the various Office applications).

And there are other problems too.

Like the fact that Word does not show some combined forms properly that the input method tries to add (e.g. click on these three pictures to see the effect of typing <SPACE><n><a><n><a><SPACE> in TextEdit and PowerPoint and Word, where the first two look correct and the third does not):

That is a bug. Something Word is doing to the input stream.
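One way to tell an input-stream problem from a pure display problem is to copy the text out of each application and compare the stored code points. A minimal sketch in Python follows; the helper names are hypothetical, and it assumes the copied text reaches Python unmodified, which the clipboard behavior described above may not guarantee.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in the text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def first_difference(a, b):
    """Index of the first code point where a and b differ, or None if they are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


for line in describe("\u0bae\u0bc8"):
    print(line)
# U+0BAE TAMIL LETTER MA
# U+0BC8 TAMIL VOWEL SIGN AI
```

If the strings taken from TextEdit and Word differ, `first_difference` points at where the stored text diverges. If they are identical, the problem is in how the text is displayed, not in what was stored.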

And even beyond that there are other random display issues.

Like if you type certain strings in PowerPoint you will see visual weirdness like the following (click on the picture to see it "typed"):

Now note that this type of problem in PowerPoint is two-fold: first the cursor ends up between the "m" and the "ai", and then that weird space appears that you can see above. These problems in PowerPoint do not corrupt the actual text, though; they just lead to ugliness in the display. Even so, problems like this can really affect overall usability.

So, to sum up, what is the answer to the question "does Office 2011 support Tamil?"

I'd say at a "beta quality" level, it does.

But if you are looking for that kind of polish in each and every app, or good interop between them, then you are waiting for either a mondo service pack or a new version....


Daniel on 4 Jan 2011 8:47 AM:

I wonder if tweaking Word's Advanced typography settings in Format - Font - Advanced would make a difference.

That's how you can turn on ligatures and such for simpler scripts at least.

I also found that fonts often go missing in Office 2011 font listings, but I thought it's due to having a large number of fonts installed on your system.

orcmid on 4 Jan 2011 12:41 PM:

Oh my.  Is there some sort of reference implementation or tool somewhere that would show folks how to get this right?

Rick Schaut on 4 Jan 2011 1:20 PM:

Michael,

First, thanks for noticing Mac Office! We toil over here in the netherlands (well, less so now that we're in MOD and have moved from bldg 115 to bldg 31), doing the best we can with the resources we have, and, as you've noticed, that leads to some interesting trade-offs.

One of these days, I'll get around to writing a complete blog post about this, but a couple of points to keep in mind with Mac Office:

1) Both Mac Office and Mac OS are in transition regarding text rendering. Among other things, ATSUI is now deprecated. This transition means that putting effort into getting things like Tamil and RTL languages right would have involved work that we'd have to throw away in the very near future.

2) While compatibility with Win Office is an important design decision for Mac Office, each of the four apps defines "compatible" in rather different terms. Outlook, for example, defines "compatibility" in terms of support for the services of Exchange, while, for Excel, the most important compatibility issue involves dates. As a consequence, we actually have four different text rendering engines in Mac Office. Outlook uses WebKit. Word, where line and page layout are the most important issues, uses a combination of PTLS, Uniscribe and ATSUI.

Lastly, I'll endorse Daniel's suggestion to play with Word's advanced typography settings in the Format/Font dialog box. You can also add the Typography pod to the Ribbon via Word/Preferences/Ribbon.

Michael S. Kaplan on 4 Jan 2011 4:19 PM:

Hi Rick,

I did try the advanced typography settings but it did not really help. For Word, the #1 font was the font that had both AAT and OT (this is something that does not ship by default so it would require a third party font such as Muthu's) and there still do remain some issues that I hope can be fixed in the future. I am definitely happy to help in any way I can, though. :-)

Michael S. Kaplan on 5 Jan 2011 4:17 AM:

In the end though, my fondest hope is that there is a tester or testers over there who can extract the bugs from this blog so that they can be addressed in the future, where it makes sense....

Kalaimani (Singapore) on 30 Jan 2011 6:20 AM:

Just to share...

I had just completed a Tamil input driver based on the "Old Typewriter layout", with which I am more familiar.

I tested the driver with my own Unicode font UniParanar and with Latha in WORD_MAC 2011.

Happily, I could type successfully. However, I could not use the InaiMathi font or Arial Unicode, which appeared as boxes.

But I could not use my driver in Pages - GSUB did not work.

Will do more testing using my driver and Tamil Unicode fonts in different programs.

Kalaimani

Singapore


referenced by

2011/08/10 'If you want a happy ending, it depends on where you stop the story.'

2011/01/06 Short-sighted text processing #6: OpenType and Apple and OpenType

2011/01/05 Short-sighted text processing #5: PU[A]! That pad THAI is pretty spicy....
