Math is hard, let's do Unicode!

by Michael S. Kaplan, published on 2005/11/03 03:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/11/03/488525.aspx


(dedicated to the '_____ is hard, let's go shopping!' crowd in Building 24)

A reader of this blog by the admittedly unlikely name of Mr. Blobby asked in the Suggestion Box back at the end of June:

I would love to learn about the specific challenges (if any) of mathematics on computers. Broad areas of discussion:

1) what TeX gets right
2) what it gets wrong (if any)
3) mathematics support in Unicode
4) challenges of large nesting requirements, and so on.
5) Microsoft's mathematical software (in Word etc)

Oh, and please please get on the back of the IE team to support MathML natively!

Very good questions!

I have to admit that even though I wrote that post for those who enjoy mathematics (or, 'Also new in Vista'), I don't know enough to answer these questions. :-(

Luckily I can ask Murray Sargent and he can give the answers, instead! So here goes:

1) what TeX gets right

TeX is a very successful system for formatting technical documents with high typographic quality and has become a publishing industry standard. Its input method and the program itself work on virtually all personal computers. Its macro facility allows users to customize their documents.

2) what it gets wrong (if any)

The input can be hard to read for complex mathematical expressions. While pioneering in its day, the font technology is limited in that it uses multiple character sets of 256 symbols instead of Unicode and lacks the power of OpenType font technology. The use of macros can lead to a "tower of Babel" that complicates interchange with MathML and other computer formats.

3) mathematics support in Unicode

See Unicode Technical Report #25, "Unicode Support for Mathematics": http://www.unicode.org/reports/tr25/

4) challenges of large nesting requirements, and so on.

Math display systems need to nest 2D math constructs, such as built-up fractions, arbitrarily deeply, and brackets must be able to grow large enough to enclose such nested expressions. The brackets can be obtained from glyph variants for smaller sizes and glyph assemblies for larger ones. Glyph variants are also very useful for other needs, such as sub/superscripts. Horizontal as well as vertical spacing is crucial for a good-looking display. A very helpful reference for mathematical typography principles is Appendix G of The TeXbook.
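
To make the variant-versus-assembly idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `StretchyGlyph` type, the `build_bracket` function, and every number in `PAREN` are hypothetical and do not come from any real font or layout engine. It picks the smallest prebuilt variant that is tall enough, and otherwise falls back to an assembly of a top piece, a bottom piece, and as many repeated extender pieces as needed.

```python
from dataclasses import dataclass
from math import ceil
from typing import Tuple


@dataclass
class StretchyGlyph:
    """Hypothetical metrics for one stretchy bracket, in font design units."""
    variants: Tuple[int, ...]  # heights of prebuilt glyph variants, ascending
    top: int                   # height of the fixed top piece of an assembly
    bottom: int                # height of the fixed bottom piece of an assembly
    extender: int              # height of one repeatable extender piece


def build_bracket(glyph: StretchyGlyph, target: int) -> Tuple[str, int]:
    """Choose how to draw a bracket at least `target` units tall.

    Returns ("variant", index) when a prebuilt variant is tall enough,
    otherwise ("assembly", number_of_extender_pieces).
    """
    # Prefer the smallest prebuilt variant that covers the target height.
    for index, height in enumerate(glyph.variants):
        if height >= target:
            return ("variant", index)

    # No variant is tall enough: stack top + extenders + bottom.
    fixed = glyph.top + glyph.bottom
    extenders = max(0, ceil((target - fixed) / glyph.extender))
    return ("assembly", extenders)


# Made-up numbers for a left parenthesis (an assumption, not real font data).
PAREN = StretchyGlyph(variants=(1000, 1500, 2000, 3000),
                      top=1000, bottom=1000, extender=500)

if __name__ == "__main__":
    for needed in (1200, 3000, 4200):
        print(needed, build_bracket(PAREN, needed))
```

With these made-up metrics, a 1200-unit expression gets the second variant, while a 4200-unit nest of fractions needs an assembly with five extender pieces.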

5) Microsoft's mathematical software (in Word etc)

Currently there's the Equation Editor with MathType as its more complete rendition. Also we're working on something.

Oh, and please please get on the back of the IE team to support MathML natively!

We're hoping to add MathML support into IE (hope springs eternal). One solution is to download the MathPlayer (http://www.dessci.com/en/products/mathplayer/) from the authors of the Equation Editor. MathPlayer enables IE to display MathML.

See, I told you he would know better than me!

 

This post brought to you by "∬" (U+222C, DOUBLE INTEGRAL)
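
On the sponsor character: a short Python sketch using the standard `unicodedata` module shows how the Unicode Character Database describes U+222C. The `describe` helper is a name made up for this illustration. The thing to look at is that U+222C is a character in its own right with only a compatibility mapping to two U+222B INTEGRAL characters, so canonical normalization (NFC) keeps it while compatibility normalization (NFKC) replaces it with the pair.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode Character Database facts about one character."""
    return {
        "name": unicodedata.name(ch),
        "decomposition": unicodedata.decomposition(ch),
        "nfc": unicodedata.normalize("NFC", ch),    # canonical normalization
        "nfkc": unicodedata.normalize("NFKC", ch),  # compatibility normalization
    }


if __name__ == "__main__":
    info = describe("\u222c")
    print(info["name"])
    print(info["decomposition"])
    print([f"U+{ord(c):04X}" for c in info["nfc"]])
    print([f"U+{ord(c):04X}" for c in info["nfkc"]])
```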


# Eric Wilson on 4 Nov 2005 11:08 AM:

Why does Unicode define a Double Integral glyph? Wouldn't two single integral glyphs be the same thing?

# Lionel Fourquaux on 4 Nov 2005 11:21 AM:

>The input can be hard to read for complex
>mathematical expressions.

It's not only hard to read, it's also hard to type. However, in the long run it's the only reasonable system for typesetting math that I have tried. In particular, palette-based systems often turn out to be much too constrained for complex formulas.

I've read somewhere (I don't remember exactly) a suggestion that math input should be handled as a kind of specialized IME. This, IMHO, is a *very good idea*. There are clearly some common issues, e.g. how to access a large number of symbols in an intuitive (or at least easily memorized) way. Moreover, I think usability experts could come up with better solutions than what is currently available (possibly with a lot of work).

One point where math input differs from a normal IME is the usefulness of user-defined and per-document shortcuts. For math documents, quick access to (and easy modification of) frequently used notations is a huge time saver.

>the font technology is limited in that it
>uses multiple character sets of 256 symbols
>instead of Unicode and lacks the power of
>OpenType font technology

There is some evolution on these points (etex, omega, pdftex, aleph, xetex), but yes, basically the core system is frozen and rewrites (e.g. ExTeX) are far from completion.

>Currently there's the Equation Editor with
>MathType as its more complete rendition.

Errr... Equation Editor is nearly unusable. Really. Not only does it feel unstable (rendering inconsistencies, lots of flickering), but it is so limited that it can hardly be considered more than a toy. It doesn't seem to be developed any more, either. I hope it'll be replaced someday by a better tool.

MathType is clearly more powerful. I've not tried it, but from what I've seen I'm not tempted to. It's still palette-based, and the one time I tried to solve a problem involving it, the only explanation I found on DESSCI's website was "it's a bug in Word", which may be true but is no real solution. Also, see below on security.

>Also we're working on something.

Interesting! Is there a way to learn more on the subject? (url?)

>We're hoping to add MathML support into IE

Good! From the look of things, it may take some time, but I'm happy to hear this.

One thing I wonder about is whether MathML could be embedded (with a plugin) in the new Office Open Document Format.

>One solution is to download the MathPlayer

Ugh. I tried it. The first time it's loaded in IE, you get a rather intrusive splash screen. And the math fonts are not so pretty.

Worse, there is *no* information on security on DESSCI's website. Their first reaction to the local zone lockdown in SP2 was to find out how to *turn it off*. This doesn't give me a lot of trust in their products, and for a binary browser plug-in that's very poor advertising. For now, I don't use MathPlayer, and I won't until they put a lot more emphasis on security and reliability in their development process (and show it clearly, so that I'm informed).

Anyway, thanks for asking around, for this post, and for many other interesting posts.

