Unicode -- making a difference

by Michael S. Kaplan, published on 2008/04/30 03:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/04/30/8440308.aspx

Regular reader Arun pointed out an interesting article to me:

Check out http://www.infoworld.com/archives/emailPrint.jsp?R=printThis&A=/article/08/04/28/10-most-important-technologies-you-never-think-about_1.html for the 10 most important technologies you never think about... it has Unicode at #1 (or is it #10? I'm not sure!)

Btw, the 'Days left in office' in your sidebar spooked me until I realized you were referring to Dubya and not yourself!


I have to agree: it is nice to see Unicode in the pole position of this list, entitled "The 10 most important technologies you never think about" (subtitled "Without these technologies our world would be a very different place").

The list is unordered, but since Unicode is listed first even though it would not come first alphabetically, I am inclined to guess that it might be first for a reason. :-)

The text goes:

We use computers for every kind of communication, from IM to e-mail to writing the Great American Novel. The trouble is, computers don't speak our language. They're all digital; before they can store or process text, every letter, symbol, and punctuation mark must first be translated into numbers.

So which numbers do we use? Early PCs relied on a code called ASCII, which took care of most of the characters used in Western European languages. But that's not enough in the age of the World Wide Web. What about Cyrillic, Hindi, or Thai?

Enter Unicode, the Rosetta Stone of computing. The Unicode standard defines a unique number for every letter, symbol, or glyph in more than 30 written languages, and it's still growing. At nearly 1,500 pages and counting, it's incredibly complex, but it's been gaining traction ever since Microsoft adopted it as the internal encoding for the Windows NT family of operating systems.

Most of us will never need to know which characters map to which Unicode numbers, but modern computing could scarcely do without Unicode. In fact, it's what's letting you read this article in your Web browser, right now.

Kind of says most of it if not all of it. :-)

The full list is also interesting; take a look if you are curious about what else Neil McAllister put on this top 10 list...
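The quoted article says every character "must first be translated into numbers." For readers who want to see those numbers, here is a small illustrative sketch in Python (the language, the sample characters, and the helper names `describe` and `utf16_units` are choices made here for illustration; none of them come from the article). It prints the Unicode code point, the character name, and the UTF-16 code units for a Latin, Cyrillic, Devanagari (Hindi), and Thai letter, plus the Osmanya letter credited at the end of this post. UTF-16 is shown because it is the encoding form Windows uses for its internal Unicode strings.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

def utf16_units(ch: str) -> list[int]:
    """Return the 16-bit UTF-16 code units that encode ch, in order."""
    data = ch.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# One sample letter each: Latin, Cyrillic, Devanagari, Thai, Osmanya.
for ch in ["A", "\u0416", "\u0915", "\u0E01", "\U00010489"]:
    units = " ".join(f"{u:04X}" for u in utf16_units(ch))
    print(f"{describe(ch)}  UTF-16: {units}")
```

Characters in the Basic Multilingual Plane need a single 16-bit unit equal to their code point, while U+10489 lies beyond it, so UTF-16 represents it with a surrogate pair, D801 DC89.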


This blog brought to you by 𐒉 (U+10489, aka OSMANYA LETTER SHIIN)

# Andrew West on 30 Apr 2008 5:24 AM:

Cool. I particularly liked the expression "Rosetta Stone of computing", which I thought must have been borrowed from somewhere else, but googling it shows only this page and one other page that claims (implausibly) that "the Mac is literally the Rosetta Stone of computing!".

However, some minor complaints. Measuring the complexity of Unicode by the number of pages in the printed book is not such a great idea, especially when "At nearly 1,500 pages and counting" must refer to the Unicode 4.0 book (1462 pages) and in fact the Unicode 5.1 book is only 1247 pages in length!

And my perennial complaint about people's reluctance to use the term "script" in books and articles aimed at a general audience: "written languages" really is not a good alternative for script, and in my experience children of three know the difference between a language and a script (at least they do in my family). In any case Unicode 5.1 supports 75 scripts, not the mere 30+ that Neil McAllister states. It would have been more impressive if he had mentioned that Unicode now encodes over 100,000 characters.

# Michael S. Kaplan on 30 Apr 2008 10:30 AM:

I am not sure that we can really count your family as typical in this situation? :-)
