Unicode 6.0 -- one web page at a time

by Michael S. Kaplan, published on 2011/04/19 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2011/04/19/10155540.aspx

There were plenty of reactions to that video of the Unicode characters I talked about in 49571 ≈ 2π Sympathetic Characters, which I believe proves truth ≈ beauty.

But regular reader Andrew West was not impressed. His reaction, in a comment on that post the same day:

It would be cool if it did show every single graphic Unicode character, or even all characters in the BMP, but it is 5,000 characters short of what's currently in the BMP, and seems not to include anything much encoded during the last ten years.

Then, a few days later, he decided to try and do a better job, in his own blog titled Unicode 6.0 — One character at a time. From his intro:

A recent youtube video by jörg piringer that scrolls through "all" 49,571 Unicode characters in 33 minutes and 16 seconds (25 characters a second) has been doing the rounds, but I'm afraid that I was not impressed. The 49,571 characters in the video only cover the BMP, and even then it is 5,000 characters short, missing out most of the characters that have been added to Unicode over the past ten years, and missing out entirely some scripts that have been in Unicode since Year Zero.

Unicode version 6.0 (released October 2010) actually defines 109,384 characters (109,244 graphic and 140 format characters). How many of them you are able to see depends upon your operating system, your browser and whether you have additional fonts installed covering obscure and recently encoded scripts and characters (and whether your browser will actually apply those fonts or not). On my Windows 7 SP1 machine, with no additional fonts installed, I can see 95,372 of these 109,384 characters (87.1% coverage of total number of characters, but only fully covering 66 out of 203 blocks, and 85 blocks with no coverage at all).
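For anyone who wants to check the arithmetic behind Andrew's numbers, here is a minimal Python sketch. The constants are taken straight from the quoted text. The function names are made up for illustration, and the counting helper assumes that "format characters" means general categories Cf, Zl and Zp. It also reports whatever Unicode version ships with your Python, which will be newer than 6.0, so its totals will be larger than the ones quoted.

```python
import sys
import unicodedata

# Figures quoted from Andrew West's post (Unicode 6.0, Windows 7 SP1).
TOTAL_CHARS = 109_384      # graphic + format characters
GRAPHIC_CHARS = 109_244
FORMAT_CHARS = 140
VISIBLE_CHARS = 95_372     # characters he could see out of the box
TOTAL_BLOCKS = 203
FULL_BLOCKS = 66
EMPTY_BLOCKS = 85

# Figures quoted for the video: 49,571 characters in 33 min 16 s.
VIDEO_CHARS = 49_571
VIDEO_SECONDS = 33 * 60 + 16


def coverage_percent(visible, total):
    """Share of characters that render, as a percentage."""
    return 100.0 * visible / total


def count_graphic_and_format():
    """Count graphic and format characters in Python's bundled Unicode data.

    Assumption: format characters are categories Cf, Zl and Zp; control,
    surrogate, private-use and unassigned code points are skipped.
    """
    skipped = {"Cc", "Cs", "Co", "Cn"}
    format_cats = {"Cf", "Zl", "Zp"}
    graphic = fmt = 0
    for cp in range(sys.maxunicode + 1):
        cat = unicodedata.category(chr(cp))
        if cat in skipped:
            continue
        if cat in format_cats:
            fmt += 1
        else:
            graphic += 1
    return graphic, fmt


if __name__ == "__main__":
    print("coverage: %.2f%%" % coverage_percent(VISIBLE_CHARS, TOTAL_CHARS))
    print("partially covered blocks:", TOTAL_BLOCKS - FULL_BLOCKS - EMPTY_BLOCKS)
    print("video rate: %.1f chars/s" % (VIDEO_CHARS / VIDEO_SECONDS))
    graphic, fmt = count_graphic_and_format()
    print("Unicode", unicodedata.unidata_version, "graphic:", graphic, "format:", fmt)
```

The coverage works out to roughly 87.19%, which matches the quoted 87.1% if that figure was truncated rather than rounded, and the block counts leave 52 blocks with partial coverage.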

Now that says it all for me, on several levels.

And then his "video" is pretty amazing too. Check out his blog, and also the JavaScript page. You'll see what I mean.

Lessons learned:

Thanks, Andrew! :-)

Andrew West on 20 Apr 2011 12:51 AM:

Thanks for the kind words.  I agree that the Mac is way overrated wrt international support.  Windows 7's out of the box coverage of Unicode is very impressive, and even more so if you take into account the fact that its target version of Unicode was 5.1, for which it has a whopping 94.75% coverage.  And it is not just the major, commercially important scripts that are supported, but minor modern-use scripts such as Tibetan, Mongolian, Nuosu (Yi), Nko, Osmanya, Tifinagh and Vai, as well as historical scripts such as Phags-pa, Runic and Ogham.  Of course, there is always room for improvement (and I haven't been shy in pointing out my frustrations with Windows) -- for example, how about updating Microsoft Himalaya to cover the 18 additional Tibetan characters that have been encoded since the font was first designed?

Michael S. Kaplan on 5 May 2011 9:25 AM:

Funny that Joe considers this a cheap shot -- this relates to straight comparisons of out-of-the-box coverage of both platforms....
