Unicode 5.0 -- the beta!

by Michael S. Kaplan, published on 2005/12/13 11:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/13/503150.aspx

This official report went to all of the various distribution lists (sorry, there is still no Unicode RSS feed; the webmaster simply does not believe there is a good reason to move into syndicated feeds):

The next version of the Unicode Standard will be Version 5.0.0. The beta version of the documentation for Unicode 5.0.0 is located in:


A page describing the beta release is located here:


The Unicode Character Database portion is planned for release at the end of March 2006. A beta version of the 5.0.0 Unicode Character Database files is available for public comment. We strongly encourage implementers to download these files and test them with their programs, well before the end of the beta period. See the beta page for access to the files.

Any comments on the beta Unicode Character Database should be reported using the Unicode reporting form. The comment period ends January 30, 2006. All substantive comments must be received by that date for consideration at the next UTC meeting. Editorial comments (typos, etc.) may be submitted after that date for consideration in the final editorial work.

Note: All beta files may be updated, replaced, or superseded by other files at any time. The beta files will be discarded once Unicode 5.0.0 is final. It is inappropriate to cite these files as other than a work in progress.

The Unicode Consortium provides early access to the best known version of the data files to give reviewers and developers as much time as possible to ensure a problem-free adoption of version 5.0.0.

Testers should not commit any product or implementation to the code points in the current beta data files. Testers should also be ready for retesting based on updated data files which will be posted after the February, 2006 UTC meeting.


If you have comments for official consideration, please post them by submitting your comments through our feedback & reporting page:


If you wish to discuss beta issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as beta comments. You must use the reporting link above to generate comments for official consideration.


Rick McGowan
Unicode, Inc.

So if you are an implementer, try it on for size soon!

(If you think there ought to be an RSS feed, you can post that feedback too -- be sure to give good reasons!)


This post brought to you by "U" (U+0055, a.k.a. LATIN CAPITAL LETTER U)

# Rosyna on 13 Dec 2005 12:47 PM:

You have *got* to be kidding me. Arg. Another version? Pfft. What's the current version of Unicode Windows XP supports (completely)? I think you mentioned it before but I cannot remember where.

# Michael S. Kaplan on 13 Dec 2005 1:02 PM:

As versions go, it is actually a nice one, IMHO.

I'll get into the Unicode version thing in Windows another day -- the short answer is "it depends." :-)

# Rosyna on 13 Dec 2005 1:31 PM:

I'm more worried about the people that have to implement this. Unicode 4.1 isn't that old (less than a year) and Unicode 4.0 was released just over 2 years ago. I imagine many implementations haven't yet even "upgraded" to those. Windows hasn't even had a consumer product update since 2001 and 2003 for "Pro". But I'm not sure how the Unicode support in Windows tracks with releases. Does it wait for super major releases that MS charges for or does it get updated with the smaller updates as needed?

I can't even imagine what the situation is for things like Linux and its graphical environments. Just thinking about all these people finally implementing 4.0 just to have 5.0 shoved in their faces.

# Ben Bryant on 13 Dec 2005 1:45 PM:

I think it is great that they keep forging ahead. I am not as concerned about keeping up with the standard since all you really need is enough of it to satisfy your customers. When you are ready to take another step, you are glad that the Unicode Org has already addressed the issues. If Microsoft is not completely up to the latest version of Unicode and previous Windows versions are even further behind, it just goes to show that being right up to the Unicode spec is not necessary.

# Michael S. Kaplan on 13 Dec 2005 3:46 PM:

Hi Rosyna,

Ben is right on this one -- not even full members of the Unicode Consortium (like Microsoft) are always on the bleeding edge here, and therefore valid subsets do not become invalid just because a new version comes out.

# Nick Lamb on 13 Dec 2005 7:14 PM:

"I can't even imagine what the situation is for things like Linux and its graphical environments."

A lot of the changes are handled automatically: various developers and packagers will pick up the new data files from Unicode.org and run their existing scripts to process those files into their internal data formats. This happens for collation, case-compression and similar algorithms.
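
(For illustration only, here is a minimal sketch of the kind of script described above. It assumes the semicolon-separated, 15-field record layout of UnicodeData.txt and uses two sample records rather than the real file. The names `CharRecord`, `parse_unicode_data` and `lowercase_map` are hypothetical, made up for this example; they are not taken from any particular project's tooling. Range entries and the other data files are not handled.)

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Two sample records in the UnicodeData.txt layout (15 fields, ';'-separated).
# A real script would read the downloaded file instead of this string.
SAMPLE = """\
0055;LATIN CAPITAL LETTER U;Lu;0;L;;;;;N;;;;0075;
0075;LATIN SMALL LETTER U;Ll;0;L;;;;;N;;;0055;;0055
"""


@dataclass(frozen=True)
class CharRecord:
    """Hypothetical internal format holding only the fields we care about."""
    code_point: int
    name: str
    category: str
    upper: Optional[int]
    lower: Optional[int]


def _hex_or_none(field: str) -> Optional[int]:
    # Empty fields mean "no mapping"; otherwise the field is a hex code point.
    return int(field, 16) if field else None


def parse_unicode_data(lines: Iterable[str]) -> Dict[int, CharRecord]:
    """Turn UnicodeData.txt-style lines into a code point -> record table."""
    table: Dict[int, CharRecord] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        if len(fields) != 15:
            raise ValueError(f"expected 15 fields, got {len(fields)}: {line!r}")
        code_point = int(fields[0], 16)
        table[code_point] = CharRecord(
            code_point=code_point,
            name=fields[1],
            category=fields[2],
            upper=_hex_or_none(fields[12]),  # simple uppercase mapping
            lower=_hex_or_none(fields[13]),  # simple lowercase mapping
        )
    return table


table = parse_unicode_data(SAMPLE.splitlines())

# Derive one "internal data format": a simple lowercase mapping table.
lowercase_map = {cp: rec.lower for cp, rec in table.items() if rec.lower is not None}

print(table[0x55].name, "->", chr(lowercase_map[0x55]))
```

The point of the sketch is that nothing in the script names a Unicode version: when new data files arrive, rerunning it regenerates the tables.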

In the past there have been big changes like the introduction of characters outside the BMP that required some real development work, but Unicode 5.0 doesn't look to have any changes on that scale.

Renderers like Παν語 are modular, so an interested government can trivially sponsor some friendly experts to write and test support for their writing system. This appears to have happened for the forthcoming Balinese support in Unicode 5.0, for example.

# Rosyna on 13 Dec 2005 8:24 PM:

I was talking more about the features outside of the Unicode database. That part is easy. It's the things that cannot be parsed from a file that I was commenting on, especially if any of those slightly change a behaviour specified in a previous version.

I've got a very pessimistic outlook and believe, in general, developers only do what is "good enough" to get Y thing done. If Y is redefined, then it may take a rewrite to implement the new definition even if it only differs slightly from the previous definition.


