Grapheme Clusters -- betcha can't eat just one [Unicode 5.1 character]....

by Michael S. Kaplan, published on 2008/02/27 10:16 -05:00, original URI:

The announcement came in this afternoon:

Unicode 5.1.0 beta period now closed

The beta period for Unicode 5.1.0 has closed. We are now in the pre-publication phase and expect to have the final release around March 31. No more substantive changes are planned, beyond those already approved by the Unicode Technical Committee. However, if you have editorial comments on the text of Unicode 5.1.0 please report via the online reporting form.

Unicode 5.1.0 page:

Online contact form:

Rick McGowan
Unicode, Inc.

Okay, we are not quite at the point where one would yell Stop the Presses! if there were some kind of urgent problem requiring attention. I have no idea what that process would look like -- say if the website were on fire or something?

Well, you can report any editorial comments, too....

Anyway, we can sit around and wait for the release.

I'll open a box of Grapheme Clusters

and we'll make a party of it....

It is funny: just yesterday I was mentioning grapheme clusters in Unicode in a comment, and today we're eating the cookies. Awesome!

From that link above on the version:

This is a draft page for the eventual specification of Unicode 5.1.0. This page is under development and may be modified without notice until Unicode 5.1.0 is released.

Unicode 5.1.0 is currently in the pre-publication phase and is due for release at the end of March 2008. No more substantive changes are planned, beyond those already approved by the Unicode Technical Committee. However, if you have editorial comments on the text of Unicode 5.1.0 please report via the online reporting form.

Last updated 26-February-08

A. Summary

Unicode 5.1 brings major benefits: improvements for security in data exchange, character additions to support Indic and South East Asian scripts, improvements to the Unicode Linebreaking Algorithm statement of conformance, standardized named sequences for Lithuanian, and provisional named sequences for Tamil. Identifiers were expanded to allow full support for Indic and Arabic scripts.

Implementers will find new test data files and additional new XML data files with character properties for all Unicode characters.

Several important property definitions were extended, improving linebreaking for Polish and Portuguese hyphenation. The Unicode Text Segmentation Algorithms, covering sentences, words, and characters, were greatly enhanced by creating extended combining character sequences that improve the processing of Tamil and other Indic languages. The Unicode Normalization Algorithm now defines stabilized strings and provides guidelines for buffering.

This latest version of Unicode adds new characters required for Malayalam and Myanmar and important individual characters such as Latin capital sharp s for German. Version 5.1 extends support for languages in Africa, India, Indonesia, Myanmar, and Vietnam, with the addition of the Cham, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese, and Vai scripts. Scholarly support includes important editorial punctuation marks, as well as the Carian, Lycian, and Lydian scripts, and the Phaistos disc symbols. Other new symbol sets include dominoes, Mahjong, dictionary punctuation marks, and math additions. Unicode 5.1 contains significant additions and improvements that extend text processing for software worldwide.
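To make the segmentation point in the summary a bit more concrete, here is a rough Python sketch of why "characters" and code points differ for Tamil. It is only an approximation under stated assumptions: the helper name `rough_clusters` is made up for illustration, and it simply glues every combining mark (general categories Mn, Mc, Me) onto the preceding base character. The actual rules are in UAX #29 and cover much more (Hangul jamo, CR LF, joiners, and so on), so treat this as a toy, not a conformant implementation.

```python
import unicodedata


def rough_clusters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper, a rough approximation only: the real grapheme
    cluster rules are defined in UAX #29 and handle many more cases.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me all start with "M".
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# TAMIL LETTER NA followed by TAMIL VOWEL SIGN I (a spacing mark)
tamil_ni = "\u0ba8\u0bbf"

print(len(tamil_ni))                  # number of code points
print(len(rough_clusters(tamil_ni)))  # number of rough clusters
```

The vowel sign here is a spacing mark, which is the kind of character that the extended sequences mentioned in the summary are meant to keep attached to its base, so a user sees one unit rather than two.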

You can see the rest of the text here. :-)


This blog brought to you by ⾷ (U+2fb7, aka KANGXI RADICAL EAT, though not by Keebler since that was a bit of parody!)

