Behind the Proposed Change to Tamil in Unicode (five different ways)

by Michael S. Kaplan, published on 2008/09/17 10:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/09/17/8954928.aspx


So, I had that Behind the Proposed Change to Tamil in Unicode presentation:

The encoding of Tamil within Unicode has been the subject of displeasure by the government of Tamil Nadu for as long as it has been there. It has led to a proposal (built up over the last decade) to try to change the way that Unicode looks at Tamil, and the very real questions of why this effort has been so persistent and what will eventually happen have not really been discussed overtly in all of this time. This presentation's goal is to talk about why the proposal exists, why it will ultimately fail, and why the language itself can survive that fact. The broader issues of the view of languages and the "rights" of language owners will also be discussed in this case study of a language that has been both wronged and righted as few others have in modern times.

This is the presentation that John Cowan, in a comment to this blog, asked about:

 Are there slides for your talk available?

Anyway, here they are, the many forms of the presentation: the form made available to attendees, which I think is the PDF, and four other forms, for yucks (you can pick your favorite; the sizes can help guide your decision here!):

Format                                Size (zipped)   Size (unzipped)
Portable Document Format (PDF)        1,119kb         1,257kb
PowerPoint 97-2003 Show (PPT)         6,508kb         7,235kb
PowerPoint 2007 Show (PPTX)           5,906kb         6,119kb
PowerPoint 2007 OpenXML Show (XML)    6,066kb         8,591kb
XML Paper Specification (XPS)         1,750kb         2,025kb

Now this presentation tries to cover 11 years of history that are actually about lots of opinions and beliefs about up to two millennia of history, and how all of that impacted a proposal that has been around for over seven years and that I have had some connection to. Although it attempts to be as lighthearted as possible, the source material is almost embarrassingly dense. I'll likely blog about some of the more expandable bits of this in the future, including some of the following, as there are a host of issues relating to:

Only some of which I had time to cover in the 50 minutes and in the slides, thus almost begging for some more blogs with the outtakes. :-)

Enjoy!


This blog brought to you by P (U+0050, aka LATIN CAPITAL LETTER P)


# Andrew West on 17 Sep 2008 10:26 AM:

I'm looking forward to "death threats and love affairs" (sounds like a good movie, or maybe the title of your autobiography) -- can we have those first?

# John Cowan on 17 Sep 2008 5:35 PM:

Thanks; great slides; must have been an excellent preso!

One thing I wondered about.  You wrote:

Unicode has had a Korean implementation that allows Hangul syllables to be decomposed into Jamo and Jamo to be composed into Hangul.

Government folk in Korea have expressed displeasure with this way of the language being handled for as long as they have known about it, despite linguistic validity

What part are they displeased with, the jamo part or the Johab part, or just the fact that both exist?  Johab handles all of Modern Korean -- indeed over-handles it: some of the represented syllables aren't actually used -- but you need the jamo representation to do Middle Korean with all its Chinese borrowings.

# Michael S. Kaplan on 17 Sep 2008 5:43 PM:

Hey Andrew,

I may get there eventually, but I'll probably start with the ones more on-topic. :-)

# Michael S. Kaplan on 17 Sep 2008 5:44 PM:

Hey John,

Thanks, I haven't seen ratings yet but I got some good feedback and I had fun. There are other topics that it would be fun to cover or see covered in similar style.... :-)

They do not like the canonical equivalence -- they think that Jamo are fine/dandy for Old Hangul, but modern Hangul should always be in the precomposed Hangul syllables block. The notion of normalization freely supporting the conversion between them, that they just don't care for.

Of course real Korean and other linguists don't mind it, since that is what it is anyway. But a few folks with the ear of the government folk have strong feelings about their language....
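
For anyone who wants to see the canonical equivalence in action, here is a minimal sketch using Python's standard `unicodedata` module. It is only an illustration: the choice of syllable (U+D55C) and the variable names are assumptions made for the example, not anything from the slides.

```python
import unicodedata

# U+D55C HANGUL SYLLABLE HAN, written as an escape so the code point is explicit.
precomposed = "\uD55C"

# NFD splits the precomposed syllable into its conjoining jamo.
decomposed = unicodedata.normalize("NFD", precomposed)

# NFC composes the jamo sequence back into a single syllable.
recomposed = unicodedata.normalize("NFC", decomposed)

for label, text in [("precomposed", precomposed),
                    ("NFD", decomposed),
                    ("NFC", recomposed)]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

The precomposed form is one code point and the decomposed form is three, yet normalization treats them as the same text, which is the part that is not cared for.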

# Mike Dimmick on 18 Sep 2008 6:41 AM:

So what needs to be done?

Clearly Unicode won't re-encode the script. That ship has sailed.

But if users really think of their language as a syllabary, software needs to treat it as one even though it is actually encoded differently. That means editors treating the glyphs as indivisible, even though they're composed in Unicode, and keyboard layouts to permit users to enter text one glyph at a time.

That puts a lot of compatibility issues on the vast mass of software out there, though. Could, for example, Uniscribe dictate where character breaks go?
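
As a rough illustration of what "treating the glyphs as indivisible" could mean for an editor, here is a minimal sketch that groups each base letter with the combining marks that follow it. It is a deliberate simplification: it is not how Uniscribe works, it is not the full Unicode text segmentation algorithm, and the function name `simple_clusters` is hypothetical.

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified assumption: any character whose general category starts
    with "M" (Mn, Mc, Me) attaches to the preceding character.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# The word "Tamil" in Tamil script: TA, MA + VOWEL SIGN I, LLLA + PULLI (virama).
word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"
units = simple_clusters(word)
print(len(word), "code points,", len(units), "user-visible units")
```

A cursor or backspace key that moved by these units instead of by code points would behave more like the syllabary that users have in mind, without any change to the encoding.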


referenced by

2009/09/09 On not being in Germany in October

2008/11/03 Inspiration, and a code chart
