Inspiration, and a code chart

by Michael S. Kaplan, published on 2008/11/03 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/11/03/9029808.aspx

Way back in September, after I gave that presentation at the Internationalization and Unicode Conference (the one I mentioned, and provided the slides for, in Behind the Proposed Change to Tamil in Unicode (five different ways)), Scott sent me the following via the contact list:

I think that what Scott did here was excellent, and I did not note anything horribly wrong at all....

Because even though that is the secret hope I have for some of my talks (especially including this one), it is really awesome to see it spelled out in such a way.

The chart he provided was similar to but not the same as the ones I provided in Learn Tamil in 30 Days (or something like that), and it helps people look at Tamil in Unicode the way that they might learn Tamil, something the simple code allocation chart would never be able to do -- in its own way something Unicode cannot do without providing this same crucial bit of information in a familiar form.

Thanks, Scott -- both for this and for supporting my non-interference policies WRT Wikipedia! :-)

Which reminds me that I promised to talk more about some of the issues I didn't have time to cover in the talk. I'll be sure to get on that....

Michael, thanks again for the inspiration, and for your kind words.

I have recently revised the Tamil script Wikipedia page to add rows to the Unicode syllabary for syllables starting with J-, SS-, S-, H-, and KSS-, following a similar addition to the Tamil compound table earlier in the article.

I must admit, my knowledge of Tamil script is quite limited, and I have been drawing heavily on the work of others in my additions to the Tamil script Wikipedia article. I have limited the scope of the Unicode syllabary chart to those Tamil compounds listed earlier in the article, because I don't truly understand what ought to be listed in this page.

In particular, I have omitted the Tamil syllables containing U+0BB6, which are included in the Unicode 5.1 named sequences, because I don't know why these compounds aren't mentioned earlier in the page. Are they used for transcribing words in other languages such as Hindi or Sanskrit? Are they used for writing minority languages that share the Tamil script? Are they historical characters no longer in modern use? Any of these reasons could make these characters politically sensitive. Does anyone know more about these characters?

I also omitted the named sequence for SHRII because it doesn't fit in the table format, and I don't understand its usage or significance either.

Hi Scott,

we were told that SHRII is only being used in Sri Lanka (which is not what they call the island, either), so its significance is probably not the highest, though we learned it as a standard grantha letter - and it took me a while to figure out how this one could be written if you do not have the Tamil keyboard, so some might find it helpful to know it is SA+Virama+RA+II. This also suggests that you cannot put it into the syllabary table, unless you include it in the II (U+0B88) column only, leaving the others empty (which seems like quite a good idea to me).
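For anyone who wants to check the SA+Virama+RA+II sequence without a Tamil keyboard, here is a minimal Python sketch. It assumes the "II" in the sequence is the dependent vowel sign (U+0BC0) rather than the independent letter, and the names `SHRII_WITH_SA` and `describe` are just illustrative:

```python
import unicodedata

# SA + VIRAMA + RA + vowel sign II, following the comment above.
# Assumption: "II" here means the dependent vowel sign U+0BC0.
SHRII_WITH_SA = "\u0BB8\u0BCD\u0BB0\u0BC0"

def describe(text):
    """Return (code point, Unicode character name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

if __name__ == "__main__":
    for code_point, name in describe(SHRII_WITH_SA):
        print(code_point, name)
```

Whether the four code points display as a single ligature depends on the font and the shaping engine in use.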

I'm not sure about the U+0BB6 either, but I will ask today at school if Michael does not cover it sooner...

Hello, I'm back. You can find pretty good information about SHA (U+0BB6) and how it relates to SHRII in the SHA proposal, which can be found at

http://std.dkuug.dk/jtc1/sc2/WG2/docs/n2617.pdf

From what I've found, it is used very rarely, for transcribing words (from Sanskrit etc.), and it belongs to the grantha characters. I don't think it is much more politically sensitive than other grantha characters.

Hope this helps :)