by Michael S. Kaplan, published on 2008/02/26 10:16 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/02/26/7898303.aspx
So while I was in India, I picked up a bunch of books (my suitcase was probably 30 pounds heavier!).
One book that hardly weighed anything at all was a small one titled Learn Tamil in 30 Days by N. Jegtheesh, B.A., part of the National Integration Language Series.
No, it wasn't that I was necessarily looking to learn Tamil in 30 days or anything like that.
And although its logic for counting characters, which differs from what others had been talking about lately, was interesting, that isn't what closed the sale, either.
I was mainly interested in seeing how a native speaker of Tamil would have explained the language to someone else.
Plus there was a big table spanning several pages that just caught my eye. I have mostly reproduced it here (though swapping the "X" and "Y" axes). You can note the different transliterations that are used for the letters -- I'd say it would provide some hints for this post, though not as many as I might have liked since it has everything as uppercase.
(The most annoying part of this table was how much crap Word added to it, even when I saved it as filtered HTML. I guess the "un" prefix in the word "filtered" is silent in Microsoft Word? Though luckily my version of the filtered file is about 20% of the size yet looks identical!)
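| ஃ AKH | அ A | ஆ AA | இ E | ஈ EE | உ U | ஊ OO | எ Ā | ஏ AE | ஐ I | ஒ O | ஓ OH | ஔ OU | ஃ AKH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| க் K | க KA | கா KAA | கி KE | கீ KEE | கு KU | கூ KOO | கெ KĀ | கே KAE | கை KAI | கொ KO | கோ KOH | கௌ KOU | க் K |
| ங் NG | ங NGA | ஙா NGAA | ஙி NGE | ஙீ NGEE | ஙு NGU | ஙூ NGOO | ஙெ NGĀ | ஙே NGAE | ஙை NGAI | ஙொ NGO | ஙோ NGOH | ஙௌ NGOU | ங் NG |
| ச் CH | ச CHA | சா CHAA | சி CHE | சீ CHEE | சு CHU | சூ CHOO | செ CHĀ | சே CHAE | சை CHAI | சொ CHO | சோ CHOH | சௌ CHOU | ச் CH |
| ஞ் GN | ஞ GNA | ஞா GNAA | ஞி GNE | ஞீ GNEE | ஞு GNU | ஞூ GNOO | ஞெ GNĀ | ஞே GNAE | ஞை GNAI | ஞொ GNO | ஞோ GNOH | ஞௌ GNOU | ஞ் GN |
| ட் D | ட DA | டா DAA | டி DE | டீ DEE | டு DU | டூ DOO | டெ DĀ | டே DAE | டை DAI | டொ DO | டோ DOH | டௌ DOU | ட் D |
| ண் NN | ண NNA | ணா NNAA | ணி NNE | ணீ NNEE | ணு NNU | ணூ NNOO | ணெ NNĀ | ணே NNAE | ணை NNAI | ணொ NNO | ணோ NNOH | ணௌ NNOU | ண் NN |
| த் TH | த THA | தா THAA | தி THE | தீ THEE | து THU | தூ THOO | தெ THĀ | தே THAE | தை THAI | தொ THO | தோ THOH | தௌ THOU | த் TH |
| ந் N | ந NA | நா NAA | நி NE | நீ NEE | நு NU | நூ NOO | நெ NĀ | நே NAE | நை NAI | நொ NO | நோ NOH | நௌ NOU | ந் N |
| ப் P | ப PA | பா PAA | பி PE | பீ PEE | பு PU | பூ POO | பெ PĀ | பே PAE | பை PAI | பொ PO | போ POH | பௌ POU | ப் P |
| ம் M | ம MA | மா MAA | மி ME | மீ MEE | மு MU | மூ MOO | மெ MĀ | மே MAE | மை MAI | மொ MO | மோ MOH | மௌ MOU | ம் M |
| ய் Y | ய YA | யா YAA | யி YE | யீ YEE | யு YU | யூ YOO | யெ YĀ | யே YAE | யை YAI | யொ YO | யோ YOH | யௌ YOU | ய் Y |
| ர் R | ர RA | ரா RAA | ரி RE | ரீ REE | ரு RU | ரூ ROO | ரெ RĀ | ரே RAE | ரை RAI | ரொ RO | ரோ ROH | ரௌ ROU | ர் R |
| ல் L | ல LA | லா LAA | லி LE | லீ LEE | லு LU | லூ LOO | லெ LĀ | லே LAE | லை LAI | லொ LO | லோ LOH | லௌ LOU | ல் L |
| வ் V | வ VA | வா VAA | வி VE | வீ VEE | வு VU | வூ VOO | வெ VĀ | வே VAE | வை VAI | வொ VO | வோ VOH | வௌ VOU | வ் V |
| ழ் ZH | ழ ZHA | ழா ZHAA | ழி ZHE | ழீ ZHEE | ழு ZHU | ழூ ZHOO | ழெ ZHĀ | ழே ZHAE | ழை ZHAI | ழொ ZHO | ழோ ZHOH | ழௌ ZHOU | ழ் ZH |
| ள் LL | ள LLA | ளா LLAA | ளி LLE | ளீ LLEE | ளு LLU | ளூ LLOO | ளெ LLĀ | ளே LLAE | ளை LLAI | ளொ LLO | ளோ LLOH | ளௌ LLOU | ள் LL |
| ற் RR | ற RRA | றா RRAA | றி RRE | றீ RREE | று RRU | றூ RROO | றெ RRĀ | றே RRAE | றை RRAI | றொ RRO | றோ RROH | றௌ RROU | ற் RR |
| ன் N | ன NA | னா NAA | னி NE | னீ NEE | னு NU | னூ NOO | னெ NĀ | னே NAE | னை NAI | னொ NO | னோ NOH | னௌ NOU | ன் N |
| ஃ AKH | அ A | ஆ AA | இ E | ஈ EE | உ U | ஊ OO | எ Ā | ஏ AE | ஐ I | ஒ O | ஓ OH | ஔ OU | ஃ AKH |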
Anyway, I just thought I'd share it with all of you.
Note how it puts two N entries in there (in Unicode the first one is ந (U+0BA8, TAMIL LETTER NA) and the second one is ன (U+0BA9, TAMIL LETTER NNNA)). But beyond that, several of the transliterations do seem quite odd to me, as used to the character names as I am....
Some other letters are listed after these (like the ones used in Tamil Grantha, etc.).
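The Unicode names of the two N letters mentioned above can be checked against the character database. Here is a minimal Python sketch using the standard `unicodedata` module; the `describe` helper is a made-up name for illustration only, not anything from the book or the original script.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX <char> <Unicode name>' for a single character."""
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}"


# The two letters that the book both transliterates as "N".
for letter in ("\u0BA8", "\u0BA9"):
    print(describe(letter))
```

The formal names are what keep the two letters apart, even though the book gives them the same Latin transliteration.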
Let's try it again with some Unicode code points in it, just for grins:
| ஃ AKH 0B83 | அ A 0B85 | ஆ AA 0B86 | இ E 0B87 | ஈ EE 0B88 | உ U 0B89 | ஊ OO 0B8A | எ Ā 0B8E | ஏ AE 0B8F | ஐ I 0B90 | ஒ O 0B92 | ஓ OH 0B93 | ஔ OU 0B94 | ஃ AKH 0B83 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| க் K 0B95 0BCD | க KA 0B95 | கா KAA 0B95 0BBE | கி KE 0B95 0BBF | கீ KEE 0B95 0BC0 | கு KU 0B95 0BC1 | கூ KOO 0B95 0BC2 | கெ KĀ 0B95 0BC6 | கே KAE 0B95 0BC7 | கை KAI 0B95 0BC8 | கொ KO 0B95 0BCA | கோ KOH 0B95 0BCB | கௌ KOU 0B95 0BCC | க் K 0B95 0BCD |
| ங் NG 0B99 0BCD | ங NGA 0B99 | ஙா NGAA 0B99 0BBE | ஙி NGE 0B99 0BBF | ஙீ NGEE 0B99 0BC0 | ஙு NGU 0B99 0BC1 | ஙூ NGOO 0B99 0BC2 | ஙெ NGĀ 0B99 0BC6 | ஙே NGAE 0B99 0BC7 | ஙை NGAI 0B99 0BC8 | ஙொ NGO 0B99 0BCA | ஙோ NGOH 0B99 0BCB | ஙௌ NGOU 0B99 0BCC | ங் NG 0B99 0BCD |
| ச் CH 0B9A 0BCD | ச CHA 0B9A | சா CHAA 0B9A 0BBE | சி CHE 0B9A 0BBF | சீ CHEE 0B9A 0BC0 | சு CHU 0B9A 0BC1 | சூ CHOO 0B9A 0BC2 | செ CHĀ 0B9A 0BC6 | சே CHAE 0B9A 0BC7 | சை CHAI 0B9A 0BC8 | சொ CHO 0B9A 0BCA | சோ CHOH 0B9A 0BCB | சௌ CHOU 0B9A 0BCC | ச் CH 0B9A 0BCD |
| ஞ் GN 0B9E 0BCD | ஞ GNA 0B9E | ஞா GNAA 0B9E 0BBE | ஞி GNE 0B9E 0BBF | ஞீ GNEE 0B9E 0BC0 | ஞு GNU 0B9E 0BC1 | ஞூ GNOO 0B9E 0BC2 | ஞெ GNĀ 0B9E 0BC6 | ஞே GNAE 0B9E 0BC7 | ஞை GNAI 0B9E 0BC8 | ஞொ GNO 0B9E 0BCA | ஞோ GNOH 0B9E 0BCB | ஞௌ GNOU 0B9E 0BCC | ஞ் GN 0B9E 0BCD |
| ட் D 0B9F 0BCD | ட DA 0B9F | டா DAA 0B9F 0BBE | டி DE 0B9F 0BBF | டீ DEE 0B9F 0BC0 | டு DU 0B9F 0BC1 | டூ DOO 0B9F 0BC2 | டெ DĀ 0B9F 0BC6 | டே DAE 0B9F 0BC7 | டை DAI 0B9F 0BC8 | டொ DO 0B9F 0BCA | டோ DOH 0B9F 0BCB | டௌ DOU 0B9F 0BCC | ட் D 0B9F 0BCD |
| ண் NN 0BA3 0BCD | ண NNA 0BA3 | ணா NNAA 0BA3 0BBE | ணி NNE 0BA3 0BBF | ணீ NNEE 0BA3 0BC0 | ணு NNU 0BA3 0BC1 | ணூ NNOO 0BA3 0BC2 | ணெ NNĀ 0BA3 0BC6 | ணே NNAE 0BA3 0BC7 | ணை NNAI 0BA3 0BC8 | ணொ NNO 0BA3 0BCA | ணோ NNOH 0BA3 0BCB | ணௌ NNOU 0BA3 0BCC | ண் NN 0BA3 0BCD |
| த் TH 0BA4 0BCD | த THA 0BA4 | தா THAA 0BA4 0BBE | தி THE 0BA4 0BBF | தீ THEE 0BA4 0BC0 | து THU 0BA4 0BC1 | தூ THOO 0BA4 0BC2 | தெ THĀ 0BA4 0BC6 | தே THAE 0BA4 0BC7 | தை THAI 0BA4 0BC8 | தொ THO 0BA4 0BCA | தோ THOH 0BA4 0BCB | தௌ THOU 0BA4 0BCC | த் TH 0BA4 0BCD |
| ந் N 0BA8 0BCD | ந NA 0BA8 | நா NAA 0BA8 0BBE | நி NE 0BA8 0BBF | நீ NEE 0BA8 0BC0 | நு NU 0BA8 0BC1 | நூ NOO 0BA8 0BC2 | நெ NĀ 0BA8 0BC6 | நே NAE 0BA8 0BC7 | நை NAI 0BA8 0BC8 | நொ NO 0BA8 0BCA | நோ NOH 0BA8 0BCB | நௌ NOU 0BA8 0BCC | ந் N 0BA8 0BCD |
| ப் P 0BAA 0BCD | ப PA 0BAA | பா PAA 0BAA 0BBE | பி PE 0BAA 0BBF | பீ PEE 0BAA 0BC0 | பு PU 0BAA 0BC1 | பூ POO 0BAA 0BC2 | பெ PĀ 0BAA 0BC6 | பே PAE 0BAA 0BC7 | பை PAI 0BAA 0BC8 | பொ PO 0BAA 0BCA | போ POH 0BAA 0BCB | பௌ POU 0BAA 0BCC | ப் P 0BAA 0BCD |
| ம் M 0BAE 0BCD | ம MA 0BAE | மா MAA 0BAE 0BBE | மி ME 0BAE 0BBF | மீ MEE 0BAE 0BC0 | மு MU 0BAE 0BC1 | மூ MOO 0BAE 0BC2 | மெ MĀ 0BAE 0BC6 | மே MAE 0BAE 0BC7 | மை MAI 0BAE 0BC8 | மொ MO 0BAE 0BCA | மோ MOH 0BAE 0BCB | மௌ MOU 0BAE 0BCC | ம் M 0BAE 0BCD |
| ய் Y 0BAF 0BCD | ய YA 0BAF | யா YAA 0BAF 0BBE | யி YE 0BAF 0BBF | யீ YEE 0BAF 0BC0 | யு YU 0BAF 0BC1 | யூ YOO 0BAF 0BC2 | யெ YĀ 0BAF 0BC6 | யே YAE 0BAF 0BC7 | யை YAI 0BAF 0BC8 | யொ YO 0BAF 0BCA | யோ YOH 0BAF 0BCB | யௌ YOU 0BAF 0BCC | ய் Y 0BAF 0BCD |
| ர் R 0BB0 0BCD | ர RA 0BB0 | ரா RAA 0BB0 0BBE | ரி RE 0BB0 0BBF | ரீ REE 0BB0 0BC0 | ரு RU 0BB0 0BC1 | ரூ ROO 0BB0 0BC2 | ரெ RĀ 0BB0 0BC6 | ரே RAE 0BB0 0BC7 | ரை RAI 0BB0 0BC8 | ரொ RO 0BB0 0BCA | ரோ ROH 0BB0 0BCB | ரௌ ROU 0BB0 0BCC | ர் R 0BB0 0BCD |
| ல் L 0BB2 0BCD | ல LA 0BB2 | லா LAA 0BB2 0BBE | லி LE 0BB2 0BBF | லீ LEE 0BB2 0BC0 | லு LU 0BB2 0BC1 | லூ LOO 0BB2 0BC2 | லெ LĀ 0BB2 0BC6 | லே LAE 0BB2 0BC7 | லை LAI 0BB2 0BC8 | லொ LO 0BB2 0BCA | லோ LOH 0BB2 0BCB | லௌ LOU 0BB2 0BCC | ல் L 0BB2 0BCD |
| வ் V 0BB5 0BCD | வ VA 0BB5 | வா VAA 0BB5 0BBE | வி VE 0BB5 0BBF | வீ VEE 0BB5 0BC0 | வு VU 0BB5 0BC1 | வூ VOO 0BB5 0BC2 | வெ VĀ 0BB5 0BC6 | வே VAE 0BB5 0BC7 | வை VAI 0BB5 0BC8 | வொ VO 0BB5 0BCA | வோ VOH 0BB5 0BCB | வௌ VOU 0BB5 0BCC | வ் V 0BB5 0BCD |
| ழ் ZH 0BB4 0BCD | ழ ZHA 0BB4 | ழா ZHAA 0BB4 0BBE | ழி ZHE 0BB4 0BBF | ழீ ZHEE 0BB4 0BC0 | ழு ZHU 0BB4 0BC1 | ழூ ZHOO 0BB4 0BC2 | ழெ ZHĀ 0BB4 0BC6 | ழே ZHAE 0BB4 0BC7 | ழை ZHAI 0BB4 0BC8 | ழொ ZHO 0BB4 0BCA | ழோ ZHOH 0BB4 0BCB | ழௌ ZHOU 0BB4 0BCC | ழ் ZH 0BB4 0BCD |
| ள் LL 0BB3 0BCD | ள LLA 0BB3 | ளா LLAA 0BB3 0BBE | ளி LLE 0BB3 0BBF | ளீ LLEE 0BB3 0BC0 | ளு LLU 0BB3 0BC1 | ளூ LLOO 0BB3 0BC2 | ளெ LLĀ 0BB3 0BC6 | ளே LLAE 0BB3 0BC7 | ளை LLAI 0BB3 0BC8 | ளொ LLO 0BB3 0BCA | ளோ LLOH 0BB3 0BCB | ளௌ LLOU 0BB3 0BCC | ள் LL 0BB3 0BCD |
| ற் RR 0BB1 0BCD | ற RRA 0BB1 | றா RRAA 0BB1 0BBE | றி RRE 0BB1 0BBF | றீ RREE 0BB1 0BC0 | று RRU 0BB1 0BC1 | றூ RROO 0BB1 0BC2 | றெ RRĀ 0BB1 0BC6 | றே RRAE 0BB1 0BC7 | றை RRAI 0BB1 0BC8 | றொ RRO 0BB1 0BCA | றோ RROH 0BB1 0BCB | றௌ RROU 0BB1 0BCC | ற் RR 0BB1 0BCD |
| ன் N 0BA9 0BCD | ன NA 0BA9 | னா NAA 0BA9 0BBE | னி NE 0BA9 0BBF | னீ NEE 0BA9 0BC0 | னு NU 0BA9 0BC1 | னூ NOO 0BA9 0BC2 | னெ NĀ 0BA9 0BC6 | னே NAE 0BA9 0BC7 | னை NAI 0BA9 0BC8 | னொ NO 0BA9 0BCA | னோ NOH 0BA9 0BCB | னௌ NOU 0BA9 0BCC | ன் N 0BA9 0BCD |
| ஃ AKH 0B83 | அ A 0B85 | ஆ AA 0B86 | இ E 0B87 | ஈ EE 0B88 | உ U 0B89 | ஊ OO 0B8A | எ Ā 0B8E | ஏ AE 0B8F | ஐ I 0B90 | ஒ O 0B92 | ஓ OH 0B93 | ஔ OU 0B94 | ஃ AKH 0B83 |
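The pattern in the code points above is regular enough that a row can be generated from a single consonant. What follows is a minimal Python sketch of that idea, not the script actually used for this post. The names `VIRAMA`, `VOWEL_SIGNS`, `code_points`, and `syllable_row` are all hypothetical, and the book's transliterations are deliberately left out since they would still have to be typed by hand.

```python
# Hypothetical helper names; a sketch of the pattern, not the original script.
VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (the pulli that yields a bare consonant)

# Dependent vowel signs in the column order used above.
# The empty string stands for the plain consonant with its inherent "A".
VOWEL_SIGNS = ["", "\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2",
               "\u0BC6", "\u0BC7", "\u0BC8", "\u0BCA", "\u0BCB", "\u0BCC"]


def code_points(text: str) -> str:
    """Format each code point of text as four hex digits, space separated."""
    return " ".join(f"{ord(c):04X}" for c in text)


def syllable_row(consonant: str) -> list[tuple[str, str]]:
    """Build one 14-cell row: pulli form, 12 vowel forms, pulli form again."""
    cells = [consonant + VIRAMA]
    cells += [consonant + sign for sign in VOWEL_SIGNS]
    cells.append(consonant + VIRAMA)
    return [(cell, code_points(cell)) for cell in cells]


# Example: the row for U+0B95 TAMIL LETTER KA.
for glyphs, cps in syllable_row("\u0B95"):
    print(glyphs, cps)
```

Looping `syllable_row` over the eighteen consonants in the book's order would give the body of the grid; the vowel row at the top and bottom is a separate list of independent vowels.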
I may talk more about this book from time to time -- like its different take on the letter count (even with the additional letters it includes outside of the above table). As well as other, similar books I picked up covering other languages of India. Or maybe even other books from the pile.
Language is fascinating the living crap out of me at the moment, people....
No Unicode character was unconfused enough to comfortably sponsor this post -- the unfamiliar transliterations and "unfiltered" filtered HTML in Word stunned the lot of them!
# Ben Bryant on 26 Feb 2008 1:41 PM:
All I can say is... wow, you're amazing. That must have taken awhile to make these nice tables.
Most of them involve two code points. What is the correct terminology? "Combining characters", or "combining code points" to make a "combined character?"
# John Cowan on 26 Feb 2008 2:04 PM:
HTML Tidy <http://tidy.sourceforge.net> is very good at cleaning up Word's output: you can get a prebuilt binary for Windows.
# Srikanth on 26 Feb 2008 3:26 PM:
pingback: நன்றி தலைவா!
# Michael S. Kaplan on 26 Feb 2008 6:21 PM:
Hey Ben,
Actually, it was very quick (I scripted it all, then made the mistake of opening it in Word to tweak it!).
Mark Davis likes to call them "grapheme clusters" though I have always preferred the term "user character" since they are what the user thinks of as a character....
# Ben Bryant on 28 Feb 2008 12:36 PM:
Thanks!
# Ben Bryant on 28 Feb 2008 12:39 PM:
oh -- you had the book in some electronic form to begin with, I take it. What script language did you script it with?
# Michael S. Kaplan on 28 Feb 2008 12:49 PM:
Nope, I had the printed book, only. I used Perl to build up the original HTML, opened that in the browser, and then a copy/paste to put it in Word....
# Ben Bryant on 28 Feb 2008 1:54 PM:
I guess I was slow to realize you did not build your table from the book, you generated it from some other source, and said that it was "mostly reproduced" though with swapped axes.
But I'm still missing what that source is. You seem to have indicated you did not type them in so there must have been a source for the list of Tamil grapheme clusters. I'm guessing it is something in one of your previous posts which I missed, right?
# Michael S. Kaplan on 28 Feb 2008 2:28 PM:
I got the data from the book, I did. I essentially looked at the table and typed the code points. But that was all I typed.
What I automated was building the HTML tables, with the code points in them (that is what would have taken all the time!).
# Ben Bryant on 28 Feb 2008 3:39 PM:
Okay I get it, and now I've noticed the patterns in the "0B" code points, most of the first ones are the same in the rows, and the second ones the same in the columns. I gather you also entered the Tamil letters N NA NAA etc, which was probably the bulk of the work.
# Michael S. Kaplan on 28 Feb 2008 4:57 PM:
Exactly. And after running the script the second table made for a pretty cool looking one. :-)
# Leela Mehta on 14 Mar 2008 9:58 PM:
Amazing!!!. I dont know Tamil. I want to learn Tamil. Since I am in chennai from past 3 years but never tried to learn Tamil. From couple I am thinking to lean Tamil. So today I browsed google. I have gone through many sites. I did get any hope from those website. I dont know to speak tamil as well i dont know to write. Never I thought I could get inspired by something where in I will easy.I liked the way you have represented the character. I am finding very beautifully it has been arranged. I salute your brain.
THanks.
# raja on 12 Oct 2009 11:28 PM:
hai i would like to lern tamil in 30 days
referenced by
2010/07/26 You can violate the rules of decorum, just not the law of gravity
2009/09/09 On not being in Germany in October
2008/11/03 Inspiration, and a code chart
2008/06/30 Behold the Table Driven Text Service, Part 12 (The knights who say நீ, redux, #2)
2008/03/03 ঘেমন কর্ম তেমন ফল, aka Learn Bengali in a month (or not)