Tamil is an abugida. It is not an abugi-DOH!, if you know what I mean.

(Apologies for the Homer Simpson Doh! reference!)

The term abugida, coined by Peter T. Daniels (co-author of The World's Writing Systems, a book that our team could almost certainly benefit from having its own copy of -- hint, hint to whoever buys the books!), describes something that is conceptually simple enough (from Wikipedia):

...a writing system composed of signs (graphemes) denoting consonants with an inherent following vowel, which are consistently modified with extensions or diacritical marks to indicate other vowels, or, in some cases, the lack of a vowel.

The Tamil script is an abugida. The concept of the inherent vowel built into the consonant is a definite part of how the script works, as is the use of vowel signs, or of a puLLi (virama) to suppress the vowel.

One could argue that this is more true of Tamil than of some other Indic scripts, since Tamil nearly always shows a visible puLLi, whereas scripts such as Devanagari often hide the virama inside a conjunct. So you could really think of Tamil as an über-abugida.

Now all of this has nothing whatsoever to do with computers. This is how the Tamil script has worked for millennia (how many millennia depends on which of two theories you subscribe to). Let's take a moment to consider that fact: nothing done in recent years carries weight here that compares to a tradition that has lasted for more generations than most of us can ever draw on.

Let's look at the sequence of applying vowels (or not doing so) to a single letter, KA:

k   க்
ka  க
kaa கா
ki  கி
kii கீ
ku  கு
kuu கூ
ke  கெ
kee கே
kai கை
ko  கொ
koo கோ
kau கௌ
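
For the curious, here is a minimal Python sketch of how those same syllables are stored in Unicode. Whichever side of the consonant the vowel sign is drawn on, the text is stored in logical order: TAMIL LETTER KA (U+0B95) first, then the vowel sign (or the puLLi, U+0BCD). The romanized keys are just the labels from the list above, and the dictionary name is my own, for illustration only.

```python
# Syllables built on TAMIL LETTER KA (U+0B95), keyed by the romanization above.
KA = "\u0b95"
VOWEL_SIGNS = {
    "k": "\u0bcd",    # puLLi (virama): suppresses the inherent vowel
    "ka": "",         # inherent vowel: no sign at all
    "kaa": "\u0bbe",
    "ki": "\u0bbf",
    "kii": "\u0bc0",
    "ku": "\u0bc1",
    "kuu": "\u0bc2",
    "ke": "\u0bc6",   # drawn to the LEFT of the consonant
    "kee": "\u0bc7",  # drawn to the LEFT
    "kai": "\u0bc8",  # drawn to the LEFT
    "ko": "\u0bca",   # two-part sign: left AND right
    "koo": "\u0bcb",  # two-part sign
    "kau": "\u0bcc",  # two-part sign
}

for roman, sign in VOWEL_SIGNS.items():
    syllable = KA + sign  # consonant first, always
    codepoints = " ".join(f"U+{ord(c):04X}" for c in syllable)
    print(f"{roman:4} {syllable}  {codepoints}")
```

Every line of output begins with U+0B95, including ke, kee, and kai, where the sign is drawn in front of the k.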

Now if you look at each of these items above, they all clearly have that "K"-ish thing in there somewhere, even where the vowel sign actually sits in front of the k (as in ke, kee, and kai) or both in front of and behind it (as in ko, koo, and kau). Clearly one expects கே to be pronounced kee and not eek, for the simple reason that that is not what it is called, not to mention the fact that this is not a 60's sitcom with a housewife on a chair avoiding a mouse.
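
The two-part signs of ko, koo, and kau follow the same rule. A minimal sketch using Python's standard unicodedata module shows that each of those signs canonically decomposes into a left-hand piece and a right-hand piece, and both pieces are still stored after the consonant:

```python
import unicodedata

# KA plus each two-part vowel sign, each sign stored as a single code point.
TWO_PART = {
    "ko": "\u0b95\u0bca",   # TAMIL VOWEL SIGN O
    "koo": "\u0b95\u0bcb",  # TAMIL VOWEL SIGN OO
    "kau": "\u0b95\u0bcc",  # TAMIL VOWEL SIGN AU
}

for roman, syllable in TWO_PART.items():
    # NFD splits the sign into its left and right pieces, in logical order.
    pieces = unicodedata.normalize("NFD", syllable)
    print(roman, [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in pieces])
    # NFC puts them back together into the single-code-point form.
    assert unicodedata.normalize("NFC", pieces) == syllable
```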

And now let us apply this all to Tamil in technology for a moment.

When designing a typewriter meant for Tamil, it is quite reasonable to do one of two things:

  1. Split apart the pieces so everything can be typed in a visual order, or
  2. Put in individual keys for every possible combination

These are really the only two approaches on a typewriter that give one the flexibility to handle the fact that each of these consonant-vowel combinations can have a different width. And given that there are 18 consonants to go with those 12 vowels, you end up with a pretty huge freaking number of characters (18 × 12 = 216 combinations alone), which means option #1 is really the only option if you want to avoid a keyboard of over 230 keys once punctuation and numbers are included.
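
To put a number on that, here is a quick tally in Python. It is a sketch that counts only the 18 core consonants and 12 vowels (no Grantha letters, no aytham, no digits or punctuation) and simply generates every combination:

```python
# The 18 core consonants, each shown with its inherent "a".
CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"
# The inherent vowel (no sign) plus the 11 dependent vowel signs: 12 vowels.
VOWEL_SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
               "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]
PULLI = "\u0bcd"
INDEPENDENT_VOWELS = "அஆஇஈஉஊஎஏஐஒஓஔ"

# One key per consonant-vowel combination (option #2 above).
combinations = [c + v for c in CONSONANTS for v in VOWEL_SIGNS]
# Consonants with a visible puLLi need keys too.
dead_consonants = [c + PULLI for c in CONSONANTS]

print(len(combinations))
print(len(combinations) + len(dead_consonants) + len(INDEPENDENT_VOWELS))
```

That gives 216 combinations, and 246 distinct letters once the 18 puLLi forms and the 12 independent vowels are added, before a single punctuation mark or digit.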

Of course, the bad part about option #1 is that as a person who knows and loves the language and wishes to type in it, you may well have to keep in mind the lie of the keyboard that forces you to type eek when you mean kee.
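
Software pays a similar price if it accepts this kind of input but stores Unicode text, which wants logical order. Here is a rough Python sketch of that reordering. It assumes the typist enters a left-hand sign (e, ee, ai) before its consonant and any right-hand ா after it; visual_to_logical is a hypothetical name, and kau is not handled:

```python
import unicodedata

# Vowel signs drawn to the LEFT of their consonant: e, ee, ai.
PREBASE_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}


def is_tamil_consonant(ch: str) -> bool:
    # TAMIL LETTER KA (U+0B95) through TAMIL LETTER HA (U+0BB9).
    return "\u0b95" <= ch <= "\u0bb9"


def visual_to_logical(keys: str) -> str:
    """Reorder typewriter-style (visual order) input into Unicode logical order."""
    out = []
    pending = None  # a left-hand sign still waiting for its consonant
    for ch in keys:
        if pending is not None and is_tamil_consonant(ch):
            out.append(ch + pending)  # consonant first, then the sign
            pending = None
            continue
        if pending is not None:
            out.append(pending)  # no consonant followed: keep the sign as typed
            pending = None
        if ch in PREBASE_SIGNS:
            pending = ch
        else:
            out.append(ch)
    if pending is not None:
        out.append(pending)
    # NFC merges e/ee + aa into the single two-part signs of ko and koo.
    return unicodedata.normalize("NFC", "".join(out))


# kee as typed on a typewriter: the sign first, then the consonant.
print(visual_to_logical("\u0bc7\u0b95"))
```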

Eventually you will get quite good at it, certainly. But that is not about the language. And it is not about the script. It is about dealing with the limitations of the technology. About making the writing different from the reading.

Now I should pause to point out that this is not necessarily something typewriters introduced.

I imagine that not everyone who writes Tamil by hand leaves a space, draws the k, and then goes back to draw the ee when writing kee. Perhaps proper penmanship would ask this of them? Even if so, I am sure there are many who do not do it, just as there are many who write Chinese ideographs without following the official stroke order in every case.

What typewriters do here is formalize a process that is not a part of the script or of the language -- if anything, they formalize a quite informal habit of making the way one writes different from the way one reads.

And now let us add computers to the mix.

The people most likely to want a solution for computers are indeed the people who have typewriters and are upgrading. And that puts us in a strange place. A very strange place, where one can either keep moving down the same road, building up this model of writing that does not match the reading, or work on preserving the language instead.

After talking to and working with native speakers for the last few years, I am an unapologetic fan of the movement to preserve the language in its purest form.

Perhaps if Tamil were an uglier script, I might feel differently. I know that some people feel that just as there are no ugly babies there are no ugly scripts, but I don't believe that. The beauty of text flowing in a manuscript is something that not every script can claim; some actually need fancy calligraphic styles to get such an effect, whereas others can do it in their simplest form. To my eyes, Tamil is one such script, and one such language.

Now Tamil is a complex script.

It is not as complex as Devanagari, or Sinhalese, or Bengali, or Tibetan, to be sure. It is in fact one of the simplest of the complex scripts, even when you allow for the extensions to Tamil known as Tamil Grantha that have been used for Sanskrit for millennia. But it is still a complex and beautiful script that deserves the opportunity technology can afford it to return to its purer linguistic roots.

And that is why I do not support the Homer Simpson-esque mechanism that is trying to convert the Tamil abugida to a Tamil abugi-DOH! Because it is a script and a language that deserves better here....

Think of this post as the first of several that will talk about such issues related to Tamil.


This post brought to you by க (U+0B95, a.k.a. TAMIL LETTER KA)

# Ambarish Sridharanarayanan on Sunday, September 17, 2006 4:28 AM:

I suspect this has some relevance to TUNE, but I'm not exactly sure what. Or maybe you're making a different point altogether? Inquiring minds want to know.

P.S. Yeah, I am a native speaker of Tamil.

# Michael S. Kaplan on Sunday, September 17, 2006 8:08 AM:

Hello Ambarish,

Well, not just yet. I am trying to build up a framework with my overall understanding of the issues, for several reasons:

1) so that if my understanding has flaws in it they can be corrected
2) so that if my understanding does not have flaws in it (or once it no longer does) I can at least have the credibility that this provides
3) to help others who don't know the situation understand it

I originally got involved with Tamil because I saw some brilliant people in a community who were having a little trouble integrating their opinions with Unicode, and I wanted to help correct that mismatch if I could. And although I believe I have been successful on some fronts, I have been wildly unsuccessful at helping to integrate those forces in Tamil Nadu who have convinced themselves that they need a new 16-bit standard. I want to be able to communicate with these people so that, even if in the end we still disagree, at the very least both of us will better understand the points each is making....

The big question for you (and all native speakers of Tamil who are reading this post) is -- how is my understanding so far? Any flaws in the reasoning yet? :-)

# vishnu vyas on Sunday, September 17, 2006 1:18 PM:


I don't see what's wrong with solution 1.

For example, when someone wants to type kee, they type the vowel mark for kee (in Tamil called the kokki) and then the consonant for ka.
People won't read that as 'eek', because from a native Tamil speaker's point of view, the mark serves to modify the following consonant and thus doesn't have any sound of its own. There is no jumping through mental hoops here.

On the other hand, if they really want to spell out "eek", they can use the Tamil vowel (aee) followed by k and the pulli. (In my opinion the pulli should be accessible by pressing Shift or something.)

I guess I have been clear...
(I'm a native Tamil speaker too.)

# Michael S. Kaplan on Sunday, September 17, 2006 2:14 PM:

I'm not saying there is anything wrong with it, at all (it is probably how things are often written and typed, as I said!).

But it does mean that what people read and what they write are two different things. Basically, it is the difference between logical order and visual order.

There are huge benefits to logical ordering, whether one considers collation or many other kinds of text processing. And certainly when it comes to input, everyone I have talked to so far considers it more natural to type k - e - e to get kee rather than having to type e - e - k.

I will actually be talking more about input in my next post. :-)
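
As an illustration of what logical-order input can look like, here is a minimal phonetic transliteration sketch in Python. It is not any particular IME: the romanization follows the list at the top of the post, only a handful of consonants are included, and transliterate is a hypothetical name. The longest match wins, so k-e-e gives கே while e-e-k gives ஏக்:

```python
# Hypothetical phonetic scheme using this post's romanization (small subset).
CONSONANTS = {
    "k": "\u0b95",   # KA
    "ng": "\u0b99",  # NGA
    "ch": "\u0b9a",  # CA
    "p": "\u0baa",   # PA
    "m": "\u0bae",   # MA
}
# roman vowel -> (independent vowel letter, dependent vowel sign)
VOWELS = {
    "a": ("\u0b85", ""), "aa": ("\u0b86", "\u0bbe"),
    "i": ("\u0b87", "\u0bbf"), "ii": ("\u0b88", "\u0bc0"),
    "u": ("\u0b89", "\u0bc1"), "uu": ("\u0b8a", "\u0bc2"),
    "e": ("\u0b8e", "\u0bc6"), "ee": ("\u0b8f", "\u0bc7"),
    "ai": ("\u0b90", "\u0bc8"), "o": ("\u0b92", "\u0bca"),
    "oo": ("\u0b93", "\u0bcb"), "au": ("\u0b94", "\u0bcc"),
}
PULLI = "\u0bcd"


def longest_match(text, i, table):
    """Return the longest key of table (2 or 1 chars) starting at text[i], or None."""
    for length in (2, 1):
        piece = text[i:i + length]
        if piece in table:
            return piece
    return None


def transliterate(roman: str) -> str:
    out = []
    i = 0
    while i < len(roman):
        cons = longest_match(roman, i, CONSONANTS)
        if cons is not None:
            i += len(cons)
            vowel = longest_match(roman, i, VOWELS)
            if vowel is not None:
                # Consonant + dependent sign, stored in logical order.
                out.append(CONSONANTS[cons] + VOWELS[vowel][1])
                i += len(vowel)
            else:
                # No vowel follows: a dead consonant with a visible puLLi.
                out.append(CONSONANTS[cons] + PULLI)
            continue
        vowel = longest_match(roman, i, VOWELS)
        if vowel is not None:
            out.append(VOWELS[vowel][0])  # a standalone vowel letter
            i += len(vowel)
            continue
        out.append(roman[i])  # anything else passes through unchanged
        i += 1
    return "".join(out)


print(transliterate("kee"), transliterate("eek"))
```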

# RubenP on Sunday, September 17, 2006 4:56 PM:

I'm not sure the argument of meaning 'k-e-e' rather than 'e-e-k' is a valid one. I might not speak or write Tamil, but typing the characters you want to see on the screen in visual order seems a lot simpler than forcing linguistic order and then changing it back to visual order again. It's probably quite unlike writing the language too, which will probably confuse the heck out of people who haven't given writing as much thought as linguists have. (That's my guess, of course.)

(There are some very nice examples of linguists trying to make the spelling of some Latin-based languages more linguistically sound and ending up making life hell for the average user of these languages. Don't ever let linguists and politicians take your language for a test drive!)

# Michael S. Kaplan on Sunday, September 17, 2006 5:10 PM:

Well, see that next post about input methods for Tamil, Ruben. :-)

# Phylyp on Monday, September 18, 2006 3:19 AM:

>Clearly one expects கே to be pronounced as kee and not eek for the simple reason that it is not called that, not to mention the fact that this is not a 60's sitcom with as housewife on a chair avoiding a mouse.

LOL :)

Regarding 'beautiful' scripts, I like the Tamil, Malayalam and Kannada scripts, though I can only read the first one.

Yeah, I'm a native Tamil speaker - amazing how many of us come out of the woodwork when you start talking about Tamil, eh?

# Phylyp on Monday, September 18, 2006 3:24 AM:

And for the non-Tamil speakers, I'd like to point out that கே which is indicated as 'kee', sounds like the 'kay' in Kate, not like the 'kee' in Key.

# Michael S. Kaplan on Monday, September 18, 2006 4:14 AM:

Hi Phylyp,

I actually knew about the pronunciation thing (and did it right for the second post), I was just worried that the joke wouldn't work without the visual of the woman saying Eeek! :-)

I think it is great that people are leaving comments and providing feedback, especially on the input method but really any feedback is good....

# Sridhar on Tuesday, September 19, 2006 4:24 AM:

>The concept of the inherent vowel built into the consonant is a definite part of how the script works, as is the use of vowels or the use of a puLLi (virama) to surpress the vowel.

க் and ஈ are basic characters and கீ is a derived one.  It is probably incorrect to say a puLLi suppresses the vowel.  

There is at least one IME that takes care of translating k+e+e to கீ. This is more natural than the typewriter way of composing the letter. It is also in line with what we learn in school as க் + ஈ = கீ. So, when people migrate from typewriters to keyboards, they may need to relearn. When IMEs are designed for Tamil, they offer multiple keyboard layouts as options.

I am not fully into the discussions on Unicode for Tamil, but I know a number of Tamil scholars are unhappy with the encoding.  I hope your article takes us to a more acceptable encoding.
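
The school formula க் + ஈ = கீ from the last comment maps neatly onto Unicode storage, even though the stored form is not literally க் followed by ஈ. A minimal Python sketch, with combine as a hypothetical name: drop the puLLi and append the dependent sign that corresponds to the independent vowel.

```python
PULLI = "\u0bcd"

# Independent vowel letter -> dependent vowel sign (அ has none: it is inherent).
VOWEL_TO_SIGN = {
    "\u0b85": "",        # அ a
    "\u0b86": "\u0bbe",  # ஆ aa
    "\u0b87": "\u0bbf",  # இ i
    "\u0b88": "\u0bc0",  # ஈ ii
    "\u0b89": "\u0bc1",  # உ u
    "\u0b8a": "\u0bc2",  # ஊ uu
    "\u0b8e": "\u0bc6",  # எ e
    "\u0b8f": "\u0bc7",  # ஏ ee
    "\u0b90": "\u0bc8",  # ஐ ai
    "\u0b92": "\u0bca",  # ஒ o
    "\u0b93": "\u0bcb",  # ஓ oo
    "\u0b94": "\u0bcc",  # ஔ au
}


def combine(dead_consonant: str, vowel: str) -> str:
    """Consonant-with-puLLi + independent vowel -> syllable (e.g. க் + ஈ = கீ)."""
    if len(dead_consonant) != 2 or dead_consonant[1] != PULLI:
        raise ValueError("expected a consonant followed by puLLi")
    if vowel not in VOWEL_TO_SIGN:
        raise ValueError("expected an independent Tamil vowel")
    # Keep the consonant, replace the puLLi with the vowel's dependent sign.
    return dead_consonant[0] + VOWEL_TO_SIGN[vowel]


print(combine("\u0b95\u0bcd", "\u0b88"))
```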

