And we are the knights who say நீ (NII)

by Michael S. Kaplan, published on 2006/09/17 17:07 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/09/17/759687.aspx


I talked before about how Tamil is an abugida. I thought I'd talk for a bit about input methods, now that you know what kind of script the language uses and now that I have suggested that what the language is should have a bigger role in how it is typed than how the language is encoded.

I'll start this post by proving how bad not doing as I suggest can be; I'll point out the current Tamil keyboard on Windows, the one that has been there since Windows 2000, in the base, shift, altgr, and ctrl+shift states:

If you are a native speaker, this might seem unfamiliar. And it is definitely not QWERTY phonetic. For example, if you look at the N key (VK_N) you find ல, U+0bb2, TAMIL LETTER LA:

and if you look at the V key (VK_V), you'll find ந, U+0ba8, TAMIL LETTER NA:

Not really a keyboard that builds an intuitive user experience, is it?

Now sure, you could try and build a more intuitive keyboard with MSKLC, but there is a lot working against you there -- such as the fact that there isn't really a dynamic way to affect the text when you type multiple keystrokes.

And besides, if you really want that kind of keyboard you can build it right now with MSKLC, and without me. :-)

But how about if I try to build something using the same stuff I used for the Unicode and Cantonese IMEs?

This one is not yet ready for prime time (neither is the Sinhala one I am working on with the help of Thakara I mentioned before), but I can show what I have in mind so people can poke holes in the idea (or if they like it they can say that, too!).

Here it is before anything is typed:

Let's try typing the letter N, you know, that VK_N we were talking about before.

Interesting... a nice list of candidates, as well as a bit of input that suggests how additional letters can narrow the field. Plus, do you notice how that first item has no letters after the character? That's what I will get if I hit ENTER (or 1, or choose the first item) right then:

What do you know ந் (a.k.a. U+0ba8 U+0bcd) -- TAMIL LETTER NA plus TAMIL SIGN VIRAMA, or more conventionally, NA + puLLi. Which is basically a letter N, since the inherent vowel is being dropped.
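If you want to check that breakdown for yourself, here is a minimal sketch in Python (my choice for illustration, not anything the input method uses) that just asks the standard unicodedata module for the names of the two code points:

```python
import unicodedata

# NA followed by the virama (puLLi), the sequence committed above.
text = "\u0ba8\u0bcd"

# Collect the Unicode character name of each code point in the sequence.
names = [unicodedata.name(ch) for ch in text]

for ch, name in zip(text, names):
    print(f"U+{ord(ch):04X} {name}")
# U+0BA8 TAMIL LETTER NA
# U+0BCD TAMIL SIGN VIRAMA
```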

Ok, let's try a bit more. Let's try typing N followed by I. What happens then?

Well, that thins the field out a bit, huh? I can just hit ENTER (or 1, or choose the first item) and be like the Tamil version of the knights who say நி (NI):

Or I can type NII (or hit 2, or choose the second item earlier); since there is then only one item, it will just commit it automatically:

Now the fact that the input method gives that candidate list and those hints as to what typing will do to change the list can be very helpful in learning the layout. The data behind the entries with TAMIL LETTER NA is simple enough:

"n"   = "ந்"
"na"  = "ந"
"naa" = "நா"
"ni"  = "நி"
"nii" = "நீ"
"nu"  = "நு"
"nuu" = "நூ"
"ne"  = "நெ"
"nee" = "நே"
"nai" = "நை"
"no"  = "நொ"
"noo" = "நோ"
"nau" = "நௌ"

and the input method just kind of extends that to all of the consonants in the language (and adds a few of those Grantha that are in Unicode now).
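To make the table idea concrete, here is a rough sketch in Python of how rows like the ones above could be generated for any consonant and then filtered as keys are typed. This is not the actual input method's code. The names VOWEL_SIGNS, syllable_rows, candidates and commit_if_unique are made up for illustration, and the ordering of the candidates (shortest spelling first) is an assumption on my part. The input is lowercased because the framework does not distinguish between small and capital letters.

```python
VIRAMA = "\u0bcd"  # TAMIL SIGN VIRAMA (puLLi)

# Latin vowel spelling -> dependent vowel sign. "a" is the inherent vowel,
# so it adds nothing to the bare consonant.
VOWEL_SIGNS = {
    "a": "", "aa": "\u0bbe", "i": "\u0bbf", "ii": "\u0bc0",
    "u": "\u0bc1", "uu": "\u0bc2", "e": "\u0bc6", "ee": "\u0bc7",
    "ai": "\u0bc8", "o": "\u0bca", "oo": "\u0bcb", "au": "\u0bcc",
}

def syllable_rows(latin, consonant):
    """Build the table rows for one consonant (hypothetical helper)."""
    rows = {latin: consonant + VIRAMA}  # bare consonant: letter + puLLi
    for vowel, sign in VOWEL_SIGNS.items():
        rows[latin + vowel] = consonant + sign
    return rows

# Only TAMIL LETTER NA here; other consonants would be merged in the same way.
TABLE = syllable_rows("n", "\u0ba8")

def candidates(typed):
    """Entries whose spelling starts with what has been typed so far."""
    typed = typed.lower()  # case is ignored
    keys = sorted((k for k in TABLE if k.startswith(typed)),
                  key=lambda k: (len(k), k))  # assumed order: shortest first
    return [(k, TABLE[k]) for k in keys]

def commit_if_unique(typed):
    """Return the text to commit when one candidate is left, else None."""
    found = candidates(typed)
    return found[0][1] if len(found) == 1 else None
```

With this sketch, candidates("n") lists all thirteen rows with NA + puLLi first, candidates("ni") narrows to the NI and NII rows, and commit_if_unique("nii") returns NA + VOWEL SIGN II because nothing else matches.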

Of course this is working entirely off of using the Unicode names for all the letters in the Tamil block as pure transliterations and building an input method from them as is (with some minor additions based on what people have told me in the past), which will not be 100% of what a native speaker might expect. Here is what I am working off of right now:

a     அ
aa    ஆ
i     இ
ii    ஈ
u     உ
uu    ஊ
e     எ
ee    ஏ
ai    ஐ
o     ஒ
oo    ஓ
au    ஔ
k     க்
ng    ங்
c     ச்
j     ஜ்
ny    ஞ்
tt    ட்
nn    ண்
t     த்
n     ந்
nnn   ன்
p     ப்
f     ஃப்
m     ம்
y     ய்
r     ர்
rr    ற்
l     ல்
ll    ள்
lll   ழ்
v     வ்
ss    ஷ்
s     ஸ்
sh    ஶ்
h     ஹ்

Feel free to point out anything that might be wrong here (if you do not see the second-to-last entry, SHA + puLLi, then you might not be running on Vista!).
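For the curious, most of that list can be rebuilt mechanically from the character names. The following is only a sketch in Python, under the assumption that "pure transliteration" means lowercasing the name and, for consonants, dropping the trailing inherent A and adding the puLLi. The function name table_from_names is hypothetical, and the "f" entry is one of the manual additions, so it does not come out of this.

```python
import unicodedata

VIRAMA = "\u0bcd"          # TAMIL SIGN VIRAMA (puLLi)
PREFIX = "TAMIL LETTER "   # names of vowels and consonants start with this
LAST_VOWEL = 0x0B94        # TAMIL LETTER AU, the last independent vowel

def table_from_names():
    """Derive spelling -> text from Unicode names in the Tamil block."""
    table = {}
    for cp in range(0x0B80, 0x0C00):       # Tamil block, U+0B80..U+0BFF
        ch = chr(cp)
        name = unicodedata.name(ch, "")    # "" for unassigned code points
        if not name.startswith(PREFIX):
            continue                       # skip signs, digits, symbols
        latin = name[len(PREFIX):].lower()
        if cp <= LAST_VOWEL:
            table[latin] = ch              # independent vowel, as is
        else:
            # Consonant: drop the inherent "a" and add the puLLi.
            table[latin[:-1]] = ch + VIRAMA
    return table

table = table_from_names()
print(table["nnn"] == "\u0ba9\u0bcd")  # True: NNNA gives "nnn"
```

This is also why the spelling for U+0ba9 comes out as three n's: the name is TAMIL LETTER NNNA, and the sketch takes it literally.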

Now this kind of a phonetic input system seems more intuitive, both in what initially gets typed and in the feedback you get while you are typing. It certainly feels (to me) like a much more natural system than typing I I N to get NII.

Though I am not a native speaker so my opinion is not the most important type of opinion here, of course. :-)

Which is not to say a visual input system would be impossible -- one could simply swap around the letters and create one based on the visual aspects, too. I am not sure whether or not that would be useful, so I'll again ask native Tamil speakers what they think....

And I'll try to make this sample and the Sinhala one available soon! :-)

 

This post brought to you by நீ (U+0ba8 U+0bc0, a.k.a. TAMIL LETTER NA + TAMIL VOWEL SIGN II)


# vishnu vyas on 17 Sep 2006 7:48 PM:

one thing that I noticed is that you use 3 n's to write the softer na while you use only 2 n's to indicate the harder na. Is that a typo?

And secondly how do i get the "aauta ellluthu" (the symbol that comes before the p in the sound f, the 3 dots on a triangle character). Though it is not used in regular discourse, it has its uses in poetic tamil.

Also, I find this phonetic typing a bit tedious. (maybe thats just me.. and I am a bit inclined towards a completely visual representation).


# Prathiraj on 17 Sep 2006 8:03 PM:

As a native speaker of Tamil, I, and I am sure many other native speakers, am used to "zh" for "ழ்", which is a unique character in Tamil, or as we native speakers like to call it, Tamizh :-)

BTW are you not doing the same work that has been done in the Microsoft Phonetic Input tool [http://www.bhashaindia.com/phonetictool/]? which includes all the Indian languages including Tamizh.

# Michael S. Kaplan on 17 Sep 2006 8:19 PM:

Hello Vishnu,

Actually like I said I am literally using the Unicode character names -- that is U+0ba9, TAMIL LETTER NNNA that guided me. :-)

I would add that aytham to the keyboard on its own, definitely. I was mainly focusing on the consonants and vowels for this post, that's all.

Of course where to put it is an interesting issue to decide. Any thoughts? :-)

# Michael S. Kaplan on 17 Sep 2006 8:22 PM:

One more thing, Vishnu --

I am not sure I understand what you mean about it being tedious -- it would be the same number of keystrokes either way, only the order for the letters would be different. From the standpoint of the visual layout it would also be more jarring since it would move from the standalone vowels to the dependent vowel before the consonant as you typed, which seems even more fatiguing, to me....

# Michael S. Kaplan on 17 Sep 2006 8:26 PM:

Hello Prathiraj,

I hear you about the "zh" thing -- this is just the kind of feedback I am looking for (using only the character names was bound to have differences!).

The thing I am doing here is not a tool, it is a sample of an input method -- basically a prototype in progress of a better Tamil keyboard, being built on specific principles. Obviously such an effort will have differences from the MS Phonetic Input Tool due to the different approaches and of course different sources and different levels of expertise....

# vishnu vyas on 18 Sep 2006 2:19 AM:

Hmm.. about the unicode.. I find it weird that the lighter na has more 'n's than the heavier (harder) na. Logically it should be the other way round isn't it?

And secondly why not use capital/small english alphabets to represent soft vs hard sounds? like l for la and L for lla?

And about phonetic input being tedious.. I should have probably used the word - unnatural. I will explain what I am saying.
For example consider the tamil word for house - veedu.

In a phonetic input system I would have to type "v-i-i-t-t-u" (according to what you have specified) even though veedu is more natural. On the other hand if it was based purely on visual representation.. I would have typed v-the sign for ii-t-the sign for u.

Strictly speaking there are only 43 distinct symbols in tamil (including consonants, vowels, the vowel marks and the pulli). So if we can use shift in a consistent manner (harder/longer sounds) we can easily make it fit within the normal keyboard. In fact, if we take into account the simple rule in tamil that explicit vowel letters are only used at the beginning, we can simplify it even further..

(when i say the sign I just mean the markings and not the vowels themselves). In fact you can conserve space on the keyboard by using shift to represent smaller/longer (for vowels) or harder/softer sounds (for consonants) on the same key. that would make it 7 +

Also, tamil has certain rules that are almost never violated and can improve the intellisense thing..

1. The na corresponding to a single n occurs exclusively at the beginning of words or, when it occurs in the middle, it is always the pure consonant form (n+pulli, pronounced as indh).

2. The other two na's never occur at the beginning of words at all.
(there are rules like these for other characters too..these are the ones that I remember).

The other things I can suggest for the phonetic input you have given are

1. accept both d/t for tt. (ta).
2. use th instead of t (its tha, not ta).

# Michael S. Kaplan on 18 Sep 2006 2:35 AM:

Well, character names are what they are, I can't really change them (no one can). But they do not have to be used for the keyboard, they were just a starting point for me.

But I cannot use small/capital letters as the input method itself does not distinguish between them....

Also, it does not have knowledge of word boundaries -- it is a simple table-based system.

But I will incorporate the other feedback you have here. :-)

# vishnu vyas on 18 Sep 2006 2:43 AM:

> But I cannot use small/capital letters as the input method itself does not distinguish between them....

does that mean the input method can't sense the difference between a single keypress and shift+single key? (I'm not an expert on input methods.. so just asking).

# Michael S. Kaplan on 18 Sep 2006 3:27 AM:

This particular framework I am using cannot currently tell the difference, no.

# RubenP on 18 Sep 2006 3:27 PM:

OK, my previous argument about visual order vs logical order can be ignored, when we're talking about an IME based on the Latin script. I was thinking about a 'real' Tamil keyboard with actual Tamil characters on the keys. In *that* case my argument still stands, I think. (And it looks like it actually works that way too.)

Also, about the Tamil characters not lining up with QWERTY; have you ever heard of AZERTY or Dvorak? ;-) These layouts don't line up with QWERTY either. So the qualification regarding the 'intuitiveness' is a little US-centric.

# Bala on 19 Sep 2006 12:43 AM:

You can look at www.azhagi.com. The software has a very easy to use tamil keyboard transliteration. There is a free version for you to download and try...

# Michael S. Kaplan on 19 Sep 2006 2:49 AM:

Interesting.... though it is unfortunate to be spammed by subsequent comments with the link. Let's just agree not to head in that direction, okay?

You know, since the blog name is not "Billboarding It All Out" ? :-)

# Bala on 20 Sep 2006 3:25 AM:

Well, I am a native speaker of Tamil and have tried a lot of tamil tools including the Microsoft provided Indic IME (http://bhashaindia.com/downloadsV2/Category.aspx?ID=1). IMHO, By far azhagi has the best transliteration because the author spent a long time tuning it. I meant no spamming, seriously.

Further, http://quillpad.in/tamil/ provides an interface similar to the tool we are talking about - providing alternate dictionary and algorithm based word selection.

# Michael S. Kaplan on 20 Sep 2006 9:41 AM:

Hello Bala,

What I am trying to build with these posts is not a commercial for the best tool. I am actually trying to help build understanding for concepts within a language. So now that you have pointed to some sites we can probably move back to the actual topic, which is not trying to define the best transliteration or really the best anything.

Let's move on now, okay? (and please don't respond with another link!<grin>)

