Behold the Table Driven Text Service, Part 11 (The knights who say நீ, redux, #1)

by Michael S. Kaplan, published on 2008/02/25 10:31 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2008/02/25/7884260.aspx


Okay, we have now gone through a bunch of information on the Table Driven Text Service component and the text files that define the identity and behavior of individual Text Profiles.

So what happens next, exactly?

Well, let's start with the Text Profile I discussed, demo'd, and did not yet give to anyone in "And we are the knights who say நீ (NII)".

The framework I am using is the same as in that post, plus the feedback in the comments. For each consonant, it looks like the following:

"n"   = "ந்"
"na"  = "ந"
"naa" = "நா"
"ni"  = "நி"
"nii" = "நீ"
"nu"  = "நு"
"nuu" = "நூ"
"ne"  = "நெ"
"nee" = "நே"
"nai" = "நை"
"no"  = "நொ"
"noo" = "நோ"
"nau" = "நௌ"
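
Since the rows for each consonant follow one pattern (the bare consonant takes the pulli, and every other row is the consonant plus a dependent vowel sign), they can be generated rather than typed by hand. Here is a minimal Python sketch of that idea. The `consonant_rows` helper and the `VOWEL_SIGNS` table are hypothetical names made up for illustration, and the printed notation simply mirrors the list above; it is not claimed to be the actual Text Profile file syntax.

```python
# Sketch only: generate the romanized-input -> Tamil-output rows for one consonant.
# consonant_rows and VOWEL_SIGNS are hypothetical names, not part of TableTextService.

VIRAMA = "\u0BCD"  # TAMIL SIGN VIRAMA (the pulli that marks a bare consonant)

# Romanized vowel -> dependent vowel sign; the inherent "a" adds nothing.
VOWEL_SIGNS = {
    "a": "",
    "aa": "\u0BBE",
    "i": "\u0BBF",
    "ii": "\u0BC0",
    "u": "\u0BC1",
    "uu": "\u0BC2",
    "e": "\u0BC6",
    "ee": "\u0BC7",
    "ai": "\u0BC8",
    "o": "\u0BCA",
    "oo": "\u0BCB",
    "au": "\u0BCC",
}


def consonant_rows(latin, base):
    """Return (input, output) pairs for one consonant, bare form first."""
    rows = [(latin, base + VIRAMA)]
    for vowel, sign in VOWEL_SIGNS.items():
        rows.append((latin + vowel, base + sign))
    return rows


if __name__ == "__main__":
    # U+0BA8 is TAMIL LETTER NA
    for typed, output in consonant_rows("n", "\u0BA8"):
        print(f'"{typed}" = "{output}"')
```

Calling the helper once per entry in the consonant list further down would produce the full set of combinations, which makes it cheap to regenerate everything whenever a left side entry changes.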

And then the following pure (independent) vowels:

a    அ
aa   ஆ
i    இ
ii   ஈ
u    உ
uu   ஊ
e    எ
ee   ஏ
ai   ஐ
o    ஒ
oo   ஓ
au   ஔ

And the following consonants:

k    க்
ng   ங்
c    ச்
j    ஜ்
ny   ஞ்
tt   ட்
d    ட்
nnn  ண்
th   த்
n    ந்
nn   ன்
p    ப்
f    ஃப்
m    ம்
y    ய்
r    ர்
rr   ற்
l    ல்
L    ள்
zh   ழ்
v    வ்
ss   ஷ்
s    ஸ்
sh   ஶ்
h    ஹ்

But as others have pointed out, this is kind of tedious -- there are so many combinations that really should be handled by using different cases rather than requiring a person to type two vowels.

Now this is currently a limitation in TableTextService.DLL, but it may not always be -- some future version may address the limitation.

In fact, if you look at the Amharic input method in Vista and its text file, you'll see that it mixes upper and lower case on the input side, in anticipation of that limitation being addressed at some point. In the meantime, when you have multiple entries with the same letter differing only by case, they will simply both show up in the candidate list.
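
To make that behavior concrete, here is a toy Python model of a lookup that ignores case on the input side. It is only a sketch of the observable effect described above, not how TableTextService.DLL is actually implemented, and `build_table` and `candidates` are hypothetical names.

```python
# Toy model only: a case-insensitive table lookup, so entries whose input
# differs only by case end up in the same candidate list.
# build_table and candidates are hypothetical names, not a real API.
from collections import defaultdict


def build_table(entries):
    """Group outputs under the lowercased input, keeping file order."""
    table = defaultdict(list)
    for typed, output in entries:
        table[typed.lower()].append(output)
    return table


def candidates(table, typed):
    """Return every output whose input matches, ignoring case."""
    return table.get(typed.lower(), [])


entries = [
    ("l", "\u0BB2\u0BCD"),  # TAMIL LETTER LA + virama
    ("L", "\u0BB3\u0BCD"),  # TAMIL LETTER LLA + virama
]
table = build_table(entries)
both = candidates(table, "l")  # both outputs come back, whichever case is typed
```

In this model, typing either "l" or "L" yields the same two candidates, which is why a case-based scheme costs the user an extra selection step until the limitation is lifted.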

So what is the principle here, native Tamil speakers? Taking the above lists, which of the left side entries would you change, and how?

When I get back all of the rest of the feedback, we'll replace my "based on Unicode character names" input method with one that will perhaps be a bit closer to intuitive!

Then we'll configure the various settings and produce our ideal Tamil input method....

So, any native speakers want to chime in on their replacements? I have tried to do the ones that others suggested in the comments of that post, but I'd like to get them all done -- concentrate on the second and third lists above, noting how

l    ல்
ll    ள்

became

l    ல்
L    ள்

and going from there....

Now of course this is kind of a transliteration keyboard, but it does not have to be if there are keyboards that have just Tamil letters printed on them and we wanted this input method to match them. Does anyone have such a keyboard? And if so, could they take a picture of it?

 

This post brought to you by நீ (U+0ba8 U+0bc0, a.k.a. TAMIL LETTER NA + TAMIL VOWEL SIGN II)


# Ambarish Sridharanarayanan on 25 Feb 2008 6:26 PM:

Well, Tamil actually is a lot more strict in its consonant clusters - only a small subset of the 18*18 combinations is actually allowed. For instance, when I type in nka, I would like ங்க, since before க, none of the other nasals are allowed.

Of course, this would be domain-specific and it would limit the IME to generating only legal code-point runs, not everything allowable by the script, but I'd argue IMEs need to know the language well anyway.

# Michael S. Kaplan on 25 Feb 2008 6:34 PM:

That can be solved by making those letters the actual output for those keystrokes --- though of course we'd need them to be given.

Would you like to try defining a few of those? :-)

All I really need is:

  1. The corrected letters for the input, and
  2. Exceptions like the one you mention

And then everything else will be done and ready to download/run on Vista and Server 2008....

# Ambarish Sridharanarayanan on 26 Feb 2008 2:37 PM:

Sure, I'd love to help. It's essentially a context-sensitive grammar. It's probably easier to lay out the correct options (exceptions as you term them) than to lay out all the options and rule out the illegal ones.

# Michael S. Kaplan on 26 Feb 2008 7:49 PM:

That would be wonderful. :-)

Of course this is purely table based, so none of the more complex stuff (start of word, not at start of word, etc.) can be captured...

# hiligy on 14 Mar 2008 10:02 PM:

Amazing!!! I don't know Tamil. I want to learn Tamil. Since I am in Chennai from past 3 years but never tried to learn Tamil. From couple I am thinking to learn Tamil. So today I browsed Google. I have gone through many sites. I did get any hope from those website. I don't know to speak Tamil as well I don't know to write. Never I thought I could get inspired by something where in I will easy. I liked the way you have represented the character. I am finding very beautifully it has been arranged. I salute your brain.

Thanks.


referenced by

2008/10/21 Behold the Table Driven Text Service, Part 14 (Don't expect too much from numbers)

2008/07/25 Behold the Table Driven Text Service, Part 13 (Sinhalification proclamation!)

2008/06/30 Behold the Table Driven Text Service, Part 12 (The knights who say நீ, redux, #2)

2008/06/21 Back to Sri Lanka (conceptually)

2008/02/26 Learn Tamil in 30 Days (or something like that)
