Learning to spell in Bengali (when one has a cool input method)

by Michael S. Kaplan, published on 2010/03/23 07:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/03/23/9982461.aspx

The other day, JC Ahangama mentioned in the Suggestion Box:

I suggest the topic "Romanizing paired with Orthographic fonts route for Indic".

I have successfully made a round trip conversion between romanized Sinhala (inside ISO-8859-1) and Unicode Sinhala.

It allows intuitive phonetic data entry that gets displayed in the native script if you have an orthographic font for the language created using Open Type 'liga' features. Otherwise, it gracefully falls back to Latin-1, which is readable too.

Collation happens easily (I extended JavaScript sort() for it). This is a much simpler solution than the struggle we are going through with double-byte Indic.

For the record, I am completely, entirely, and unreservedly against this. UNICODE remains the answer, not new pseudo-solutions, be they new code pages, font hacks, or any other such item.

But I thought I'd answer the question in a slightly better way, one that provides a solution to the issue that does not rely on such parlor tricks or special cases.

And with that in mind....

Recently Scott Hanselman pointed me at something that I thought was pretty interesting (another Microsoft employee pointed it out to him, if memory serves).

It was a tool for inputting Indic text.

Now I have done my fair share of random work in various languages of India like Tamil, Hindi, and Bengali.

And I've been getting a fair amount of mail from people who wanted to tell me about their exciting input method that would change everything (and usually they proceeded to point how cheaply Microsoft could buy it!).

That mail usually doesn't do much for me, since the input methods themselves tend not to be that exciting.

But this time was going to be much cooler.

This time it was the Microsoft Indic Language Input Tool!

Now this thing is incredibly cool in both its web-based and desktop versions, it really is.

In fact, it caused me to wonder how Goldie was doing in her new job....

Because I took her name -- the one that was painstakingly spelled in Learning to spell in Bengali (when one doesn't know the language) -- and simply typed it in English using the English transliteration that she had known her whole life (Godhuli Chaudhuri).

I typed it in straight, no pauses, and after I saw what happened I did it again taking screenshots as I went:

And that's right....

গোধূলি চৌধুরী

without needing to select an alternate candidate or even select the initial one (when I hit the space it committed the word). Typing was as fast as typing it in English.

Well duh! I was typing English. :-)
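For the curious, here is a rough sketch of the general shape of that experience: a Latin buffer that is re-rendered on every keystroke and committed when the space bar is hit. To be clear, this is a toy and not how the Microsoft tool actually works (I have not seen its internals). The rule tables, the two-word lexicon, and the names `transliterate`, `Composer`, and `type_text` are all made up for illustration.

```python
import unicodedata

# Toy romanization rules: assumptions for illustration only,
# not the real tables of any shipping input tool.
CONSONANTS = {"ch": "চ", "dh": "ধ", "c": "ক", "k": "ক",
              "g": "গ", "d": "দ", "l": "ল", "r": "র"}
VOWEL_SIGNS = {"au": "ৌ", "a": "া", "i": "ি", "u": "ু", "o": "ো"}

# Hypothetical lexicon: known spellings that override the naive rules.
LEXICON = {"godhuli": "গোধূলি", "chaudhuri": "চৌধুরী"}


def _longest_match(text, pos, table):
    """Return the longest key of `table` found at `pos` (2 chars, then 1)."""
    for size in (2, 1):
        chunk = text[pos:pos + size]
        if chunk in table:
            return chunk
    return None


def transliterate(latin):
    """Best guess for the Latin letters typed so far, normalized to NFC."""
    word = latin.lower()
    if word in LEXICON:
        return unicodedata.normalize("NFC", LEXICON[word])
    out, pos, after_consonant = [], 0, False
    while pos < len(word):
        key = _longest_match(word, pos, CONSONANTS)
        if key is not None:
            out.append(CONSONANTS[key])
            after_consonant = True
        else:
            # Dependent vowel signs only make sense right after a consonant.
            key = _longest_match(word, pos, VOWEL_SIGNS) if after_consonant else None
            if key is not None:
                out.append(VOWEL_SIGNS[key])
            else:
                key = word[pos]  # unknown letter: pass it through unchanged
                out.append(key)
            after_consonant = False
        pos += len(key)
    return unicodedata.normalize("NFC", "".join(out))


class Composer:
    """Holds the uncommitted Latin buffer; a space commits the current guess."""

    def __init__(self):
        self.buffer = ""
        self.committed = []

    def key(self, ch):
        if ch == " ":
            if self.buffer:
                self.committed.append(transliterate(self.buffer))
                self.buffer = ""
            return ""
        self.buffer += ch
        return transliterate(self.buffer)  # preview is redone on every keystroke

    def text(self):
        return " ".join(self.committed)


def type_text(keys):
    """Feed keystrokes one by one; return (preview after each key, committed text)."""
    composer = Composer()
    previews = [composer.key(ch) for ch in keys]
    return previews, composer.text()


previews, _ = type_text("chau")
print(previews)   # ['ক', 'চ', 'চা', 'চৌ']

_, committed = type_text("Godhuli Chaudhuri ")
print(committed)  # গোধূলি চৌধুরী
```

Note that the naive rules alone would spell both names with the wrong vowel length (a short u in the first, a short i in the second); in this sketch it is the lexicon lookup on the full word that gets them right, which is presumably the kind of thing a real tool needs a lot more data to do well.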

I have to say, on a more personal note:

I must admit that I am on the whole happy that this tool wasn't available when the events described in Learning to spell in Bengali (when one doesn't know the language) happened; had it been, it might well have fundamentally altered the nature of our relationship at the time, what with that whole initial little story involving Tagore and poetry and music and us and her parents and Bengali and a big email thread and lots of IMs. Had none of that happened, à la screwing up the Back to the Future kiss between Lea Thompson and Crispin Glover? The horror!

Well I'm not sure, but I imagine that no props would have been given to me if it were typing two words and we were done....

But I also have to say, on a more professional note:

I am very glad the Microsoft Indic Language Input Tool exists now.

My other experiments with it (e.g. random words and phrases I know in Bengali and Tamil) were also impressive, as was my (re)creation of the experience with Hindi and How would Harry Potter have pronounced शहिवाख़्‍ का दर्पण, anyway?. And I was once again very impressed.

I admit the one place the tool failed was in creating that unpronounceable word, but considering how amazing of a job it did at words that do exist I'm willing to live with its trouble on the one that doesn't!

So check out the tool, check out their blog, and keep your eyes on these folks.

Indic input? It just got easier for a whole lotta people.

And unless I am misunderstanding the registry keys the tool added, it is using the Text Services Framework, which makes it very "Windows friendly" to my way of thinking.

Is someone working on getting this stuff in the box some day? :-)

Kathleen Borowski on 23 Mar 2010 12:49 PM:

There is one for Arabic as well out of Microsoft Egypt - Microsoft Maren; see http://www.microsoft.com/middleeast/egypt/cmic/maren/

Too late for me as I learned to type Arabic years ago but useful for others learning or simply wanting a shortcut.

Michael S. Kaplan on 23 Mar 2010 1:56 PM:

Yep, I'll be talking about Maren soon (it is also on my list of future topics!)....

Godhuli Chaudhuri on 23 Mar 2010 5:03 PM:

Talk about a blast from the past. My new manager actually forwarded this post to me :) I think the interesting thing about the input tool is that it updates "real-time" as you add letters. C -> Ko, Ch -> Cho, Cha -> Cha, Chau -> Chow (I ought to pull out an accented long O but that seemed a bit much for a blog comment). For vowels especially, that adaptation is not an easy thing to generalize.

Hope you're doing well - drop me a line sometime.

Tom Gewecke on 23 Mar 2010 8:17 PM:

The web version seems to work fine with my Mac browsers -- not as nice as the desktop one but still very cool.

Suraj on 23 Mar 2010 9:03 PM:

Google also provides a similar tool (http://www.google.com/ime/transliteration/) which I found to be better for Marathi input. Perhaps you should have a look?

Michael S. Kaplan on 23 Mar 2010 9:05 PM:

I lack the skill to judge Marathi quality, so I'll have to take your word for it. :-)

Pavanaja U B on 23 Mar 2010 9:37 PM:

Roman transliteration input for Indic is nothing new. Many such tools have been in existence for ages. Baraha (www.baraha.com) is one such tool. MS also had a similar tool about 5 years ago for beta test. I did test it. Then all of a sudden, its development was stopped. Now, MS releases such a tool after Google releases its Indic IME. BTW, Google's Indic IME does not work in 64-bit versions of Windows whereas MS's works :)


aleena on 27 Sep 2011 3:11 AM:

i already know bangla but i dont know how to spell it in english

