Creation of transliterating input methods

by Michael S. Kaplan, published on 2006/08/18 11:11 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/08/18/706063.aspx


Not too long ago, Tharaka asked in the Suggestion Box:

Hi,

I’m working on a Transliterating Input Method for the Sinhala language. One that would allow Sinhala to be entered phonetically. I.e., you would enter ‘ka’ to get KAYANNA (“\u0D9A”), ‘kaa’ to get (“\u0D9A\u0DCF”), ‘kae’ to get (“\u0D9A\u0DD0”) and ‘k’ to get (“\u0D9A\u0DCA”), and so on. And it should work with any (or at least most) existing applications.

The need for this is that the existing layout for Sinhala (Wijesekara) is very hard to use with a non-Sinhala keyboard. I.e., it would require an actual Sinhala keyboard with Sinhala letters printed on the keys. It is very hard to enter Sinhala with, say, a US keyboard. For the relative lack of Sinhala keyboards on the market and to avoid the hassle of having to buy a Sinhala keyboard just to type a few sentences in Sinhala, it is useful to have such a phonetic mechanism. Since this is how we type Sinhala informally (e.g. while chatting), most Sri Lankans are used to such ‘phonetic’ typing.

After some poking around I came to the conclusion that IMM is old hat and the new way is to use the Text Services Framework (TSF) to build input methods. Then, I started looking for a .NET binding for TSF (since it’ll be much easier) but found there was none. Therefore, I started with VC++ 8 (with CLR support) to build my input method, hoping to use .NET facilities for common tasks such as reading/writing XML config files, the composition window, and some GUI elements.

However, working with TSF, I came across many problems. First of all, there seems to be very little documentation about TSF, even on the Internet. The TSF reference cannot even be reached from the Visual Studio 2005 MSDN index. The API seems to be so complex, so obfuscated that it led me to suspect that TSF is a phased-out API.

TSF and .NET do not seem to mix properly either. I got access violations when loading a mixed-code input method DLL in some applications (Notepad.exe), while it worked fine in others.

The questions I have are these:
*) Is it possible to build a transliterating input method (as I plan to do) with TSF?
*) Is TSF “alive”? Has it been phased-out/deprecated in favor of something else?
*) Is mixing .NET with TSF bad? Do I have to work in pure C++ (*pain!)

I would be very glad if you could shed some light on these questions, so that I can be sure I’m not on a wild goose chase with TSF.

Thanks!

Tharaka

The Text Services Framework is definitely alive and well -- in fact, in Vista virtually all of the Input Method Editors (IMEs) have been converted to use it, and the input methods for Yi and Amharic both use it as well.

Unfortunately, I do not know of any specific way to allow for a managed (.NET) TSF Text Input Processor. I will inquire further but I suspect that this is not possible given how it has to be integrated into essentially any thread using it for input, whether managed or not.

But the good news is that such a transliterating input method is quite possible with the Text Services Framework. Companies like Murasu have already created such input methods for Tamil and other languages, for cases where a simple keyboard layout is inadequate. This is the model for the input methods used for Amharic in Vista, for example.

It is even quite easy in Vista using the same techniques I used to create the Cantonese and Unicode IME samples I have been working on. If you want to send me the table containing all of the equivalences you are using, i.e.

"ka" = "\u0D9A"
"kaa" = "\u0D9A\u0DCF"
"kae" = "\u0D9A\u0DD0"
"k" = "\u0D9A\u0DCA"

and so on, I'll see if I can add another sample to the list....
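To make the idea of a table of equivalences a bit more concrete, here is a minimal Python sketch of the lookup side only. It is not the Vista sample and it does not touch TSF at all. It assumes the table arrives as text rows in the `"latin" = "\uXXXX..."` form shown above, and it assumes greedy longest-match as the conversion rule, which is my guess at a reasonable behavior rather than anything Tharaka specified. The names `parse_table` and `transliterate` are hypothetical.

```python
import re

# The four sample equivalences quoted above, in the same textual form.
# A usable Sinhala table would need many more rows; none are invented here.
TABLE_TEXT = r'''
"ka" = "\u0D9A"
"kaa" = "\u0D9A\u0DCF"
"kae" = "\u0D9A\u0DD0"
"k" = "\u0D9A\u0DCA"
'''

# One row: a quoted Latin key, "=", and a quoted run of \uXXXX escapes.
_ROW = re.compile(r'^\s*"([^"]+)"\s*=\s*"((?:\\u[0-9A-Fa-f]{4})+)"\s*$')
_ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})')


def parse_table(text):
    """Turn rows of the form "ka" = "\\u0D9A" into a dict of real strings."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        match = _ROW.match(line)
        if match is None:
            raise ValueError(f"unrecognized row: {line!r}")
        key, escaped = match.groups()
        # Decode each \uXXXX escape into the code point it names.
        table[key] = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    return table


def transliterate(text, table):
    """Greedy longest-match conversion (an assumed rule, see the note above)."""
    longest = max(map(len, table))
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "kaa" wins over "ka" and "k".
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry starts here: pass the character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


TABLE = parse_table(TABLE_TEXT)
```

With only these four rows, `transliterate("kaa", TABLE)` yields U+0D9A U+0DCF and a bare `"k"` yields U+0D9A U+0DCA, matching the examples in the question. A real text service would apply the same table one keystroke at a time against a live composition instead of a finished string, and that part is not modeled here.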

This method does not currently work in versions of Windows prior to Vista, although to be honest the font and shaping support for Sinhala is not widely available on those versions either (other than the earlier version that was released as described here; significant enhancements to the font and shaping engine have happened since then).

Some form of an input method like this, if it gains wider acceptance in the community and by language experts, could eventually find itself considered for inclusion in a future version of Windows!

So Tharaka, if you can just send me your email contact info via that Contacting Michael... link, we can talk further about how to get the info transferred and get the sample put together!

 

This post brought to you by ඐ (U+0D90, a.k.a. SINHALA LETTER ILUUYANNA)


# Marc Durdin on 18 Aug 2006 5:03 PM:

You could also look at Tavultesoft Keyman which allows fully contextual input and supports Text Services Framework, as well as standard Unicode controls and has a full development kit - www.tavultesoft.com

There are a number of transliterating keyboards available for Keyman already.

# NyaRuRu (MSMVP for Windows - DirectX) on 23 Aug 2006 2:41 PM:

I developed a pure .NET Text Input Service for the Japanese language 4 years ago. It was a very exciting experience for me.

But there is a problem: a managed Text Service DLL has to be loaded into the target process, which may already host another version of the CLR.
Because the current CLR cannot run side by side with another version in the same process, the process will crash.

I wish we could develop useful Text Services in .NET languages without such a limitation.

# Michael S. Kaplan on 23 Aug 2006 4:52 PM:

It should not actually crash, though. You do have to make sure you can load in any version of the BCL and not use methods that are only in newer versions, but you should be able to load in any process here?

# NyaRuRu (MSMVP for Windows - DirectX) on 23 Aug 2006 8:14 PM:

Thanks to Michael,
Yes, I agree with you that we can run a (pure .NET) DLL under a different version of the CLR, but I think it is neither easy nor safe enough.

I found a similar thread:
http://www.hightechtalks.com/dotnet-framework-interop/net-2-0-mscoree-dll-209205.html

# Michael S. Kaplan on 23 Aug 2006 10:31 PM:

Well, that issue is explained in the thread:

"If the third part application is a native COM client, the latest installed
version of the CLR and the framework will be loaded, this is by design.
This is done to guarantee that v2 add-in's as well as older version add-in's
can load and run in the same process."

But as long as your code will run in the latest version, it should be okay....

# NyaRuRu (MSMVP for Windows - DirectX) on 24 Aug 2006 1:08 AM:

Thanks.

Anyway, I cannot use generics, C# 2.0 iterators, or anonymous delegates in my text service if I want to use the managed text service in Visual Studio .NET 2003, into which the .NET 1.1 CLR has been loaded.

Fortunately, my text service was written in C# 1.0, so it will work fine in both Visual Studio .NET 2003 and Visual Studio 2005.

referenced by

2008/07/25 Behold the Table Driven Text Service, Part 13 (Sinhalification proclamation!)

2008/06/21 Back to Sri Lanka (conceptually)

2008/01/21 Behold the Table Driven Text Service, Part 0 (You have to start somewhere!)

2006/09/17 And we are the knights who say நீ (NII)

2006/08/26 I must admit that an example would be nice
