Keyboards that map any language to any other language, or the lack thereof

by Michael S. Kaplan, published on 2007/05/13 04:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/05/13/2590479.aspx


A couple of days ago, in response to "Language support for non-native speakers", Bertilo Wennergren commented:

So what about people used to a Cyrillic keyboard wanting to type Chinese or Hindi or Arabic? Or people used to an Arabic keyboard wanting to type in Bulgarian? Or people used to a Burmese keyboard wanting to type in English? You get my drift?  The number of various keyboard layouts will be enormous if you take this idea to its logical conclusion.

This is indeed where the idea takes itself.

So let's get taken in by it for a bit. :-)

Sometimes you will be talking to someone in your native language and it will be clear to you that it is not their native language, because they will speak it with an accent.

What is really happening is that they are bringing with them knowledge of their own language and its 20-40 phonemes and trying to speak your language with its own (not entirely the same) 20-40 phonemes, and mapping the two as best they can. How well they do it largely depends on how close the two sets of phonemes are, though with the ~300-500 different phonemes out there (depending on who you ask!), there is a lot of room for confusion and error (and accents so thick that you can't always tell what they are saying!).

Now the keyboard mapping issue, and the closely related cross-language transliteration issue, can honestly be thought of as the orthographic equivalent of the phonetic (well, phonemic) mapping that is implied above.

A language like English is actually a very natural one to consider for the effort, for several reasons:

Though Bertilo is right that it is a bit provincial to assume that such a solution will work for everyone, it is the one most likely to have some mappings available.

And there is definitely no good way to guess at appropriate conventions here, even with data like Unicode names (for example, look at my first cut of a Tamil->English mapping I gave here, which people in comments pointed out could benefit from using mappings that are more widely accepted in user communities).

For the time being we will ignore the fact that different language communities for a single language can each have their own mapping choices, and those choices can conflict with each other.

Well what if one took the mappings that people create in tools like the Microsoft Transliteration Utility (discussed previously), and used them as the basis for a transliterating keyboard or input method?

This is an idea that has a lot of potential, if you ask me.
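
To make the idea a bit more concrete, here is a minimal sketch of how a mapping table might drive a transliterating keyboard. Everything in it is an assumption for illustration: the `TABLE` contents (a handful of Latin-to-Cyrillic pairs, not any community's accepted convention), the `TranslitKeyboard` class, and the `type_keys` helper are all hypothetical, and none of it reflects how MSKLC or the Microsoft Transliteration Utility actually work. The point is only that a two-key sequence such as `sh` has to replace whatever the first key already produced.

```python
# Illustrative Latin-to-Cyrillic table. Hypothetical: not a standard
# transliteration scheme and not taken from any existing tool.
TABLE = {
    "s": "с",
    "h": "х",
    "sh": "ш",
    "k": "к",
    "a": "а",
}


class TranslitKeyboard:
    """Keystroke-at-a-time transliteration driven by a mapping table.

    Simplifying assumption: every multi-key sequence in the table has its
    shorter prefix in the table too (e.g. "s" exists alongside "sh").
    """

    def __init__(self, table):
        self.table = table
        self.pending = ""  # typed keys that produced the last output item
        self.output = []   # output items emitted so far

    def press(self, key):
        combined = self.pending + key
        if self.pending and combined in self.table:
            # The new key extends the previous sequence, so replace
            # the last output item instead of appending a new one.
            self.output[-1] = self.table[combined]
            self.pending = combined
        else:
            # Start a fresh sequence; unmapped keys pass through unchanged.
            self.output.append(self.table.get(key, key))
            self.pending = key if key in self.table else ""
        return "".join(self.output)


def type_keys(keys, table=TABLE):
    """Feed a string of keystrokes through a new keyboard and return the text."""
    keyboard = TranslitKeyboard(table)
    text = ""
    for key in keys:
        text = keyboard.press(key)
    return text
```

With this table, `type_keys("kasha")` gives `каша`: the `s` first produces `с`, and the following `h` turns it into `ш`. A real table would have to come from the user community for the language in question, for the reasons given above.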

It is not even all that new, if you remember "Ideas about loading existing keyboards in MSKLC" from last year, which gave the three most common complaints from people trying to work with other language keyboards:

  • The movement of some letters within a script;
  • The movement of punctuation always;
  • The lack of phonetic relationship between scripts.

Note that the problem here in this post is all about that third bullet point.

There is a really good model for functionality that could be considered for a future version of MSKLC (as luck would have it, some of the new owners of the tool skim the posts here from time to time!).

This post brought to you by ƾ (U+01be, a.k.a. LATIN LETTER INVERTED GLOTTAL STOP WITH STROKE)


    Marc Durdin on 13 May 2007 5:59 PM:

    Michael,

    Quite interesting the way your thoughts are going here.  This is exactly where Tavultesoft Keyman started (in 1992!) - the Keyman keyboard language came out of SIL's CC utility, which was widely used for transliteration and character mapping.  One of the objectives of Keyman was to provide a relatively straightforward translation from a CC table to a keyboard layout.

    That is one reason why Keyman works so well for 'phonetic' keyboard layouts.

    This issue rapidly gets complex - for instance, Thai has 6 letters that are regularly transliterated to the English 'k' or 'kh'.  Even ignoring the distinctions between the 6 characters, as I understand it, MSKLC/Windows keyboard layouts would not be able to handle 'k' vs 'kh' as there is no ability to handle context or sequences (deadkeys wouldn't work here).  Are there plans to extend the Windows keyboard layouts to handle this type of situation?

    Michael S. Kaplan on 14 May 2007 1:00 AM:

    There are no plans to do so that I am aware of -- but TSF and especially the text-based TIPs can work well here....

