by Michael S. Kaplan, published on 2006/08/19 06:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/08/19/707013.aspx
Regular reader KJK:Hyperion asked in the Suggestion Box:
...when will Transliteration Utility support Romaji and Hiragana transliteration for Japanese? That's basically the only one I need. At the moment I use http://www.j-talk.com/nihongo/ but I'd prefer an off-line tool.
The tool that he is referring to is the Microsoft Transliteration Utility v1.0, which Thierry Fontenelle talks about in English here (and in French here).
I happened to be in an email thread with one of the authors of the tool (Nick Cipollone) and Thierry, and I figured I'd ask them this question. :-)
And Nick gave me the scoop:
Our basic strategy with Transliteration Utility was just to get the thing out the door with a few representative types of modules that people could use as models to create their own. The only modules that were specifically requested by anyone were the Inuktitut Syllabary <-> Romanization modules (requested by the Canadian sub); the rest were basically things we had lying around.
We had intended to put out “module expansion packs” every now and then, once we had enough new modules to justify it. We haven’t developed any new ones for public consumption since Transliteration Utility shipped in January, though. We also hoped as a stretch goal that individuals or companies other than Microsoft might eventually provide module expansion packs, although this hasn’t happened to our knowledge yet either.
Well, that sounds like a call to arms to me. What do all of you think? :-)
The tool itself is pretty cool, and it may be worth looking into building a new transliteration module in its Module Development Console.
The text in the Module Development Console lays out what is involved, and it looks pretty straightforward (all you would need is good knowledge of the languages and the transliteration in question to fill it in!):
[Input]
// Insert a several-word description of the module's input.
// For example:
// Romanization
[Output]
// Insert a several-word description of the module's output.
// For example:
// Cyrillic
[Description]
// Give a several-sentence description of the module.
[Preprocess]
// If you need to preprocess your input before applying
// rules specify the procedure here.
// For example:
// ToLower
// ToUpper(tr-TR)
[States]
// If you need any states other than the two predefined ones
// (START and DEFAULT) then declare their names here.
// For example:
// CONSONANT
// VOWEL
[FollowingContextMacros]
// Insert any following context macro definitions here.
// For example:
// Cons b c d f g h j k l m n p q r s t v w x y z
// ConsOrEnd <END> :Cons:
// Vowel a e i o u
// VowelAtEnd a<END> e<END> i<END> o<END> u<END>
[EscapeSpanDelimiters]
// If you need to be able to prevent spans of the input
// from being processed you can specify one pair of strings
// to indicate the beginning and end of such escaped spans.
// For example:
// { }
// /* */
[Rules]
// List your rules here. For example:
// a --> x
// a(<END>) --> y
// [START] fa --> z [VOWEL]
Anyone want to give it a shot? :-)
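For anyone who wants a feel for what such rules do before opening the tool, here is a minimal Python sketch of a longest-match transliterator with a following-context check. It is only an illustration under my own assumptions, not the utility's engine or its module format: the names `RULES`, `matches_following`, and `transliterate` are hypothetical, the handful of kana is a toy table (nowhere near a real Hiragana to Romaji module), and the `<END>` handling is just a guess at what the template's `a(<END>) --> y` is meant to express. States are left out entirely.

```python
# Hypothetical sketch: NOT the Transliteration Utility's engine or file format.
END = "<END>"  # marker name borrowed from the template; semantics assumed here

# Kana before which a moraic "n" is written n' in Hepburn-style romanization.
VOWELS_AND_Y = ("あ", "い", "う", "え", "お", "や", "ゆ", "よ")

# Each rule: (source, following context or None, output). Toy table only.
RULES = [
    ("きゃ", None, "kya"),
    ("か", None, "ka"),
    ("き", None, "ki"),
    ("い", None, "i"),
    ("ぱ", None, "pa"),
    ("ん", VOWELS_AND_Y, "n'"),  # applies only when a vowel or y-kana follows
    ("ん", None, "n"),
]


def matches_following(rest, following):
    """Return True if the text after the match satisfies any listed context."""
    for ctx in following:
        if ctx == END:
            if rest == "":
                return True
        elif rest.startswith(ctx):
            return True
    return False


def transliterate(text, rules=RULES):
    """Apply rules left to right, preferring longer and more specific matches."""
    # Longer sources first; for equal length, context-restricted rules first.
    ordered = sorted(rules, key=lambda r: (-len(r[0]), r[1] is None))
    out = []
    i = 0
    while i < len(text):
        for source, following, output in ordered:
            if not text.startswith(source, i):
                continue
            rest = text[i + len(source):]
            if following is not None and not matches_following(rest, following):
                continue
            out.append(output)
            i += len(source)
            break
        else:
            out.append(text[i])  # no rule matched: copy the character through
            i += 1
    return "".join(out)


print(transliterate("かんい"))
print(transliterate("きゃ"))
```

The two rules from the template's example, `a --> x` and `a(<END>) --> y`, would be written in this sketch as `[("a", None, "x"), ("a", (END,), "y")]`. The real work in a module is the rule table itself, which is where knowledge of the language comes in.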
This post brought to you by ぱ (U+3071, a.k.a. HIRAGANA LETTER PA)
# Nektar on 19 Aug 2006 1:23 PM:
# Michael S. Kaplan on 19 Aug 2006 1:40 PM:
# dennispg on 20 Aug 2006 7:52 AM:
# Michael S. Kaplan on 20 Aug 2006 9:45 AM:
# Nick Cipollone on 20 Aug 2006 2:04 PM:
# Michael S. Kaplan on 20 Aug 2006 9:02 PM:
# Patrick Hall on 22 Aug 2006 2:37 PM:
# Michael S. Kaplan on 22 Aug 2006 2:46 PM:
# Jonathan T. Capes on 23 Oct 2006 2:53 PM:
I happened upon the transliteration tool and after finding Nick's post above, I made my own module to convert Uzbek Cyrillic to the Roman alphabet Uzbekistan adopted in 1995, which should have become the official standard in 2005.
I haven't yet looked at creating a module for Roman --> Cyrillic. I know that there would be some fairly major issues going in that direction as Cyrillic --> Roman was not a lossless process.
I would be more than happy to provide the module to anyone who could use it. I also created a much more friendly Uzbek keyboard layout, using QWERTY as a basis, making it much more intuitive for QWERTY users to input Uzbek in Cyrillic.
I'll check back here or you can email me at
capes at u dot washington dot edu