Not so Lao[d], at least not until Vista

by Michael S. Kaplan, published on 2007/07/02 10:44 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2007/07/02/3660189.aspx


The other day I got a mail via the contact link:

Mr. Kaplan,

My name is Anousak Souphavanh, a Lao software developer with the Science and Technology Department. I am recently trying to develop Lao database using MS SQL but understood that Lao is missing from the MS SQL support list of languages. I really like to add Lao UNICODE but needs your help and support. I read your presentation titled 'Unicode and Collation Support in Microsoft SQL Server' which held in Prague on 23-26 March, 2003. There seems to a doable for Lao language but I need to understand what actually that I need to implement. Please guide me in the right direction, ie. docs, urls to docs and info, and etc.

I appreciated in advance for your help.

Well, proper locale and collation support for Lao was not added to Windows until Vista, and has not yet been added to SQL Server. But both pre-Vista Windows and every version of SQL Server that supports Unicode (7.0 to 9.0) give some weight to Lao characters, so the sort won't be right, but it will at least be something.

Now another option one could consider is a binary collation, which will also ensure that characters are findable.
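To make the binary option concrete, here is a minimal sketch in Python rather than SQL Server. It assumes that comparing big-endian UTF-16 bytes is a fair stand-in for a binary collation on Unicode text; the helper names `binary_key` and `find` are made up for illustration. The order it produces is code point order, not Lao dictionary order, but it is deterministic, so exact lookups work.

```python
import bisect

def binary_key(text: str) -> bytes:
    # Big-endian UTF-16 bytes compare in code-unit order. For BMP characters
    # such as Lao (U+0E80..U+0EFF) that is the same as code point order.
    return text.encode("utf-16-be")

# Three short Lao strings, written with escapes so the code points are explicit.
words = [
    "\u0ea5\u0eb2\u0ea7",  # starts with U+0EA5 LAO LETTER LO LOOT
    "\u0e81\u0eb2",        # starts with U+0E81 LAO LETTER KO
    "\u0e99\u0eb2",        # starts with U+0E99 LAO LETTER NO
]

ordered = sorted(words, key=binary_key)
keys = [binary_key(w) for w in ordered]

def find(word: str) -> bool:
    # Binary search over the binary keys: an exact match is always findable,
    # even though the ordering itself is not linguistically meaningful.
    target = binary_key(word)
    i = bisect.bisect_left(keys, target)
    return i < len(keys) and keys[i] == target
```

With this, `find("\u0e99\u0eb2")` succeeds, while a string that was never stored is reported as missing.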

The third option: if one does have Vista and does not want to wait for an unknown future version of SQL Server, the technique described in the Extending collation support in SQL Server and Jet series can be used to generate sort key values on Vista and store them alongside the data, and then you can get the right collation behavior, too!
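The general shape of that approach (compute a sort key outside the database, store it in its own column, and order by that column) can be sketched as follows. This is not the code from the series: it uses Python and an in-memory SQLite table purely for illustration, and `make_sort_key` with its `TOY_WEIGHTS` table is a hypothetical stand-in. On Vista the key bytes would presumably come from the operating system, for example from `LCMapString` with `LCMAP_SORTKEY`. The weights below are invented and are not real Lao collation weights.

```python
import sqlite3

# Invented weights, NOT real Lao collation data. They are deliberately the
# reverse of code point order so the effect of the key column is visible.
TOY_WEIGHTS = {"\u0ea5": 1, "\u0e99": 2, "\u0e81": 3, "\u0eb2": 4, "\u0ea7": 5}

def make_sort_key(text: str) -> bytes:
    # Hypothetical stand-in for an OS-generated sort key: one weight byte per
    # character, with unknown characters sorting last.
    return bytes(TOY_WEIGHTS.get(ch, 255) for ch in text)

def load_and_sort(words):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT, sort_key BLOB)")
    conn.executemany(
        "INSERT INTO words (word, sort_key) VALUES (?, ?)",
        [(w, make_sort_key(w)) for w in words],
    )
    # BLOBs are compared byte by byte, so ordering by the key column applies
    # whatever collation was baked into the keys.
    rows = conn.execute("SELECT word FROM words ORDER BY sort_key").fetchall()
    conn.close()
    return [row[0] for row in rows]

words = ["\u0e81\u0eb2", "\u0e99\u0eb2", "\u0ea5\u0eb2\u0ea7"]
result = load_and_sort(words)
```

The database never needs to understand the language; it only compares opaque bytes, which is why this works even when the server has no suitable collation of its own.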

Finally, there is no font that covers Lao really well on the Windows platform itself until Vista (the font provided in Vista for Lao is DokChampa) but hopefully Anousak already has a font handy.... :-)

 

This post brought to you by ລ (U+0ea5, a.k.a. LAO LETTER LO LOOT)


John M. Durdin on 3 Jul 2007 5:02 AM:

Sorting Lao according to the standard Lao dictionary order is phonetic by syllables rather than strictly orthographic, which makes it very different from sorting Thai.  The process is: (1) identify syllables, (2) create a sorting key that evaluates syllables, with a priority of initial consonant(s), final consonant, vowel, tone.  (A vowel can be several Unicode characters ordered before or after the initial consonant, and displayed anywhere before, after, above and/or below it).

There are many good reasons for this approach being adopted for Lao dictionary sorting (in the 1960s, not recently).  

This sorting has been implemented in Lao Script for Windows for many years, and can easily be implemented for SQL server if calls to external DLLs are used to break each string into syllables.   (Of course, exceptions such as loan words must be handled, too.)
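The key priority described in this comment (initial consonant(s), then final consonant, then vowel, then tone, evaluated syllable by syllable) can be sketched roughly as below. This is only an illustration, not the Lao Script for Windows implementation: it assumes syllable identification has already been done, uses romanized placeholder values instead of real Lao data, and ranks each component by plain string comparison. The `Syllable` type and both key functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Syllable:
    # Components of one already-identified syllable (placeholder values).
    initial: str
    vowel: str
    final: str = ""
    tone: str = ""

def syllable_key(s: Syllable) -> Tuple[str, str, str, str]:
    # Priority from the comment: initial consonant(s), final consonant,
    # vowel, tone. Real code would map each part to a rank, not a string.
    return (s.initial, s.final, s.vowel, s.tone)

def word_key(syllables: List[Syllable]) -> Tuple[Tuple[str, str, str, str], ...]:
    # A word's key is the sequence of its syllable keys.
    return tuple(syllable_key(s) for s in syllables)

# Romanized placeholders, pre-segmented by hand (an assumption of this sketch).
words = {
    "kan": [Syllable(initial="k", vowel="a", final="n")],
    "ki": [Syllable(initial="k", vowel="i")],
    "ka": [Syllable(initial="k", vowel="a")],
}

ordered = sorted(words, key=lambda w: word_key(words[w]))
plain = sorted(words)
```

Because the final consonant outranks the vowel, "ki" (no final) sorts ahead of "kan" here, whereas a plain character-by-character sort puts "kan" first.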

Michael S. Kaplan on 4 Jul 2007 5:35 PM:

The algorithm used in Vista is reportedly not too bad (though it is not dictionary based and therefore will not handle loan words).

John Durdin on 5 Jul 2007 5:30 AM:

What does "not too bad" mean?  If they have implemented sorting according to the Thai system (which is what I am afraid of, as it is much simpler to write an algorithm for), it will be completely wrong according to accepted Lao usage.  

BTW your statement  "no (Lao) font until Vista" is not correct - Saysettha OT is a Unicode font that has been around and widely used for several years (e.g. VOA's Lao pages).  The OT tables in that font were part of the information provided to MS for their development of Dok Champa.

Michael S. Kaplan on 5 Jul 2007 8:41 AM:

It is not at all based on Thai. I meant fonts shipping within Windows....


referenced by

2008/02/23 Despite progression, the bug calls out to me quite LAOdly
