Getting out of the compatibility zone, redux

by Michael S. Kaplan, published on 2006/01/14 20:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2006/01/14/512985.aspx


Peter from Belgium sent me the following question:

Hi Kaplan,

I have a problem. I have an array with Unicode characters. These characters are the standalone characters from Morocco. I'm looking for a function where I can put in the standalone characters array and receive the combined or replaced characters. These depend on the place where the character stands in a word. I read about Uniscribe and complex scripts, but it doesn't look that easy to use such API calls. Is the thing I want also possible with Visual Basic .NET? I'm thinking of the StringInfo class in the Globalization namespace. Does your book provide an example for my problem in VB.NET? If it does, where can I order it?

Thank you very much,

Peter

I am not sure I fully understood the question, but it appears that Peter might be talking about the compatibility forms of the Arabic script and wanting to convert them to the regular Arabic script.
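To check whether a given character is one of these compatibility (presentation) forms, one can look at its decomposition mapping in the Unicode Character Database: the positional Arabic forms carry an `<initial>`, `<medial>`, `<final>`, or `<isolated>` tag in front of the nominal letter(s). The sketch below uses Python's `unicodedata` module purely as a stand-in for the .NET APIs discussed in this post; the helper name `describe_positional_form` is hypothetical, and treating exactly those four tags as "positional forms" is an assumption of the sketch.

```python
import unicodedata

# Assumption: these four decomposition tags mark the positional Arabic forms.
POSITIONAL_TAGS = ("<initial>", "<medial>", "<final>", "<isolated>")


def describe_positional_form(ch):
    """Hypothetical helper: return (tag, nominal_text) if ch is a positional
    compatibility form, otherwise None."""
    decomp = unicodedata.decomposition(ch)  # '' when there is no mapping
    if not decomp:
        return None
    tag, _, rest = decomp.partition(" ")
    if tag not in POSITIONAL_TAGS:
        # Canonical mappings have no tag; other tags (<font>, <compat>...) are skipped.
        return None
    nominal = "".join(chr(int(cp, 16)) for cp in rest.split())
    return tag, nominal


for ch in ["\ufee0", "\u0644"]:
    result = describe_positional_form(ch)
    print(f"U+{ord(ch):04X}:", result[0] if result else "not a positional form")
```

This prints `U+FEE0: <medial>` for ARABIC LETTER LAM MEDIAL FORM and `U+0644: not a positional form` for the ordinary letter lam.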

You cannot get to that information from either the StringInfo class or from Uniscribe, though....

This is the subject of the posts It Does Not Always Pay to be Compatible and Getting out of dodge (or at least out of the compatibility range!).

Assuming that is Peter's question, the answer (as I point out in that second post) is to use Unicode normalization. You can find support for it in Whidbey (version 2.0 of the .NET Framework), in Vista, and in the downlevel package for IDN. The best bet is normalization form KC for getting rid of those compatibility forms and receiving 'combined' characters. You can use any of these three methods from VB.NET, depending on what version of the .NET Framework and what version of the operating system you are running on....
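As a rough illustration of what form KC does to an array of positional characters, here is a minimal Python sketch; `unicodedata.normalize("NFKC", ...)` plays the role of the Windows and .NET normalization calls, and the function name `from_presentation_forms` and the sample word are assumptions chosen for illustration. The word "salam" is stored here as SEEN INITIAL FORM, the LAM WITH ALEF FINAL FORM ligature, and MEEM ISOLATED FORM.

```python
import unicodedata
from typing import Iterable


def from_presentation_forms(chars: Iterable[str]) -> str:
    """Hypothetical helper: join an array of characters and fold any
    compatibility (presentation) forms back to nominal letters via NFKC."""
    return unicodedata.normalize("NFKC", "".join(chars))


# "salam" as it might be stored using Arabic Presentation Forms-B
word = ["\ufeb3", "\ufefc", "\ufee1"]
result = from_presentation_forms(word)
print(" ".join(f"U+{ord(c):04X}" for c in result))
```

This prints `U+0633 U+0644 U+0627 U+0645`: the ligature expands into separate lam and alef, and every letter comes back in its nominal form, leaving contextual shaping to the rendering engine.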

Now I can't claim that my book describes how to do this using any of these technologies, since the book was released in September of 2000, before even 1.0 of the .NET Framework had been released and several versions before normalization had made it into Microsoft products (heck, the book came out before v.1 of VB.NET was even released!).

But the syntax for calling it from version 2.0 of the .NET Framework can be seen right in the String.Normalize(NormalizationForm) topic on MSDN; just be sure to use NormalizationForm.FormKC (form C alone leaves the compatibility forms untouched) and that should do the trick.
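The difference between form C and form KC is easy to see on a single character. In this Python sketch (again using `unicodedata.normalize` as an analogue of `String.Normalize`, where "NFC" corresponds to `NormalizationForm.FormC` and "NFKC" to `NormalizationForm.FormKC`), only the compatibility form folds the medial lam back to the regular letter:

```python
import unicodedata

lam_medial = "\ufee0"  # ARABIC LETTER LAM MEDIAL FORM

nfc = unicodedata.normalize("NFC", lam_medial)    # canonical only
nfkc = unicodedata.normalize("NFKC", lam_medial)  # canonical + compatibility

print(f"NFC : U+{ord(nfc):04X}")   # U+FEE0, unchanged
print(f"NFKC: U+{ord(nfkc):04X}")  # U+0644, ARABIC LETTER LAM
```

NFC has nothing to do here because U+FEE0 has only a compatibility mapping (`<medial> 0644`), never a canonical one.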

Unless of course this was not Peter's question at all, in which case I have just been blathering on here on a Saturday afternoon.... :-)

This post brought to you by "ﻠ" (U+FEE0, a.k.a. ARABIC LETTER LAM MEDIAL FORM)


referenced by

2009/02/04 The road to hell is paved with attempts at being compatible

2008/09/04 Staying away from the compatibility zone is still a good idea

2008/05/04 Who bells the cat when it comes to glyph substitution?
