by Michael S. Kaplan, published on 2005/12/02 10:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/12/02/498889.aspx
The other day, someone in another group in Microsoft asked me:
As part of my implementation, I need to map from the FExx or FBxx ranges of Arabic characters back to the base characters in the 0x06xx range. There is a Unicode database text file that I can use (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), but I was wondering whether any libraries that we ship with Windows have APIs to do this task, or whether you would recommend some other way. Thanks for your time.
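Indeed, the mapping from the compatibility range into the regular Arabic block is defined in Unicode: each presentation form carries a compatibility decomposition in the character database (U+FEB8, for example, decomposes to "<medial> 0634").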
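For readers who do want to work from the database directly, here is a minimal sketch in Python (not a Microsoft API). It assumes Python's `unicodedata.decomposition()`, which exposes the same decomposition field found in UnicodeData.txt, and builds a table for the Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF) blocks. The names `build_presentation_map` and `unshape` are hypothetical, and the sketch deliberately handles only the positional tags.

```python
import unicodedata

# Arabic Presentation Forms-A and Presentation Forms-B blocks.
PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]

# Compatibility tags used for the contextual (positional) Arabic forms.
POSITIONAL_TAGS = {"<initial>", "<medial>", "<final>", "<isolated>"}


def build_presentation_map():
    """Hypothetical helper: map each positional presentation form to its
    base character(s), using the UCD decomposition field."""
    mapping = {}
    for start, end in PRESENTATION_RANGES:
        for cp in range(start, end + 1):
            # e.g. "<medial> 0634"; empty string when there is no decomposition
            fields = unicodedata.decomposition(chr(cp)).split()
            if not fields or fields[0] not in POSITIONAL_TAGS:
                continue
            # Remaining fields are hex code points (ligatures have several).
            mapping[chr(cp)] = "".join(chr(int(h, 16)) for h in fields[1:])
    return mapping


PRESENTATION_MAP = build_presentation_map()


def unshape(text):
    """Hypothetical helper: replace positional forms, keep everything else."""
    return "".join(PRESENTATION_MAP.get(ch, ch) for ch in text)


print(hex(ord(unshape("\uFEB8"))))  # prints 0x634 (ARABIC LETTER SHEEN)
```

Ligatures map to several characters; U+FEF5 (LAM WITH ALEF WITH MADDA ABOVE ISOLATED FORM), for instance, becomes U+0644 U+0622.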
(Regular readers may recall that I talked about this back when I discussed how It Does Not Always Pay to be Compatible).
But you do not have to download the file from unicode.org. That definition now ships in Microsoft platforms as well: in the normalization functionality of Whidbey (.NET Framework 2.0), of Windows Vista, and of the downlevel package for IDN.
If you normalize to Unicode Normalization Form KC (NFKC), the text moves directly out of the compatibility forms and back into the base characters of the Arabic block.
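As a sketch of the same idea outside of Windows, Python's `unicodedata.normalize("NFKC", ...)` stands in here for the platform normalization calls (on Windows that would be `NormalizeString` with `NormalizationKC`, or `String.Normalize(NormalizationForm.FormKC)` in .NET). The helper name `to_base_arabic` is hypothetical. The example uses the shaped spelling of "سلام": SEEN INITIAL FORM, the LAM WITH ALEF FINAL FORM ligature, and MEEM ISOLATED FORM.

```python
import unicodedata


def to_base_arabic(text):
    """Hypothetical helper: fold Arabic presentation forms back into the
    Arabic block (U+06xx) using Unicode Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


# U+FEB3 SEEN INITIAL, U+FEFC LAM-ALEF FINAL ligature, U+FEE1 MEEM ISOLATED
shaped = "\uFEB3\uFEFC\uFEE1"
base = to_base_arabic(shaped)
print(" ".join(f"U+{ord(ch):04X}" for ch in base))  # U+0633 U+0644 U+0627 U+0645
```

Keep in mind that NFKC folds every compatibility character, not only the Arabic ones (U+FB01 LATIN SMALL LIGATURE FI becomes "fi", for example). If only the Arabic presentation forms should change, restrict the normalization to those code points.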
Easy, right? :-)
That download is really cool, by the way -- a lot of functionality in there!
This post brought to you by "ﺸ" (U+FEB8, a.k.a. ARABIC LETTER SHEEN MEDIAL FORM)
referenced by
2010/09/16 Providing more information is the best way to assure correct information is received
2009/02/04 The road to hell is paved with attempts at being compatible
2008/09/04 Staying away from the compatibility zone is still a good idea
2008/05/04 Who bells the cat when it comes to glyph substitution?
2006/01/14 Getting out of the compatibility zone, redux