Staying away from the compatibility zone is still a good idea

by Michael S. Kaplan, published on 2008/09/04 03:21 -04:00, original URI:

Martin asks:

Hello Mr. Michael!

I've a problem with Arabic (FARSI) unicode. I searched all the web many days - without success. I also posted to a newsgroup and there they couldn't help me by answering my question, but they linked me to you. So I was visiting your site, and I think .. They are right! "If anybody can help you, then M. Kaplan!" So I hope you can, and you feel like helping me - of course ok if not.

My problem: I've an access db (mdb). There is one field with arabic names. When I fetch the data, I get this in 'normal' unicode (U+0600 – U+06FF) but I need the presentation forms unicode (U+FB50 – U+FDFF and/or U+FE70 – U+FEFF) depending on the position where it is (init / medial / final / isolated). Hope you understand what I mean - my english isn't the best, also not my articulation.. It's possible to get it?

Or maybe you can tell me how e.g. notepad is handle this. I mean, when I write arabic text in it, and I save the file, open binary then I can see that every char is in the 'normal' unicode. But it is displaying the string correct, all the chars in correct contextual form. How does it know which form it has to display?! Or is it calculating this by itself? (something like "if the last char = middle, then ......"?

hmm .. Very complicated ..

OK, I don't want to spam you with so much text. Maybe you can have a look at this and maybe you can give just a short hint or something like this .. Would be very great!

Thanks A LOT (!!) in advance!
With Regards,

This question is one that I have talked about various aspects of before, in blogs like

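The simple fact is that the model used in Unicode does not force a person to deal with up to four different ways to display every letter and to choose which one to use each time. The text is stored as the ordinary letters (U+0600 – U+06FF), and the font and the rendering engine pick the contextual form at display time. That is why Notepad shows the correct shapes even though the file only contains the 'normal' code points.

Because, like I point out in that very first blog, making the person choose is not such a great way to do things.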

Now as to how this model works, well for that one might want to dig into OpenType, specifically Developing OpenType Fonts for Arabic Script.

Going the other way (into the compat. zone) isn't a good idea, and not just because there is no good way to do it. You will also find many bugs, since Arabic will often have even more forms than those four in order to have appropriate connections happen between certain letters, the kind of thing that the compatibility zone approach just can't handle....
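For anyone who wants to poke at the relationship between the two ranges, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration, not how Notepad or any shaping engine works. The names SAMPLES, describe, and to_nominal are made up for this example. It shows that each presentation form carries a compatibility decomposition pointing back at the ordinary letter, so folding out of the compatibility zone is a standard normalization step (NFKC), while no normalization form goes the other way.

```python
import unicodedata

# Presentation forms of ARABIC LETTER BEH (U+0628) from the compatibility
# zone, plus one lam-alef ligature form. The list is just sample data.
SAMPLES = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92", "\uFEFB"]


def describe(ch):
    """Return (code point, character name, compatibility decomposition)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
    )


def to_nominal(text):
    """Fold presentation forms back to the ordinary letters via NFKC."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for ch in SAMPLES:
        print(describe(ch))
    # Initial, medial, and final BEH all fold to the same stored letter.
    folded = to_nominal("\uFE91\uFE92\uFE90")
    print([f"U+{ord(c):04X}" for c in folded])
```

Note that the fold discards the positional information entirely, and the ligature form turns into two separate letters, which hints at why a letter-by-letter mapping in the opposite direction would not be enough.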


This blog brought to you by ﹰ (U+fe70, aka ARABIC FATHATAN ISOLATED FORM)
