Arabic ≠ Hebrew, and Hebrew ≠ Arabic

by Michael S. Kaplan, published on 2010/02/02 07:01 -05:00, original URI: http://blogs.msdn.com/b/michkap/archive/2010/02/02/9956547.aspx


Seems like an obvious point in many areas, from the linguistic to the political to the historical to the technical.

Today I am going to limit myself to [some of] the technical....

In many (most?) cases, software goes through three distinct phases when dealing with bidirectional languages:

PHASE 1: "They write everything how???"

In this phase, Hebrew is right in a few places and a few things kind of work by accident but by and large everything is the wrong way in text, even if the product is fully localized. It just looks wrong wrong wrong wrong wrong.

PHASE 2: "Ok, we flipped everything. What's next?"

In this phase, Arabic is mostly right though still broken in a few places due to over-eager mirroring. But Hebrew has all of the things that used to work broken now. Oops!

PHASE 3: "I think we are pretty much right now."

In this phase, the over-eagerness of PHASE 2 that even broke the things that were right way back in PHASE 1 is properly reined in and the differences between Arabic and Hebrew are better understood.
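One way to see why "flip everything" overshoots is to look at what the Unicode character data says about individual characters. The following is a minimal sketch using Python's standard `unicodedata` module, which the post itself does not mention; the helper name `describe` is hypothetical. It shows that Hebrew and Arabic letters carry different bidirectional classes, that only some punctuation is flagged as mirrored, and that the Arabic question mark is a separate character rather than a flipped `?`.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect the bidi-related properties of a single character."""
    return {
        "name": unicodedata.name(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
    }


# Hebrew and Arabic letters are both right-to-left, but in different classes.
hebrew_alef = describe("\u05D0")   # bidi_class "R"
arabic_ain = describe("\u0639")    # bidi_class "AL"

# Parentheses are flagged as mirrored; the ASCII question mark is not.
paren = describe("(")
ascii_question = describe("?")

# The Arabic question mark is its own code point, U+061F.
arabic_question = describe("\u061F")

for info in (hebrew_alef, arabic_ain, paren, ascii_question, arabic_question):
    print(info)
```

So whether a question mark "looks mirrored" is decided by which character the text contains, not by a layout flag, and that is one reason a blanket flip cannot be right for both languages.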

Now those differences are interesting, so I thought I'd talk about them.

In the average release of a not-small product, like maybe say Windows, there are many little things that are different. But an astonishing number of them fall into six simple categories, which are listed in the table of contrasts below:

 

| | Hebrew | Arabic |
|---|---|---|
| Question Mark | Not mirrored | Mirrored |
| Date | Short date for Hebrew is LTR. E.g. 14/01/2010 18:30 PM | Short date for Arabic is RTL. E.g. PM 18:30 2010/01/14 |
| Number | "Arabic" numbers | "Hindi / Arabic" numbers |
| Percentage | Appears to the right of the number, same as English. E.g. (20%) | Appears to the left of the number. E.g. (%20) |
| Units Label | Appears to the right of the number, same as English. E.g. [30] [kilobytes] | Appears to the left of the number. E.g. [Kilobyte] [30] |
| Math formula | Non-mirrored math formulas | Mirrored math formulas |
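The table above lends itself to being written down as data rather than scattered through code as special cases. Here is a minimal sketch of that idea in Python. The class name, field names, and helper functions are all hypothetical, and the values are simply transcribed from the table. The helpers return the pieces in the left-to-right visual order shown in the table's examples; real text is stored in logical order and laid out by the bidi algorithm, so this is an illustration of the conventions, not a rendering routine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidiConventions:
    """One column of the table, for a single language (hypothetical names)."""
    question_mark: str     # character used to end a question
    short_date_rtl: bool   # True if the short date's fields run right to left
    native_digits: bool    # True for Arabic-script digits, False for 0-9
    percent_on_left: bool  # True if the percent sign sits left of the number
    unit_on_left: bool     # True if a unit label sits left of the number
    mirror_math: bool      # True if math formulas are mirrored


# Values transcribed from the table of contrasts.
CONVENTIONS = {
    "he": BidiConventions(
        question_mark="?", short_date_rtl=False, native_digits=False,
        percent_on_left=False, unit_on_left=False, mirror_math=False,
    ),
    "ar": BidiConventions(
        question_mark="\u061F", short_date_rtl=True, native_digits=True,
        percent_on_left=True, unit_on_left=True, mirror_math=True,
    ),
}


def visual_percent(lang: str, number: str) -> str:
    """Percent sign and number, in left-to-right visual order."""
    conv = CONVENTIONS[lang]
    return f"%{number}" if conv.percent_on_left else f"{number}%"


def visual_unit(lang: str, number: str, unit: str) -> str:
    """Unit label and number, in left-to-right visual order."""
    conv = CONVENTIONS[lang]
    return f"{unit} {number}" if conv.unit_on_left else f"{number} {unit}"


print(visual_percent("he", "20"))            # 20%
print(visual_percent("ar", "20"))            # %20
print(visual_unit("he", "30", "kilobytes"))  # 30 kilobytes
print(visual_unit("ar", "30", "Kilobyte"))   # Kilobyte 30
```

Keeping the six differences in one record per language means a new language becomes a new entry rather than a new set of `if` statements.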

This strikes me as a very useful table!

I mean, had this information been made available earlier, PHASE 2 might have been skipped entirely just through the general awareness of the issues....

Anyway, I thought I'd put it up here. Special thanks to Amr for putting it together and to Gilead for forwarding it to me. (Those two gentlemen are also technically a case of Arabic ≠ Hebrew, and Hebrew ≠ Arabic, but they work quite well together, and as far as I know that has been true since PHASE 1, even though one of them is all the way over in Egypt!)

Now some might add a fourth phase on the three I gave before:

PHASE 4: "What do you mean there are other languages to worry about?"

In this phase, languages such as Farsi (Persian), Uyghur (Uighur), Pashto, Urdu, Sindhi, and so on come into play, with some small additional differences in a few places that are much less well understood, even by experts in the field. But the general principles above are still valid and interesting to consider, as are the six principal types of differences.
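The Number row is one place where these additional differences show up in a concrete way. Unicode encodes two sets of Arabic-script digits: Arabic-Indic (U+0660 to U+0669) and Extended Arabic-Indic (U+06F0 to U+06F9), the latter being the set generally used with Persian and Urdu. The sketch below converts ASCII digits to either set; the function and constant names are hypothetical, and which set a given product should use for a given language is an assumption to verify against locale data, not something this post specifies.

```python
ASCII_DIGITS = "0123456789"

# Digits used with Arabic: U+0660 .. U+0669
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))

# Digits generally used with Persian and Urdu: U+06F0 .. U+06F9
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))


def to_digits(text: str, target: str) -> str:
    """Replace ASCII digits in text with the corresponding target digits."""
    return text.translate(str.maketrans(ASCII_DIGITS, target))


arabic_30 = to_digits("30", ARABIC_INDIC)
persian_30 = to_digits("30", EXTENDED_ARABIC_INDIC)

# The two results look similar but are different code points.
print([hex(ord(c)) for c in arabic_30])
print([hex(ord(c)) for c in persian_30])

# Python's int() accepts any Unicode decimal digits, so both parse back to 30.
print(int(arabic_30), int(persian_30))
```

A comparison such as `arabic_30 == persian_30` is false even though both mean thirty, which is the kind of small difference that is easy to miss when Arabic is treated as a stand-in for every Arabic-script language.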

 


# Abdusalam on 3 Feb 2010 8:02 AM:

Hi Michael,

Long time no see.

The PHASE 4 really concerns me.  After reading through the table, I've finally come to understand that besides the major ones, there ARE some other differences between Arabic and Uyghur that I would like to share with you.

I'm assuming that the square brackets in the table mean the enclosed content is supposed to be written in the script of the corresponding language. So for Uyghur:

Question Mark: Mirrored (؟)

Date: RTL (e.g., [PM] 06:30 2010/01/14)

Number: "Arabic" (1, 2, 3, ...)

Percentage: Appears to the Right of the number, the same as English and Hebrew (e.g., 20%). I'm pretty sure of this.

Units Label: Appears to the Left of the number, the same as Arabic (e.g., [kilobyte] 30); however, Latin-form units (e.g., kg, km, kb/s) should always appear to the Right of the Arabic numbers with no space between them (e.g., 30kb).

Math Formula: Not mirrored, the same as English, in terms of Arabic numbers.

So we could see that, in Uyghur, in terms of Latin unit labels (e.g., kg, kb/s) and signs (%), they should always keep their positions as in English (30kb, 20kb/s, and 20%).

Lastly, I would really like to know how you handle the issues that arise in PHASE 4, especially when they show up in the UI, as they currently do in Windows 7.

I'm no expert but being a native speaker of Uyghur, my two cents worth. =)

# Michael S. Kaplan on 3 Feb 2010 8:07 AM:

Well, I will be talking about some of the issues involved with PHASE 4 eventually, though I will likely focus on UI languages within Windows; to date, that means only Persian (Farsi), Urdu, and Pashto -- Uyghur (Uyghr or Uighur) is not currently a UI language on any version of Windows ever shipped....

# Abdusalam on 3 Feb 2010 8:25 AM:

Understandable...

However, that data might be helpful someday.  Anyway, I'm looking forward to your posts about the "PHASE 4".

# Michael S. Kaplan on 3 Feb 2010 9:27 AM:

Oh, no argument there! :)

# Ambarish Sridharanarayanan on 5 Feb 2010 11:18 AM:

Don't you mean Hindu/Arabic numbers?

# Michael S. Kaplan on 5 Feb 2010 3:55 PM:

Most people who are talking about the digits used with the Arabic script call them the Indic numbers.

Then they call the ones used with English "Arabic/Indic" digits.

Hindi itself used the Devanagari digits, which is kind of ironic. :-)

# taherleila on 3 Oct 2013 1:53 AM:

arbic lan.

# Michael S. Kaplan on 3 Oct 2013 6:13 PM:

Huh?


referenced by

2010/08/19 A bit about some Arabic script but not Arabic language stuff...
