Every character has a story #9: U+06a4 (ARABIC LETTER VEH)

by Michael S. Kaplan, published on 2005/05/09 02:01 -04:00, original URI: http://blogs.msdn.com/b/michkap/archive/2005/05/09/415613.aspx

Patrick Andries started the thread off back in July of 2002:

06A4 ARABIC LETTER VEH

* Arabic for foreign words

Is it true that this letter is sometimes used in Algeria to represent G (as in guide, gone) in Proper Nouns (of Berber origin). Source : Yannis Haralambous.

If it is true, shouldn't it be specified in the annotation, since the name seems to indicate it is only used to represent V in loan words (vitamins, instead of fitamins) ?

Thomas Milo quickly agreed and gave some more info:

Yes.

Arabic letter Qaf in the Arabic Badwi (rural) vernacular evolved from /q/ to /g/. Therefore the archigrapheme FEH/QAF (the common elements minus the distinctive features) is enhanced with a third dot to express this change (QAF > Badwi GAF) while maintaining the etymological relation with the Qaf.

A good example is the personal name <qadhdhaafii> that is pronounced /gadhdhaafi/ because both in Algeria and Libya the prestige dialects are the Badwi (Bedouin) ones.

In areas where the prestige dialect is a town vernacular, /q/ evolved to /'/ (Hamz) and /j/ (Jeem) - sometimes - to /g/. The communities associate the voiced guttural /g/ with the jeem and express that by adding dots to the JEEM/HAH/KHAH archigrapheme. This leaves the FEH/QAF archigrapheme for expressing the VEH which is associated with the Feh as nearest match.

...

"In areas where the prestige dialect is a town vernacular, /q/ evolved to /'/ (Hamz) and /j/ (Jeem) - sometimes - to /g/. The communities associate the voiced guttural /g/ with the jeem and express that by adding dots to the JEEM/HAH/KHAH archigrapheme. This leaves the FEH/QAF archigrapheme for expressing the VEH which is associated with the Feh as nearest match."

I observed in Morocco the spelling of /g/ with archigraphemic JEEM/HAH/KHAH and three dots below (Unicode CHEH) for the word garage (garaazh): CHEH-REH-ALEF-JEEM

Jonathan Kew disagreed with much of Patrick's original post, though:

I don't know whether this is accurate. FWIW, I am aware of U+06A8 being used for /g/ in Berber languages; this could easily be confused with U+06A4, as the initial and medial forms would be identical.

And further, he disagreed with the hurry-up annotation request:

Not necessarily. There are many other parallel cases; before starting down this road, I think we'd need to figure out some criteria for determining what gets into the annotations.

Patrick was quick to respond to this note about the annotation:

Perhaps when the name is not sufficient to identify the character referred to ?

The complaint some typographers have runs something like this: the reference glyph is not sufficient to determine the character to be used (by design many glyph variations are not represented and many glyphs look similar: BULLET, Z NOTATION SPOT and BLACK CIRCLE), the name may be said to be the best clue (we know what a bullet is used for and its meaning and character are semantic beasts), but what are we to do when the name does not allow us to determine whether this is the right character (either because this is not the name under which I know it or because its name only describes the glyph shape (IPA for instance))? I believe this is where annotations are useful. Admittedly, in day to day identification of characters, their location (block name) and proximity (being part of a known alphabet) also helps.

Kamal Mansour pointed out that others had found different uses for the character, though:

The VEH (06a4) has multiple uses which vary from region to region.

1. In N Africa, it is used to represent the [q] turned [g]

2. In Egypt & some other Arabic countries it is used in spelling "vitamin" and "Volvo"

3. In Jawi, it is used to represent [p]

Joe Becker (sometimes called the "father of Unicode") provided some useful context about annotations:

As you can imagine, the annotations, much less the name, cannot cover every usage (remember "ghoti" for "fish" in English). Because usage boundaries are ill-defined, to annotate or not is a judgement call by the editorial committee.

I had an opposite experience with this character long ago. We put it on the keyboard of our Arabic document system, and when we delivered it the woman representing the customer pointed to it and exclaimed angrily that there is no such character in Arabic. I pointed out that we had samples of it in print, and she repeated that there is no such character. Finally I had to suggest that if there was no such character, then she certainly was not required to type it!

Patrick had an interesting response to Joe's first paragraph, about the Editorial Committee:

Obviously, and we trust their wisdom (this usage for /g/ seems to be less marginal and jocular than other usages).

Roozbeh Pournader then had several posts, first in response to Patrick's original message:

No, the letter used to represent /g/ in North Africa is U+06A8 ARABIC LETTER QAF WITH THREE DOTS ABOVE. I could see that in my recent trip to Tunisia, in transcription of "Samsung". I also talked with some local experts about the transcriptions. To avoid confusions in the medial and initial forms, they use U+06A5 ARABIC LETTER FEH WITH THREE DOTS BELOW to represent /v/, which we could see on "Viva" advertisements.

The confusion from Yannis's side should be because of the complete similarity in the medial and initial forms. Other similar pairs, like Keheh and Kaf, or Farsi Yeh and Yeh have created this confusion many times.

And then to Patrick's later suggestion that we trust their wisdom:
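
Since several of these letters cannot be told apart by shape in some positions, it can help to identify them by code point and formal name instead. The following is a minimal Python sketch using the standard `unicodedata` module. The selection of code points is an illustrative reading of the thread (an assumption, not a list supplied by any of the posters), and the helper name `describe` is hypothetical.

```python
import unicodedata

# Code points mentioned in the thread. This selection is illustrative only.
THREAD_CODE_POINTS = [0x06A4, 0x06A5, 0x06A8, 0x06A9, 0x0643, 0x06CC, 0x064A]


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a single code point."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


for cp in THREAD_CODE_POINTS:
    print(describe(cp))
```

The names come from the Unicode data bundled with the Python interpreter, so they show what the standard calls each character, not how any particular community uses it.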

Actually, we should trust their wisdom more here. Using of VEH for North African /g/ is just plain wrong.

Then finally in response to Joe Becker's post:

I came to a very similar problem in a higher level. In the latest Arabic Internet Names Consortium meeting, when they were discussing the list of the characters to allow in Arabic (language) domain names. There is a certain need to limit the characters in each TLD I believe, to avoid security risks caused by pairs of letters that look exactly the same in all fonts and sizes (like medial Keheh vs medial Kaf).

They were thinking in the ISO 8859-6 frame at the beginning, and have forgotten to include VEH. I reminded them of the letter, and that many foreign brands may like to register domain names with the letter. Some experts agreed with me at the beginning, but it was voted down when people started arguing about keeping the purity of the Arabic language and that there is no such letter in Koran. They told that we should ask foreign companies to register their names with *pure* Arabic characters.

The chair of their linguistic committee, a guy who had philosophical problems with a Latin email address, insisted that he doesn't even have VEH on his keyboard. (No, don't take me wrong, he had a recent PhD in Computer Science from Canada, and was one of only three technical guys behind '.sa'.)

It was only two days later that I found that the letter will create a security risk because of the confusion with QAF WITH THREE DOTS ABOVE, in heavy use in North Africa, and should thus be forbidden.

But what I learned was that many Arabs just hate this letter.
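
The kind of check a registry might run for this risk can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the look-alike table contains just the pairs named in the thread and is not an official confusables list, the function name `risky_positions` is hypothetical, and "initial or medial form" is approximated as "any letter that is not last in the label", which ignores real Arabic joining rules.

```python
import unicodedata

# Look-alike pairs named in the thread, whose initial and medial forms are
# reported to be identical. Illustrative table, not an official list.
LOOKALIKES = {
    "\u06A4": "\u06A8",  # VEH <-> QAF WITH THREE DOTS ABOVE
    "\u06A8": "\u06A4",
    "\u06A9": "\u0643",  # KEHEH <-> KAF
    "\u0643": "\u06A9",
    "\u06CC": "\u064A",  # FARSI YEH <-> YEH
    "\u064A": "\u06CC",
}


def risky_positions(label: str) -> list:
    """List (index, name, look-alike name) for risky letters in a label.

    Simplifying assumption: a letter that is not last in the label is
    treated as taking its initial or medial form. Real joining behaviour
    depends on the neighbouring letters.
    """
    hits = []
    for index, char in enumerate(label[:-1]):  # skip the final letter
        twin = LOOKALIKES.get(char)
        if twin is not None:
            hits.append((index, unicodedata.name(char), unicodedata.name(twin)))
    return hits
```

A label with a non-empty result could be rejected or sent for manual review; which of the two a given registry would choose is a policy question the thread does not settle.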

Eventually (after all of the preceding posts throughout the day), Ken Whistler chimed in:

My takeaway from this thread is that this is a complicated, touchy subject which even the experts disagree about.

I think the editorial committee could provide further annotation for this character in the names list, *if* the experts discussing it could come to agreement on a single line of text that would help out.

But *monographs* are not in-scope for annotations of characters in the names list. The collected text so far today on this character could easily be turned into a short Unicode Technical Note on U+06A4. Perhaps that is the better route to clarify such difficult issues of Arabic typographical usage.

Jonathan Kew then had some parting thoughts on the issue of annotations here:

While I sympathise with the complaint, and certainly agree that the annotations can be helpful, I think that:

(a) attempting to extend the annotations in a piecemeal fashion, whenever a particular character catches someone's attention, will lead to an editorial nightmare and will not help achieve any sort of consistency;

(b) attempting a thorough "upgrade" of the annotations to document the range of established uses of each character, or to more "completely" describe the identity of "doubtful" characters, while it would be extremely useful, would be a vast undertaking.

So although I would not be opposed to annotations of this sort, suggesting the addition of this one by itself does seem to invite the opening of a rather large can of worms. Are we prepared to deal with a deluge of such reports?

James E. Agenbroad really disagreed with Jonathan here, though:

I must beg to differ. The present annotations show that useful ones can be written without any explicit "philosophy of annotation" as a foundation. If the desirability of an additional annotation comes to the Unicode Consortium's attention *and* there is consensus on what it should say (not always easy to achieve, I admit) why not add it? I view annotations not as definitive rules that must be followed but as a voice of experience to help those less familiar with some peculiarity of a particular character. If a deluge does happen it might suggest augmenting the description of processing for some alphabet.

On the other hand, Jim Caldwell really appreciated Jonathan's post and had some suggestions:

You make an excellent point. You have pulled together the comments of others into a nice argument that actually puts the question of the use of ARABIC LETTER VEH into proper context.

Character annotations beyond the basic properties should not be decided on a case-by-case basis at the Unicode editorial committee or the technical committee. Our goal is to keep the standard itself as simple as possible, yet ambiguities and misunderstandings will invariably arise because usage varies by language and locale.

I suggest that the basic annotations be kept simple and that a supplementary volume of annotations be published. That task is so huge, as you and other colleagues have pointed out, that an appropriate approach would be to solicit the various standards bodies and interested academic talents to take on sections of the standard and make recommendations to an editorial sub-committee which will publish the supplementary volume of annotations.

This sub-committee should be tasked with building consensus and defusing potential political controversies. Of course, some groups will "hate" the use another group makes of a given glyph. However, our goal is to build a shared standard that (ideally) will meet the needs of all users of characters in the standard. Therefore, this subcommittee might require that annotations include guidance, to help typographers, writers and software developers, determine when a given glyph may be appropriately used in one locale but not in another.

This supplementary volume could be especially valuable to writers of software with built-in locale-switching functionality.

Ken Whistler sided with James Agenbroad, though:

I have to agree with Jim Agenbroad here.

By the way, the large can of worms has been open for years, already, and I already deal with the (smallish) deluge of reports, in consultation with the rest of the editorial committee, as required.

Most such reports never make it to the unicore list for discussion, since they are generally self-evident small improvements or corrections.

And Ken disagreed with Jim Caldwell:

> Character annotations beyond the basic properties should not be decided on a case-by-case basis at the Unicode editorial committee or the technical committee.

I have to disagree there. Character annotations are a quintessentially editorial task, properly in the domain of the Unicode editorial committee.

Only when a potential annotational improvement, such as the issue of U+06A4, raises a question regarding character identity or other potential issues of encoding correctness, does this rise to the level of UTC concern and debate.

> Our goal is to keep the standard itself as simple as possible,

And I have to disagree with that, too. Certainly our goal is not to needlessly complexify or obfuscate the standard, but simplicity, per se, has never really been a goal of Unicode. It is an inherently complex task to document a *universal* character encoding. And the standard itself is also inherently complex, because of the many requirements it must meet.

One of the requirements that the book publication of the standard must meet is clarification of character identity in the standard, and annotations are one of the -- mostly successful -- means the editors have in their hands to accomplish this.

> yet ambiguities and misunderstandings will invariably arise because usage varies by language and locale. I suggest that the basic annotations be kept simple

No one is suggesting that character-by-character annotations in the names list will be turned, willy-nilly, into arbitrary essays.

I pushed back on this annotation precisely because it wasn't clear to me what single line annotation could markedly improve the identification of this character in the existing names list, given the discussion which it engendered. Instead, U+06A4 deserves its entry in the Every Character Has a Story Hall of Fame.

> and that a supplementary volume of annotations be published. That task is so huge, as you and other colleagues have pointed out, that an appropriate approach would be to solicit the various standards bodies and interested academic talents to take on sections of the standard and make recommendations to an editorial sub-committee

Who?

> which will publish

How?

> the supplementary volume of annotations.

When?

> This sub-committee should be tasked with building consensus and defusing potential political controversies.

No, that is the job of the Unicode Technical Committee in establishing and maintaining the standard.

I don't really see a role for an annotational meta-committee here.

What you are suggesting, could, of course, be undertaken by some interested group of character historians and/or a group of concerned implementers with lots of time on their hands.

> Of course, some groups will "hate" the use another group makes of a given glyph. However, our goal is to build a shared standard that (ideally) will meet the needs of all users of characters in the standard. Therefore, this subcommittee might require that annotations include guidance, to help typographers, writers and software developers, determine when a given glyph may be appropriately used in one locale but not in another.

I would suggest that this could be more appropriately and efficiently addressed by script implementation experts and/or typographers setting about the writing of definitive monographs on script implementations, replete with details about local variations in world scripts such as Arabic, and with appendices that detail mappings to Unicode characters and highlighting any potential ambiguities or overlappings.

Such projects would a) be more appropriately focussed on the problems of scripts and their rendering, b) would be scaled so that the issues could be addressed by a suitable expert or group of experts in a finite amount of time, and c) would likely be more useful to implementers than a massive annotate-all-the-characters project.

Finally, Mark Davis weighed in:

The appropriate place for more detailed discussion of particular characters is in the block descriptions for the book. There, the editorial committee has to use its best judgment as to what to include, based upon the text supplied to it for possible inclusion. A primary goal is to supply enough information that the correct character can be chosen when the choice would otherwise not be clear. Additional guidelines for usage and background information for the script may be included at the committee's discretion.

And Jim Caldwell responded to Mark's message:

That seems very reasonable to me.

My primary concern was that too much political argument might bog down the work of the committee. As long as we can be clear that users can see how their locale is supported and that we are objectively serving many locales as well as professional fields such as math and linguistics.

If it gets into some very complex and detailed annotation of characters, with historical explanations, then perhaps it should be a separate project for those who see it as important.

This post brought to you by "ڤ" (U+06a4, ARABIC LETTER VEH)

# ronab49 on 9 May 2005 3:22 AM:

Interesting. Do you know the story of bidi and the timeline of its adoption by Unicode and Windows? That would be another informative post as well.

# Maurits [MSFT] on 9 May 2005 4:51 AM:

I'm reminded of an Einstein quote:
Keep things as simple as possible - but no simpler

# Michael S. Kaplan on 9 May 2005 7:11 AM:

You should put the bidi idea in the SUGGESTION BOX!!!! :-)
